DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] gpudev: introduce memory API
@ 2021-06-02 20:35 Thomas Monjalon
  2021-06-02 20:46 ` Stephen Hemminger
                   ` (10 more replies)
  0 siblings, 11 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-02 20:35 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

The new library gpudev is for dealing with GPU from a DPDK application
in a vendor-agnostic way.

As a first step, the features are focused on memory management.
A function allows to allocate memory inside the GPU,
while another one allows to use main (CPU) memory from the GPU.

The infrastructure is prepared to welcome drivers in drivers/gpu/
as the upcoming NVIDIA one, implementing the gpudev API.
Other additions planned for next revisions:
  - C implementation file
  - guide documentation
  - unit tests
  - integration in testpmd to enable Rx/Tx to/from GPU memory.

The next step should focus on GPU processing task control.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 .gitignore                           |   1 +
 MAINTAINERS                          |   6 +
 doc/api/doxy-api-index.md            |   1 +
 doc/api/doxy-api.conf.in             |   1 +
 doc/guides/conf.py                   |   8 ++
 doc/guides/gpus/features/default.ini |  13 ++
 doc/guides/gpus/index.rst            |  11 ++
 doc/guides/gpus/overview.rst         |   7 +
 doc/guides/index.rst                 |   1 +
 doc/guides/prog_guide/gpu.rst        |   5 +
 doc/guides/prog_guide/index.rst      |   1 +
 drivers/gpu/meson.build              |   4 +
 drivers/meson.build                  |   1 +
 lib/gpudev/gpu_driver.h              |  44 +++++++
 lib/gpudev/meson.build               |   9 ++
 lib/gpudev/rte_gpudev.h              | 183 +++++++++++++++++++++++++++
 lib/gpudev/version.map               |  11 ++
 lib/meson.build                      |   1 +
 18 files changed, 308 insertions(+)
 create mode 100644 doc/guides/gpus/features/default.ini
 create mode 100644 doc/guides/gpus/index.rst
 create mode 100644 doc/guides/gpus/overview.rst
 create mode 100644 doc/guides/prog_guide/gpu.rst
 create mode 100644 drivers/gpu/meson.build
 create mode 100644 lib/gpudev/gpu_driver.h
 create mode 100644 lib/gpudev/meson.build
 create mode 100644 lib/gpudev/rte_gpudev.h
 create mode 100644 lib/gpudev/version.map

diff --git a/.gitignore b/.gitignore
index b19c0717e6..49494e0c6c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
 doc/guides/regexdevs/overview_feature_table.txt
 doc/guides/vdpadevs/overview_feature_table.txt
 doc/guides/bbdevs/overview_feature_table.txt
+doc/guides/gpus/overview_feature_table.txt
 
 # ignore generated ctags/cscope files
 cscope.out.po
diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..c4755dfe9a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -452,6 +452,12 @@ F: app/test-regex/
 F: doc/guides/prog_guide/regexdev.rst
 F: doc/guides/regexdevs/features/default.ini
 
+GPU API - EXPERIMENTAL
+M: Elena Agostini <eagostini@nvidia.com>
+F: lib/gpudev/
+F: doc/guides/prog_guide/gpu.rst
+F: doc/guides/gpus/features/default.ini
+
 Eventdev API
 M: Jerin Jacob <jerinj@marvell.com>
 T: git://dpdk.org/next/dpdk-next-eventdev
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..bd10342ca2 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -21,6 +21,7 @@ The public API headers are grouped by topics:
   [compressdev]        (@ref rte_compressdev.h),
   [compress]           (@ref rte_comp.h),
   [regexdev]           (@ref rte_regexdev.h),
+  [gpudev]             (@ref rte_gpudev.h),
   [eventdev]           (@ref rte_eventdev.h),
   [event_eth_rx_adapter]   (@ref rte_event_eth_rx_adapter.h),
   [event_eth_tx_adapter]   (@ref rte_event_eth_tx_adapter.h),
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6..831b9a6b33 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -40,6 +40,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/eventdev \
                           @TOPDIR@/lib/fib \
                           @TOPDIR@/lib/flow_classify \
+                          @TOPDIR@/lib/gpudev \
                           @TOPDIR@/lib/graph \
                           @TOPDIR@/lib/gro \
                           @TOPDIR@/lib/gso \
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 67d2dd62c7..7930da9ceb 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -152,6 +152,9 @@ def generate_overview_table(output_filename, table_id, section, table_name, titl
         name = ini_filename[:-4]
         name = name.replace('_vf', 'vf')
         pmd_names.append(name)
+    if not pmd_names:
+        # Add an empty column if table is empty (required by RST syntax)
+        pmd_names.append(' ')
 
     # Pad the table header names.
     max_header_len = len(max(pmd_names, key=len))
@@ -388,6 +391,11 @@ def setup(app):
                             'Features',
                             'Features availability in bbdev drivers',
                             'Feature')
+    table_file = dirname(__file__) + '/gpus/overview_feature_table.txt'
+    generate_overview_table(table_file, 1,
+                            'Features',
+                            'Features availability in GPU drivers',
+                            'Feature')
 
     if LooseVersion(sphinx_version) < LooseVersion('1.3.1'):
         print('Upgrade sphinx to version >= 1.3.1 for '
diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
new file mode 100644
index 0000000000..c363447b0d
--- /dev/null
+++ b/doc/guides/gpus/features/default.ini
@@ -0,0 +1,13 @@
+;
+; Features of a GPU driver.
+;
+; This file defines the features that are valid for inclusion in
+; the other driver files and also the order that they appear in
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
+;
+[Features]
+Get device info                =
+Share CPU memory with GPU      =
+Allocate GPU memory            =
+Free memory                    =
diff --git a/doc/guides/gpus/index.rst b/doc/guides/gpus/index.rst
new file mode 100644
index 0000000000..f9c62aeb36
--- /dev/null
+++ b/doc/guides/gpus/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright 2021 NVIDIA Corporation & Affiliates
+
+GPU Drivers
+===========
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   overview
diff --git a/doc/guides/gpus/overview.rst b/doc/guides/gpus/overview.rst
new file mode 100644
index 0000000000..e7f985e98b
--- /dev/null
+++ b/doc/guides/gpus/overview.rst
@@ -0,0 +1,7 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright 2021 NVIDIA Corporation & Affiliates
+
+Overview of GPU Drivers
+=======================
+
+.. include:: overview_feature_table.txt
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 857f0363d3..ee4d79a4eb 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -21,6 +21,7 @@ DPDK documentation
    compressdevs/index
    vdpadevs/index
    regexdevs/index
+   gpus/index
    eventdevs/index
    rawdevs/index
    mempool/index
diff --git a/doc/guides/prog_guide/gpu.rst b/doc/guides/prog_guide/gpu.rst
new file mode 100644
index 0000000000..54f9fa8300
--- /dev/null
+++ b/doc/guides/prog_guide/gpu.rst
@@ -0,0 +1,5 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright 2021 NVIDIA Corporation & Affiliates
+
+GPU Library
+===========
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46..dfddf90b51 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -27,6 +27,7 @@ Programmer's Guide
     cryptodev_lib
     compressdev
     regexdev
+    gpu
     rte_security
     rawdev
     link_bonding_poll_mode_drv_lib
diff --git a/drivers/gpu/meson.build b/drivers/gpu/meson.build
new file mode 100644
index 0000000000..5189950616
--- /dev/null
+++ b/drivers/gpu/meson.build
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2021 NVIDIA Corporation & Affiliates
+
+drivers = []
diff --git a/drivers/meson.build b/drivers/meson.build
index bc6f4f567f..f607040d79 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -18,6 +18,7 @@ subdirs = [
         'vdpa',           # depends on common, bus and mempool.
         'event',          # depends on common, bus, mempool and net.
         'baseband',       # depends on common and bus.
+        'gpu',            # depends on common and bus.
 ]
 
 if meson.is_cross_build()
diff --git a/lib/gpudev/gpu_driver.h b/lib/gpudev/gpu_driver.h
new file mode 100644
index 0000000000..5ff609e49d
--- /dev/null
+++ b/lib/gpudev/gpu_driver.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef GPU_DRIVER_H
+#define GPU_DRIVER_H
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+#include "rte_gpudev.h"
+
+struct rte_gpu_dev;
+
+typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
+typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
+
+struct rte_gpu_dev {
+	/* Backing device. */
+	struct rte_device *device;
+	/* GPU info structure. */
+	struct rte_gpu_info info;
+	/* Counter of processes using the device. */
+	uint16_t process_cnt;
+	/* If device is currently used or not. */
+	enum rte_gpu_state state;
+	/* FUNCTION: Allocate memory on the GPU. */
+	gpu_malloc_t gpu_malloc;
+	/* FUNCTION: Allocate memory on the CPU visible from the GPU. */
+	gpu_malloc_t gpu_malloc_visible;
+	/* FUNCTION: Free allocated memory on the GPU. */
+	gpu_free_t gpu_free;
+	/* Device interrupt handle. */
+	struct rte_intr_handle *intr_handle;
+	/* Driver-specific private data. */
+	void *dev_private;
+} __rte_cache_aligned;
+
+struct rte_gpu_dev *rte_gpu_dev_allocate(const char *name);
+struct rte_gpu_dev *rte_gpu_dev_get_by_name(const char *name);
+int rte_gpu_dev_release(struct rte_gpu_dev *gpudev);
+
+#endif /* GPU_DRIVER_H */
diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build
new file mode 100644
index 0000000000..f05459e18d
--- /dev/null
+++ b/lib/gpudev/meson.build
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2021 NVIDIA Corporation & Affiliates
+
+headers = files(
+        'rte_gpudev.h',
+)
+
+sources = files(
+)
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
new file mode 100644
index 0000000000..b12f35c17e
--- /dev/null
+++ b/lib/gpudev/rte_gpudev.h
@@ -0,0 +1,183 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2021 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_GPUDEV_H
+#define RTE_GPUDEV_H
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_common.h>
+
+/**
+ * @file
+ * Generic library to interact with a GPU.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Maximum number of GPU engines. */
+#define RTE_GPU_MAX_DEVS UINT16_C(32)
+/** Maximum length of device name. */
+#define RTE_GPU_NAME_MAX_LEN 128
+
+/** Flags indicate current state of GPU device. */
+enum rte_gpu_state {
+	RTE_GPU_STATE_UNUSED,        /**< not initialized */
+	RTE_GPU_STATE_INITIALIZED,   /**< initialized */
+};
+
+/** Store a list of info for a given GPU. */
+struct rte_gpu_info {
+	/** GPU device ID. */
+	uint16_t gpu_id;
+	/** Unique identifier name. */
+	char name[RTE_GPU_NAME_MAX_LEN];
+	/** Total memory available on device. */
+	size_t total_memory;
+	/** Total processors available on device. */
+	int processor_count;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Returns the number of GPUs detected and associated to DPDK.
+ *
+ * @return
+ *   The number of available GPUs.
+ */
+__rte_experimental
+uint16_t rte_gpu_dev_count_avail(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if the device is valid and initialized in DPDK.
+ *
+ * @param gpu_id
+ *   The input GPU ID.
+ *
+ * @return
+ *   - True if gpu_id is a valid and initialized GPU.
+ *   - False otherwise.
+ */
+__rte_experimental
+bool rte_gpu_dev_is_valid(uint16_t gpu_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the GPU ID of the next valid GPU initialized in DPDK.
+ *
+ * @param gpu_id
+ *   The initial GPU ID to start the research.
+ *
+ * @return
+ *   Next GPU ID corresponding to a valid and initialized GPU device.
+ */
+__rte_experimental
+uint16_t rte_gpu_dev_find_next(uint16_t gpu_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid GPUs.
+ *
+ * @param gpu_id
+ *   The ID of the next possible valid GPU.
+ * @return
+ *   Next valid GPU ID, RTE_GPU_MAX_DEVS if there is none.
+ */
+#define RTE_GPU_FOREACH_DEV(gpu_id) \
+	for (gpu_id = rte_gpu_find_next(0); \
+	     gpu_id < RTE_GPU_MAX_DEVS; \
+	     gpu_id = rte_gpu_find_next(gpu_id + 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return GPU specific info.
+ *
+ * @param gpu_id
+ *   GPU ID to get info.
+ * @param info
+ *   Memory structure to fill with the info.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+__rte_experimental
+int rte_gpu_dev_info_get(uint16_t gpu_id, struct rte_gpu_info **info);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate a chunk of memory on the GPU.
+ *
+ * @param gpu_id
+ *   GPU ID to allocate memory.
+ * @param size
+ *   Number of bytes to allocate.
+ * @param ptr
+ *   Pointer to store the address of the allocated memory.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+__rte_experimental
+int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate a chunk of memory on the CPU that is visible from the GPU.
+ *
+ * @param gpu_id
+ *   Reference GPU ID.
+ * @param size
+ *   Number of bytes to allocate.
+ * @param ptr
+ *   Pointer to store the address of the allocated memory.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+__rte_experimental
+int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a chunk of memory allocated with rte_gpu_malloc*.
+ *
+ * @param gpu_id
+ *   Reference GPU ID.
+ * @param ptr
+ *   Pointer to the memory area to be deallocated.
+ *
+ * @return
+ *   0 on success, -1 otherwise.
+ */
+__rte_experimental
+int rte_gpu_free(uint16_t gpu_id, void *ptr);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_GPUDEV_H */
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
new file mode 100644
index 0000000000..9e0f218e8b
--- /dev/null
+++ b/lib/gpudev/version.map
@@ -0,0 +1,11 @@
+EXPERIMENTAL {
+	global:
+
+	rte_gpu_dev_count_avail;
+	rte_gpu_dev_find_next;
+	rte_gpu_dev_info_get;
+	rte_gpu_dev_is_valid;
+	rte_gpu_free;
+	rte_gpu_malloc;
+	rte_gpu_malloc_visible;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 4a64756a68..ffefc64c69 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -33,6 +33,7 @@ libraries = [
         'distributor',
         'efd',
         'eventdev',
+        'gpudev',
         'gro',
         'gso',
         'ip_frag',
-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
@ 2021-06-02 20:46 ` Stephen Hemminger
  2021-06-02 20:48   ` Thomas Monjalon
  2021-06-03  7:06 ` Andrew Rybchenko
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 128+ messages in thread
From: Stephen Hemminger @ 2021-06-02 20:46 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Elena Agostini

On Wed,  2 Jun 2021 22:35:31 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> +/** Store a list of info for a given GPU. */
> +struct rte_gpu_info {
> +	/** GPU device ID. */
> +	uint16_t gpu_id;
> +	/** Unique identifier name. */
> +	char name[RTE_GPU_NAME_MAX_LEN];
> +	/** Total memory available on device. */
> +	size_t total_memory;
> +	/** Total processors available on device. */
> +	int processor_count;

Nit: shouldn't processor_count be unsigned.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-02 20:46 ` Stephen Hemminger
@ 2021-06-02 20:48   ` Thomas Monjalon
  0 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-02 20:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Elena Agostini

02/06/2021 22:46, Stephen Hemminger:
> On Wed,  2 Jun 2021 22:35:31 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > +/** Store a list of info for a given GPU. */
> > +struct rte_gpu_info {
> > +	/** GPU device ID. */
> > +	uint16_t gpu_id;
> > +	/** Unique identifier name. */
> > +	char name[RTE_GPU_NAME_MAX_LEN];
> > +	/** Total memory available on device. */
> > +	size_t total_memory;
> > +	/** Total processors available on device. */
> > +	int processor_count;
> 
> Nit: shouldn't processor_count be unsigned.

Absolutely yes, thanks for the catch.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
  2021-06-02 20:46 ` Stephen Hemminger
@ 2021-06-03  7:06 ` Andrew Rybchenko
  2021-06-03  7:26   ` Thomas Monjalon
  2021-06-03  7:18 ` David Marchand
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 128+ messages in thread
From: Andrew Rybchenko @ 2021-06-03  7:06 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: Elena Agostini

On 6/2/21 11:35 PM, Thomas Monjalon wrote:
> From: Elena Agostini <eagostini@nvidia.com>
> 
> The new library gpudev is for dealing with GPU from a DPDK application
> in a vendor-agnostic way.
> 
> As a first step, the features are focused on memory management.
> A function allows to allocate memory inside the GPU,
> while another one allows to use main (CPU) memory from the GPU.
> 
> The infrastructure is prepared to welcome drivers in drivers/gpu/
> as the upcoming NVIDIA one, implementing the gpudev API.
> Other additions planned for next revisions:
>   - C implementation file
>   - guide documentation
>   - unit tests
>   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> 
> The next step should focus on GPU processing task control.
> 
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>


LGTM as an RFC. It is definitely to a patch to apply
since implementation is missing. See my notes below.

> ---
>  .gitignore                           |   1 +
>  MAINTAINERS                          |   6 +
>  doc/api/doxy-api-index.md            |   1 +
>  doc/api/doxy-api.conf.in             |   1 +
>  doc/guides/conf.py                   |   8 ++
>  doc/guides/gpus/features/default.ini |  13 ++
>  doc/guides/gpus/index.rst            |  11 ++
>  doc/guides/gpus/overview.rst         |   7 +
>  doc/guides/index.rst                 |   1 +
>  doc/guides/prog_guide/gpu.rst        |   5 +
>  doc/guides/prog_guide/index.rst      |   1 +
>  drivers/gpu/meson.build              |   4 +
>  drivers/meson.build                  |   1 +
>  lib/gpudev/gpu_driver.h              |  44 +++++++
>  lib/gpudev/meson.build               |   9 ++
>  lib/gpudev/rte_gpudev.h              | 183 +++++++++++++++++++++++++++
>  lib/gpudev/version.map               |  11 ++
>  lib/meson.build                      |   1 +
>  18 files changed, 308 insertions(+)
>  create mode 100644 doc/guides/gpus/features/default.ini
>  create mode 100644 doc/guides/gpus/index.rst
>  create mode 100644 doc/guides/gpus/overview.rst
>  create mode 100644 doc/guides/prog_guide/gpu.rst
>  create mode 100644 drivers/gpu/meson.build
>  create mode 100644 lib/gpudev/gpu_driver.h
>  create mode 100644 lib/gpudev/meson.build
>  create mode 100644 lib/gpudev/rte_gpudev.h
>  create mode 100644 lib/gpudev/version.map
> 
> diff --git a/.gitignore b/.gitignore
> index b19c0717e6..49494e0c6c 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
>  doc/guides/regexdevs/overview_feature_table.txt
>  doc/guides/vdpadevs/overview_feature_table.txt
>  doc/guides/bbdevs/overview_feature_table.txt
> +doc/guides/gpus/overview_feature_table.txt
>  
>  # ignore generated ctags/cscope files
>  cscope.out.po
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16971..c4755dfe9a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -452,6 +452,12 @@ F: app/test-regex/
>  F: doc/guides/prog_guide/regexdev.rst
>  F: doc/guides/regexdevs/features/default.ini
>  
> +GPU API - EXPERIMENTAL
> +M: Elena Agostini <eagostini@nvidia.com>
> +F: lib/gpudev/
> +F: doc/guides/prog_guide/gpu.rst
> +F: doc/guides/gpus/features/default.ini
> +
>  Eventdev API
>  M: Jerin Jacob <jerinj@marvell.com>
>  T: git://dpdk.org/next/dpdk-next-eventdev
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index 1992107a03..bd10342ca2 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
>    [compressdev]        (@ref rte_compressdev.h),
>    [compress]           (@ref rte_comp.h),
>    [regexdev]           (@ref rte_regexdev.h),
> +  [gpudev]             (@ref rte_gpudev.h),
>    [eventdev]           (@ref rte_eventdev.h),
>    [event_eth_rx_adapter]   (@ref rte_event_eth_rx_adapter.h),
>    [event_eth_tx_adapter]   (@ref rte_event_eth_tx_adapter.h),
> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
> index 325a0195c6..831b9a6b33 100644
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -40,6 +40,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
>                            @TOPDIR@/lib/eventdev \
>                            @TOPDIR@/lib/fib \
>                            @TOPDIR@/lib/flow_classify \
> +                          @TOPDIR@/lib/gpudev \
>                            @TOPDIR@/lib/graph \
>                            @TOPDIR@/lib/gro \
>                            @TOPDIR@/lib/gso \
> diff --git a/doc/guides/conf.py b/doc/guides/conf.py
> index 67d2dd62c7..7930da9ceb 100644
> --- a/doc/guides/conf.py
> +++ b/doc/guides/conf.py
> @@ -152,6 +152,9 @@ def generate_overview_table(output_filename, table_id, section, table_name, titl
>          name = ini_filename[:-4]
>          name = name.replace('_vf', 'vf')
>          pmd_names.append(name)
> +    if not pmd_names:
> +        # Add an empty column if table is empty (required by RST syntax)
> +        pmd_names.append(' ')
>  
>      # Pad the table header names.
>      max_header_len = len(max(pmd_names, key=len))
> @@ -388,6 +391,11 @@ def setup(app):
>                              'Features',
>                              'Features availability in bbdev drivers',
>                              'Feature')
> +    table_file = dirname(__file__) + '/gpus/overview_feature_table.txt'
> +    generate_overview_table(table_file, 1,
> +                            'Features',
> +                            'Features availability in GPU drivers',
> +                            'Feature')
>  
>      if LooseVersion(sphinx_version) < LooseVersion('1.3.1'):
>          print('Upgrade sphinx to version >= 1.3.1 for '
> diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
> new file mode 100644
> index 0000000000..c363447b0d
> --- /dev/null
> +++ b/doc/guides/gpus/features/default.ini
> @@ -0,0 +1,13 @@
> +;
> +; Features of a GPU driver.
> +;
> +; This file defines the features that are valid for inclusion in
> +; the other driver files and also the order that they appear in
> +; the features table in the documentation. The feature description
> +; string should not exceed feature_str_len defined in conf.py.
> +;
> +[Features]
> +Get device info                =
> +Share CPU memory with GPU      =
> +Allocate GPU memory            =
> +Free memory                    =
> diff --git a/doc/guides/gpus/index.rst b/doc/guides/gpus/index.rst
> new file mode 100644
> index 0000000000..f9c62aeb36
> --- /dev/null
> +++ b/doc/guides/gpus/index.rst
> @@ -0,0 +1,11 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright 2021 NVIDIA Corporation & Affiliates
> +
> +GPU Drivers
> +===========
> +
> +.. toctree::
> +   :maxdepth: 2
> +   :numbered:
> +
> +   overview
> diff --git a/doc/guides/gpus/overview.rst b/doc/guides/gpus/overview.rst
> new file mode 100644
> index 0000000000..e7f985e98b
> --- /dev/null
> +++ b/doc/guides/gpus/overview.rst
> @@ -0,0 +1,7 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright 2021 NVIDIA Corporation & Affiliates
> +
> +Overview of GPU Drivers
> +=======================
> +
> +.. include:: overview_feature_table.txt
> diff --git a/doc/guides/index.rst b/doc/guides/index.rst
> index 857f0363d3..ee4d79a4eb 100644
> --- a/doc/guides/index.rst
> +++ b/doc/guides/index.rst
> @@ -21,6 +21,7 @@ DPDK documentation
>     compressdevs/index
>     vdpadevs/index
>     regexdevs/index
> +   gpus/index
>     eventdevs/index
>     rawdevs/index
>     mempool/index
> diff --git a/doc/guides/prog_guide/gpu.rst b/doc/guides/prog_guide/gpu.rst
> new file mode 100644
> index 0000000000..54f9fa8300
> --- /dev/null
> +++ b/doc/guides/prog_guide/gpu.rst
> @@ -0,0 +1,5 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright 2021 NVIDIA Corporation & Affiliates
> +
> +GPU Library
> +===========
> diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
> index 2dce507f46..dfddf90b51 100644
> --- a/doc/guides/prog_guide/index.rst
> +++ b/doc/guides/prog_guide/index.rst
> @@ -27,6 +27,7 @@ Programmer's Guide
>      cryptodev_lib
>      compressdev
>      regexdev
> +    gpu
>      rte_security
>      rawdev
>      link_bonding_poll_mode_drv_lib
> diff --git a/drivers/gpu/meson.build b/drivers/gpu/meson.build
> new file mode 100644
> index 0000000000..5189950616
> --- /dev/null
> +++ b/drivers/gpu/meson.build
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright 2021 NVIDIA Corporation & Affiliates
> +
> +drivers = []
> diff --git a/drivers/meson.build b/drivers/meson.build
> index bc6f4f567f..f607040d79 100644
> --- a/drivers/meson.build
> +++ b/drivers/meson.build
> @@ -18,6 +18,7 @@ subdirs = [
>          'vdpa',           # depends on common, bus and mempool.
>          'event',          # depends on common, bus, mempool and net.
>          'baseband',       # depends on common and bus.
> +        'gpu',            # depends on common and bus.
>  ]
>  
>  if meson.is_cross_build()
> diff --git a/lib/gpudev/gpu_driver.h b/lib/gpudev/gpu_driver.h
> new file mode 100644
> index 0000000000..5ff609e49d
> --- /dev/null
> +++ b/lib/gpudev/gpu_driver.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#ifndef GPU_DRIVER_H
> +#define GPU_DRIVER_H
> +
> +#include <stdint.h>
> +
> +#include <rte_common.h>
> +
> +#include "rte_gpudev.h"
> +
> +struct rte_gpu_dev;
> +
> +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
> +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);

Not that important but I always prefer to typedef
function prototypes w/o pointer and use pointer in
the structure below. I.e.

typedef int (gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void
**ptr);

It allows to specify that corresponding callback
must comply to the prototype and produce build
error otherwise (and do not rely on warnings), e.g.

static gpu_malloc_t mlx5_gpu_malloc;
static int
mlx5_gpu_malloc(struct rte_gpu_dev *dev, size_t size, void **ptr)
{
     ...
}

May be a new library should go this way.

> +
> +struct rte_gpu_dev {
> +	/* Backing device. */
> +	struct rte_device *device;
> +	/* GPU info structure. */
> +	struct rte_gpu_info info;
> +	/* Counter of processes using the device. */
> +	uint16_t process_cnt;
> +	/* If device is currently used or not. */
> +	enum rte_gpu_state state;
> +	/* FUNCTION: Allocate memory on the GPU. */
> +	gpu_malloc_t gpu_malloc;
> +	/* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> +	gpu_malloc_t gpu_malloc_visible;
> +	/* FUNCTION: Free allocated memory on the GPU. */
> +	gpu_free_t gpu_free;

Don't we need a callback to get dev_info?

> +	/* Device interrupt handle. */
> +	struct rte_intr_handle *intr_handle;
> +	/* Driver-specific private data. */
> +	void *dev_private;
> +} __rte_cache_aligned;
> +
> +struct rte_gpu_dev *rte_gpu_dev_allocate(const char *name);
> +struct rte_gpu_dev *rte_gpu_dev_get_by_name(const char *name);
> +int rte_gpu_dev_release(struct rte_gpu_dev *gpudev);
> +
> +#endif /* GPU_DRIVER_H */
> diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build
> new file mode 100644
> index 0000000000..f05459e18d
> --- /dev/null
> +++ b/lib/gpudev/meson.build
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright 2021 NVIDIA Corporation & Affiliates
> +
> +headers = files(
> +        'rte_gpudev.h',
> +)
> +
> +sources = files(
> +)
> diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
> new file mode 100644
> index 0000000000..b12f35c17e
> --- /dev/null
> +++ b/lib/gpudev/rte_gpudev.h
> @@ -0,0 +1,183 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#ifndef RTE_GPUDEV_H
> +#define RTE_GPUDEV_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_common.h>
> +
> +/**
> + * @file
> + * Generic library to interact with a GPU.
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** Maximum number of GPU engines. */
> +#define RTE_GPU_MAX_DEVS UINT16_C(32)
> +/** Maximum length of device name. */
> +#define RTE_GPU_NAME_MAX_LEN 128
> +
> +/** Flags indicate current state of GPU device. */
> +enum rte_gpu_state {
> +	RTE_GPU_STATE_UNUSED,        /**< not initialized */
> +	RTE_GPU_STATE_INITIALIZED,   /**< initialized */
> +};
> +
> +/** Store a list of info for a given GPU. */
> +struct rte_gpu_info {
> +	/** GPU device ID. */
> +	uint16_t gpu_id;
> +	/** Unique identifier name. */
> +	char name[RTE_GPU_NAME_MAX_LEN];
> +	/** Total memory available on device. */
> +	size_t total_memory;
> +	/** Total processors available on device. */
> +	int processor_count;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of GPUs detected and associated to DPDK.
> + *
> + * @return
> + *   The number of available GPUs.
> + */
> +__rte_experimental
> +uint16_t rte_gpu_dev_count_avail(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Check if the device is valid and initialized in DPDK.
> + *
> + * @param gpu_id
> + *   The input GPU ID.
> + *
> + * @return
> + *   - True if gpu_id is a valid and initialized GPU.
> + *   - False otherwise.
> + */
> +__rte_experimental
> +bool rte_gpu_dev_is_valid(uint16_t gpu_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the GPU ID of the next valid GPU initialized in DPDK.
> + *
> + * @param gpu_id
> + *   The initial GPU ID to start the research.
> + *
> + * @return
> + *   Next GPU ID corresponding to a valid and initialized GPU device.
> + */
> +__rte_experimental
> +uint16_t rte_gpu_dev_find_next(uint16_t gpu_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Macro to iterate over all valid GPUs.
> + *
> + * @param gpu_id
> + *   The ID of the next possible valid GPU.
> + * @return
> + *   Next valid GPU ID, RTE_GPU_MAX_DEVS if there is none.
> + */
> +#define RTE_GPU_FOREACH_DEV(gpu_id) \
> +	for (gpu_id = rte_gpu_find_next(0); \
> +	     gpu_id < RTE_GPU_MAX_DEVS; \
> +	     gpu_id = rte_gpu_find_next(gpu_id + 1))
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Return GPU specific info.
> + *
> + * @param gpu_id
> + *   GPU ID to get info.
> + * @param info
> + *   Memory structure to fill with the info.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_dev_info_get(uint16_t gpu_id, struct rte_gpu_info **info);

Hm, I think it is better to have 'struct rte_gpu_info *info'.
Why should it allocate and return memory to be freed by caller?

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory on the GPU.

Looking a below function it is required to clarify here if
the memory is visible or invisible to GPU (or both allowed).

> + *
> + * @param gpu_id
> + *   GPU ID to allocate memory.
> + * @param size
> + *   Number of bytes to allocate.

Is behaviour defined if zero size is requested?
IMHO, it would be good to define.

> + * @param ptr
> + *   Pointer to store the address of the allocated memory.
> + *
> + * @return
> + *   0 on success, -1 otherwise.

Don't we want to differentiate various errors using
negative errno as it is done in many DPDK libraries?

> + */
> +__rte_experimental
> +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);

May be *malloc() should return a pointer and "negative"
values used to report various errnos?

The problem with the approach that comparison vs NULL will
not work in this case and we need special macro or small
inline function to check error condition.

Returned pointer is definitely more convenient, but above
not may result in bugs.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory on the CPU that is visible from the GPU.
> + *
> + * @param gpu_id
> + *   Reference GPU ID.
> + * @param size
> + *   Number of bytes to allocate.
> + * @param ptr
> + *   Pointer to store the address of the allocated memory.
> + *
> + * @return
> + *   0 on success, -1 otherwise.

Same here

> + */
> +__rte_experimental
> +int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Deallocate a chunk of memory allocated with rte_gpu_malloc*.
> + *
> + * @param gpu_id
> + *   Reference GPU ID.
> + * @param ptr
> + *   Pointer to the memory area to be deallocated.

I think it should be NOP in the case of NULL pointer and it
should be documented. If not, it must be documented as well.

> + *
> + * @return
> + *   0 on success, -1 otherwise.

Same here

> + */
> +__rte_experimental
> +int rte_gpu_free(uint16_t gpu_id, void *ptr);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_GPUDEV_H */
> diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
> new file mode 100644
> index 0000000000..9e0f218e8b
> --- /dev/null
> +++ b/lib/gpudev/version.map
> @@ -0,0 +1,11 @@
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_gpu_dev_count_avail;
> +	rte_gpu_dev_find_next;
> +	rte_gpu_dev_info_get;
> +	rte_gpu_dev_is_valid;
> +	rte_gpu_free;
> +	rte_gpu_malloc;
> +	rte_gpu_malloc_visible;
> +};
> diff --git a/lib/meson.build b/lib/meson.build
> index 4a64756a68..ffefc64c69 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -33,6 +33,7 @@ libraries = [
>          'distributor',
>          'efd',
>          'eventdev',
> +        'gpudev',
>          'gro',
>          'gso',
>          'ip_frag',
> 


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
  2021-06-02 20:46 ` Stephen Hemminger
  2021-06-03  7:06 ` Andrew Rybchenko
@ 2021-06-03  7:18 ` David Marchand
  2021-06-03  7:30   ` Thomas Monjalon
  2021-06-03  7:47 ` Jerin Jacob
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 128+ messages in thread
From: David Marchand @ 2021-06-03  7:18 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Elena Agostini

Quick pass:

On Wed, Jun 2, 2021 at 10:36 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> diff --git a/lib/gpudev/gpu_driver.h b/lib/gpudev/gpu_driver.h
> new file mode 100644
> index 0000000000..5ff609e49d
> --- /dev/null
> +++ b/lib/gpudev/gpu_driver.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#ifndef GPU_DRIVER_H
> +#define GPU_DRIVER_H
> +
> +#include <stdint.h>
> +
> +#include <rte_common.h>
> +
> +#include "rte_gpudev.h"
> +
> +struct rte_gpu_dev;
> +
> +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
> +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> +

Great to see this structure hidden in a driver-only header.


> +struct rte_gpu_dev {

We could have a name[] field here, that will be later pointed at, in
rte_gpu_info.
Who is responsible for deciding of the device name?


> +       /* Backing device. */
> +       struct rte_device *device;
> +       /* GPU info structure. */
> +       struct rte_gpu_info info;
> +       /* Counter of processes using the device. */
> +       uint16_t process_cnt;
> +       /* If device is currently used or not. */
> +       enum rte_gpu_state state;
> +       /* FUNCTION: Allocate memory on the GPU. */
> +       gpu_malloc_t gpu_malloc;
> +       /* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> +       gpu_malloc_t gpu_malloc_visible;
> +       /* FUNCTION: Free allocated memory on the GPU. */
> +       gpu_free_t gpu_free;
> +       /* Device interrupt handle. */
> +       struct rte_intr_handle *intr_handle;
> +       /* Driver-specific private data. */
> +       void *dev_private;
> +} __rte_cache_aligned;
> +
> +struct rte_gpu_dev *rte_gpu_dev_allocate(const char *name);
> +struct rte_gpu_dev *rte_gpu_dev_get_by_name(const char *name);

Those symbols will have to be marked internal (__rte_internal +
version.map) for drivers to see them.


> +int rte_gpu_dev_release(struct rte_gpu_dev *gpudev);
> +
> +#endif /* GPU_DRIVER_H */


> diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
> new file mode 100644
> index 0000000000..b12f35c17e
> --- /dev/null
> +++ b/lib/gpudev/rte_gpudev.h
> @@ -0,0 +1,183 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#ifndef RTE_GPUDEV_H
> +#define RTE_GPUDEV_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_common.h>
> +
> +/**
> + * @file
> + * Generic library to interact with a GPU.
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** Maximum number of GPU engines. */
> +#define RTE_GPU_MAX_DEVS UINT16_C(32)

Bleh.
Let's stop with max values.
The iterator _next should return a special value indicating there is
no more devs to list.



> +/** Maximum length of device name. */
> +#define RTE_GPU_NAME_MAX_LEN 128
> +
> +/** Flags indicate current state of GPU device. */
> +enum rte_gpu_state {
> +       RTE_GPU_STATE_UNUSED,        /**< not initialized */
> +       RTE_GPU_STATE_INITIALIZED,   /**< initialized */
> +};
> +
> +/** Store a list of info for a given GPU. */
> +struct rte_gpu_info {
> +       /** GPU device ID. */
> +       uint16_t gpu_id;
> +       /** Unique identifier name. */
> +       char name[RTE_GPU_NAME_MAX_LEN];

const char *name;

Then the gpu generic layer simply fills this with the
rte_gpu_dev->name field I proposed above.


> +       /** Total memory available on device. */
> +       size_t total_memory;
> +       /** Total processors available on device. */
> +       int processor_count;
> +};


-- 
David Marchand


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  7:06 ` Andrew Rybchenko
@ 2021-06-03  7:26   ` Thomas Monjalon
  2021-06-03  7:49     ` Andrew Rybchenko
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-03  7:26 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Elena Agostini, david.marchand

03/06/2021 09:06, Andrew Rybchenko:
> On 6/2/21 11:35 PM, Thomas Monjalon wrote:
> > From: Elena Agostini <eagostini@nvidia.com>
> > 
> > The new library gpudev is for dealing with GPU from a DPDK application
> > in a vendor-agnostic way.
> > 
> > As a first step, the features are focused on memory management.
> > A function allows to allocate memory inside the GPU,
> > while another one allows to use main (CPU) memory from the GPU.
> > 
> > The infrastructure is prepared to welcome drivers in drivers/gpu/
> > as the upcoming NVIDIA one, implementing the gpudev API.
> > Other additions planned for next revisions:
> >   - C implementation file
> >   - guide documentation
> >   - unit tests
> >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > 
> > The next step should focus on GPU processing task control.
> > 
> > Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> 
> 
> LGTM as an RFC. It is definitely to a patch to apply
> since implementation is missing. See my notes below.

Yes sorry I forgot the RFC tag when sending.

[...]
> > +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
> > +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> 
> Not that important but I always prefer to typedef
> function prototypes w/o pointer and use pointer in
> the structure below. I.e.
> 
> typedef int (gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void
> **ptr);
> 
> It allows to specify that corresponding callback
> must comply to the prototype and produce build
> error otherwise (and do not rely on warnings), e.g.
> 
> static gpu_malloc_t mlx5_gpu_malloc;
> static int
> mlx5_gpu_malloc(struct rte_gpu_dev *dev, size_t size, void **ptr)
> {
>      ...
> }
> 
> May be a new library should go this way.

I agree.
> 
> > +
> > +struct rte_gpu_dev {
> > +	/* Backing device. */
> > +	struct rte_device *device;
> > +	/* GPU info structure. */
> > +	struct rte_gpu_info info;
> > +	/* Counter of processes using the device. */
> > +	uint16_t process_cnt;
> > +	/* If device is currently used or not. */
> > +	enum rte_gpu_state state;
> > +	/* FUNCTION: Allocate memory on the GPU. */
> > +	gpu_malloc_t gpu_malloc;
> > +	/* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> > +	gpu_malloc_t gpu_malloc_visible;
> > +	/* FUNCTION: Free allocated memory on the GPU. */
> > +	gpu_free_t gpu_free;
> 
> Don't we need a callback to get dev_info?

Yes it's my miss.

[...]
> > +__rte_experimental
> > +int rte_gpu_dev_info_get(uint16_t gpu_id, struct rte_gpu_info **info);
> 
> Hm, I think it is better to have 'struct rte_gpu_info *info'.
> Why should it allocate and return memory to be freed by caller?

No you're right, I overlooked it.

[...]
> > + * Allocate a chunk of memory on the GPU.
> 
> Looking a below function it is required to clarify here if
> the memory is visible or invisible to GPU (or both allowed).

This function allocates on the GPU so it is visible by the GPU.
I feel I misunderstand your question.

> > + *
> > + * @param gpu_id
> > + *   GPU ID to allocate memory.
> > + * @param size
> > + *   Number of bytes to allocate.
> 
> Is behaviour defined if zero size is requested?
> IMHO, it would be good to define.

OK

> > + * @param ptr
> > + *   Pointer to store the address of the allocated memory.
> > + *
> > + * @return
> > + *   0 on success, -1 otherwise.
> 
> Don't we want to differentiate various errors using
> negative errno as it is done in many DPDK libraries?

Yes I think so, I was just too much lazy to do it in this RFC.

> > + */
> > +__rte_experimental
> > +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
> 
> May be *malloc() should return a pointer and "negative"
> values used to report various errnos?

I don't understand what you mean by negative values if it is a pointer.
We could return a pointer and use rte_errno.

> The problem with the approach that comparison vs NULL will
> not work in this case and we need special macro or small
> inline function to check error condition.
> 
> Returned pointer is definitely more convenient, but above
> not may result in bugs.

I don't know what is better.

[...]
> > + * Deallocate a chunk of memory allocated with rte_gpu_malloc*.
> > + *
> > + * @param gpu_id
> > + *   Reference GPU ID.
> > + * @param ptr
> > + *   Pointer to the memory area to be deallocated.
> 
> I think it should be NOP in the case of NULL pointer and it
> should be documented. If not, it must be documented as well.

OK for NOP.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  7:18 ` David Marchand
@ 2021-06-03  7:30   ` Thomas Monjalon
  0 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-03  7:30 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Elena Agostini

03/06/2021 09:18, David Marchand:
> Quick pass:
> 
> On Wed, Jun 2, 2021 at 10:36 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > diff --git a/lib/gpudev/gpu_driver.h b/lib/gpudev/gpu_driver.h
> > new file mode 100644
> > index 0000000000..5ff609e49d
> > --- /dev/null
> > +++ b/lib/gpudev/gpu_driver.h
[...]
> > +struct rte_gpu_dev;
> > +
> > +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
> > +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> > +
> 
> Great to see this structure hidden in a driver-only header.
> 
> 
> > +struct rte_gpu_dev {
> 
> We could have a name[] field here, that will be later pointed at, in
> rte_gpu_info.
> Who is responsible for deciding of the device name?

The driver is responsible for the name of the device.
Yes I agree to store the name here.

> > +       /* Backing device. */
> > +       struct rte_device *device;
> > +       /* GPU info structure. */
> > +       struct rte_gpu_info info;
> > +       /* Counter of processes using the device. */
> > +       uint16_t process_cnt;
> > +       /* If device is currently used or not. */
> > +       enum rte_gpu_state state;
> > +       /* FUNCTION: Allocate memory on the GPU. */
> > +       gpu_malloc_t gpu_malloc;
> > +       /* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> > +       gpu_malloc_t gpu_malloc_visible;
> > +       /* FUNCTION: Free allocated memory on the GPU. */
> > +       gpu_free_t gpu_free;
> > +       /* Device interrupt handle. */
> > +       struct rte_intr_handle *intr_handle;
> > +       /* Driver-specific private data. */
> > +       void *dev_private;
> > +} __rte_cache_aligned;
> > +
> > +struct rte_gpu_dev *rte_gpu_dev_allocate(const char *name);
> > +struct rte_gpu_dev *rte_gpu_dev_get_by_name(const char *name);
> 
> Those symbols will have to be marked internal (__rte_internal +
> version.map) for drivers to see them.

OK, good catch.

[...]
> > +/** Maximum number of GPU engines. */
> > +#define RTE_GPU_MAX_DEVS UINT16_C(32)
> 
> Bleh.
> Let's stop with max values.
> The iterator _next should return a special value indicating there is
> no more devs to list.

I fully agree.
I would like to define gpu_id as an int to simplify comparisons
with error value. int or int16_t ?

> > +/** Maximum length of device name. */
> > +#define RTE_GPU_NAME_MAX_LEN 128

Will be dropped as well.

> > +
> > +/** Flags indicate current state of GPU device. */
> > +enum rte_gpu_state {
> > +       RTE_GPU_STATE_UNUSED,        /**< not initialized */
> > +       RTE_GPU_STATE_INITIALIZED,   /**< initialized */
> > +};
> > +
> > +/** Store a list of info for a given GPU. */
> > +struct rte_gpu_info {
> > +       /** GPU device ID. */
> > +       uint16_t gpu_id;
> > +       /** Unique identifier name. */
> > +       char name[RTE_GPU_NAME_MAX_LEN];
> 
> const char *name;
> 
> Then the gpu generic layer simply fills this with the
> rte_gpu_dev->name field I proposed above.

Yes.

> > +       /** Total memory available on device. */
> > +       size_t total_memory;
> > +       /** Total processors available on device. */
> > +       int processor_count;
> > +};




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
                   ` (2 preceding siblings ...)
  2021-06-03  7:18 ` David Marchand
@ 2021-06-03  7:47 ` Jerin Jacob
  2021-06-03  8:28   ` Thomas Monjalon
  2021-06-03  9:33   ` Ferruh Yigit
  2021-06-04  5:51 ` Wang, Haiyue
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 128+ messages in thread
From: Jerin Jacob @ 2021-06-03  7:47 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dpdk-dev, Elena Agostini

On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> From: Elena Agostini <eagostini@nvidia.com>
>
> The new library gpudev is for dealing with GPU from a DPDK application
> in a vendor-agnostic way.
>
> As a first step, the features are focused on memory management.
> A function allows to allocate memory inside the GPU,
> while another one allows to use main (CPU) memory from the GPU.
>
> The infrastructure is prepared to welcome drivers in drivers/gpu/
> as the upcoming NVIDIA one, implementing the gpudev API.
> Other additions planned for next revisions:
>   - C implementation file
>   - guide documentation
>   - unit tests
>   - integration in testpmd to enable Rx/Tx to/from GPU memory.
>
> The next step should focus on GPU processing task control.
>
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  .gitignore                           |   1 +
>  MAINTAINERS                          |   6 +
>  doc/api/doxy-api-index.md            |   1 +
>  doc/api/doxy-api.conf.in             |   1 +
>  doc/guides/conf.py                   |   8 ++
>  doc/guides/gpus/features/default.ini |  13 ++
>  doc/guides/gpus/index.rst            |  11 ++
>  doc/guides/gpus/overview.rst         |   7 +
>  doc/guides/index.rst                 |   1 +
>  doc/guides/prog_guide/gpu.rst        |   5 +
>  doc/guides/prog_guide/index.rst      |   1 +
>  drivers/gpu/meson.build              |   4 +
>  drivers/meson.build                  |   1 +
>  lib/gpudev/gpu_driver.h              |  44 +++++++
>  lib/gpudev/meson.build               |   9 ++
>  lib/gpudev/rte_gpudev.h              | 183 +++++++++++++++++++++++++++
>  lib/gpudev/version.map               |  11 ++
>  lib/meson.build                      |   1 +
>  18 files changed, 308 insertions(+)
>  create mode 100644 doc/guides/gpus/features/default.ini
>  create mode 100644 doc/guides/gpus/index.rst
>  create mode 100644 doc/guides/gpus/overview.rst
>  create mode 100644 doc/guides/prog_guide/gpu.rst
>  create mode 100644 drivers/gpu/meson.build
>  create mode 100644 lib/gpudev/gpu_driver.h
>  create mode 100644 lib/gpudev/meson.build
>  create mode 100644 lib/gpudev/rte_gpudev.h
>  create mode 100644 lib/gpudev/version.map
>
> diff --git a/.gitignore b/.gitignore
> index b19c0717e6..49494e0c6c 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
>  doc/guides/regexdevs/overview_feature_table.txt
>  doc/guides/vdpadevs/overview_feature_table.txt
>  doc/guides/bbdevs/overview_feature_table.txt
> +doc/guides/gpus/overview_feature_table.txt
>
>  # ignore generated ctags/cscope files
>  cscope.out.po
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16971..c4755dfe9a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -452,6 +452,12 @@ F: app/test-regex/
>  F: doc/guides/prog_guide/regexdev.rst
>  F: doc/guides/regexdevs/features/default.ini
>
> +GPU API - EXPERIMENTAL
> +M: Elena Agostini <eagostini@nvidia.com>
> +F: lib/gpudev/
> +F: doc/guides/prog_guide/gpu.rst
> +F: doc/guides/gpus/features/default.ini
> +
>  Eventdev API
>  M: Jerin Jacob <jerinj@marvell.com>
>  T: git://dpdk.org/next/dpdk-next-eventdev
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index 1992107a03..bd10342ca2 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
>    [compressdev]        (@ref rte_compressdev.h),
>    [compress]           (@ref rte_comp.h),
>    [regexdev]           (@ref rte_regexdev.h),
> +  [gpudev]             (@ref rte_gpudev.h),

Since this device does not have a queue etc? Shouldn't make it a
library like mempool with vendor-defined ops?
Any specific reason for making it a device? The reason why I am asking
this is, as other DPDK devices as symmetry in queue(s), configure,
start, stop operation etc.


> +
> +struct rte_gpu_dev {
> +       /* Backing device. */
> +       struct rte_device *device;

See above?

> +       /* GPU info structure. */

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  7:26   ` Thomas Monjalon
@ 2021-06-03  7:49     ` Andrew Rybchenko
  2021-06-03  8:26       ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Andrew Rybchenko @ 2021-06-03  7:49 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Elena Agostini, david.marchand

On 6/3/21 10:26 AM, Thomas Monjalon wrote:
> 03/06/2021 09:06, Andrew Rybchenko:
>> On 6/2/21 11:35 PM, Thomas Monjalon wrote:
>>> + * Allocate a chunk of memory on the GPU.
>>
>> Looking a below function it is required to clarify here if
>> the memory is visible or invisible to GPU (or both allowed).
> 
> This function allocates on the GPU so it is visible by the GPU.
> I feel I misunderstand your question.

Below function says rte_gpu_malloc_visible() and its
description highlights that allocated memory is visible to GPU.
My problem that I don't understand what's the difference
between these two functions.

>>> + */
>>> +__rte_experimental
>>> +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
>>
>> May be *malloc() should return a pointer and "negative"
>> values used to report various errnos?
> 
> I don't understand what you mean by negative values if it is a pointer.
> We could return a pointer and use rte_errno.

I was talking about something like (void *)(-ENOMEM), but it is
a bad idea. NULL + rte_errno is much better.

However, may be I'd kept callback as is set rte_error in lib if
negative value is returned by the callback. This way we'll be
safe against lost rte_errno update in drivers.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  7:49     ` Andrew Rybchenko
@ 2021-06-03  8:26       ` Thomas Monjalon
  2021-06-03  8:57         ` Andrew Rybchenko
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-03  8:26 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Elena Agostini, david.marchand

03/06/2021 09:49, Andrew Rybchenko:
> On 6/3/21 10:26 AM, Thomas Monjalon wrote:
> > 03/06/2021 09:06, Andrew Rybchenko:
> >> On 6/2/21 11:35 PM, Thomas Monjalon wrote:
> >>> + * Allocate a chunk of memory on the GPU.
> >>
> >> Looking a below function it is required to clarify here if
> >> the memory is visible or invisible to GPU (or both allowed).
> > 
> > This function allocates on the GPU so it is visible by the GPU.
> > I feel I misunderstand your question.
> 
> Below function says rte_gpu_malloc_visible() and its
> description highlights that allocated memory is visible to GPU.
> My problem that I don't understand what's the difference
> between these two functions.

One function allocates in GPU mem, the other allows the GPU to use CPU mem.

> >>> + */
> >>> +__rte_experimental
> >>> +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
> >>
> >> May be *malloc() should return a pointer and "negative"
> >> values used to report various errnos?
> > 
> > I don't understand what you mean by negative values if it is a pointer.
> > We could return a pointer and use rte_errno.
> 
> I was talking about something like (void *)(-ENOMEM), but it is
> a bad idea. NULL + rte_errno is much better.
> 
> However, may be I'd kept callback as is set rte_error in lib if
> negative value is returned by the callback. This way we'll be
> safe against lost rte_errno update in drivers.

OK



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  7:47 ` Jerin Jacob
@ 2021-06-03  8:28   ` Thomas Monjalon
  2021-06-03  8:41     ` Jerin Jacob
  2021-06-03  9:33   ` Ferruh Yigit
  1 sibling, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-03  8:28 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dpdk-dev, Elena Agostini

03/06/2021 09:47, Jerin Jacob:
> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > --- a/doc/api/doxy-api-index.md
> > +++ b/doc/api/doxy-api-index.md
> > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> >    [compressdev]        (@ref rte_compressdev.h),
> >    [compress]           (@ref rte_comp.h),
> >    [regexdev]           (@ref rte_regexdev.h),
> > +  [gpudev]             (@ref rte_gpudev.h),
> 
> Since this device does not have a queue etc? Shouldn't make it a
> library like mempool with vendor-defined ops?
> Any specific reason for making it a device? The reason why I am asking
> this is, as other DPDK devices as symmetry in queue(s), configure,
> start, stop operation etc.
> 
> 
> > +
> > +struct rte_gpu_dev {
> > +       /* Backing device. */
> > +       struct rte_device *device;
> 
> See above?

There is a PCI device probed.
I don't understand why it would not be represented as a device.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  8:28   ` Thomas Monjalon
@ 2021-06-03  8:41     ` Jerin Jacob
  2021-06-03  8:43       ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-03  8:41 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dpdk-dev, Elena Agostini

On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 03/06/2021 09:47, Jerin Jacob:
> > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > --- a/doc/api/doxy-api-index.md
> > > +++ b/doc/api/doxy-api-index.md
> > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > >    [compressdev]        (@ref rte_compressdev.h),
> > >    [compress]           (@ref rte_comp.h),
> > >    [regexdev]           (@ref rte_regexdev.h),
> > > +  [gpudev]             (@ref rte_gpudev.h),
> >
> > Since this device does not have a queue etc? Shouldn't make it a
> > library like mempool with vendor-defined ops?
> > Any specific reason for making it a device? The reason why I am asking
> > this is, as other DPDK devices as symmetry in queue(s), configure,
> > start, stop operation etc.
> >
> >
> > > +
> > > +struct rte_gpu_dev {
> > > +       /* Backing device. */
> > > +       struct rte_device *device;
> >
> > See above?
>
> There is a PCI device probed.
> I don't understand why it would not be represented as a device.

All other DPDK device has symmetry in structures like queue and
symmetry in operation like it has configure, start, stop etc.
This one seems more like mempool to me all we want set of
vendor-defined ops. So any justification on
make it a device ? why not like mempool library?
(driver/mempool/octeontx2 Mempool HW is also PCI device, but
we don't take device path for mempool. So I would like to understand
any technical reason for making it a device).



>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  8:41     ` Jerin Jacob
@ 2021-06-03  8:43       ` Thomas Monjalon
  2021-06-03  8:47         ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-03  8:43 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dpdk-dev, Elena Agostini

03/06/2021 10:41, Jerin Jacob:
> On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 03/06/2021 09:47, Jerin Jacob:
> > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > --- a/doc/api/doxy-api-index.md
> > > > +++ b/doc/api/doxy-api-index.md
> > > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > > >    [compressdev]        (@ref rte_compressdev.h),
> > > >    [compress]           (@ref rte_comp.h),
> > > >    [regexdev]           (@ref rte_regexdev.h),
> > > > +  [gpudev]             (@ref rte_gpudev.h),
> > >
> > > Since this device does not have a queue etc? Shouldn't make it a
> > > library like mempool with vendor-defined ops?
> > > Any specific reason for making it a device? The reason why I am asking
> > > this is, as other DPDK devices as symmetry in queue(s), configure,
> > > start, stop operation etc.
> > >
> > >
> > > > +
> > > > +struct rte_gpu_dev {
> > > > +       /* Backing device. */
> > > > +       struct rte_device *device;
> > >
> > > See above?
> >
> > There is a PCI device probed.
> > I don't understand why it would not be represented as a device.
> 
> All other DPDK device has symmetry in structures like queue and
> symmetry in operation like it has configure, start, stop etc.
> This one seems more like mempool to me all we want set of
> vendor-defined ops. So any justification on
> make it a device ? why not like mempool library?
> (driver/mempool/octeontx2 Mempool HW is also PCI device, but
> we don't take device path for mempool. So I would like to understand
> any technical reason for making it a device).

I don't understand what you mean by "symmetry".




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  8:43       ` Thomas Monjalon
@ 2021-06-03  8:47         ` Jerin Jacob
  2021-06-03  8:53           ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-03  8:47 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dpdk-dev, Elena Agostini

On Thu, Jun 3, 2021 at 2:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 03/06/2021 10:41, Jerin Jacob:
> > On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 03/06/2021 09:47, Jerin Jacob:
> > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > --- a/doc/api/doxy-api-index.md
> > > > > +++ b/doc/api/doxy-api-index.md
> > > > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > > > >    [compressdev]        (@ref rte_compressdev.h),
> > > > >    [compress]           (@ref rte_comp.h),
> > > > >    [regexdev]           (@ref rte_regexdev.h),
> > > > > +  [gpudev]             (@ref rte_gpudev.h),
> > > >
> > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > library like mempool with vendor-defined ops?
> > > > Any specific reason for making it a device? The reason why I am asking
> > > > this is, as other DPDK devices as symmetry in queue(s), configure,
> > > > start, stop operation etc.
> > > >
> > > >
> > > > > +
> > > > > +struct rte_gpu_dev {
> > > > > +       /* Backing device. */
> > > > > +       struct rte_device *device;
> > > >
> > > > See above?
> > >
> > > There is a PCI device probed.
> > > I don't understand why it would not be represented as a device.
> >
> > All other DPDK device has symmetry in structures like queue and
> > symmetry in operation like it has configure, start, stop etc.
> > This one seems more like mempool to me all we want set of
> > vendor-defined ops. So any justification on
> > make it a device ? why not like mempool library?
> > (driver/mempool/octeontx2 Mempool HW is also PCI device, but
> > we don't take device path for mempool. So I would like to understand
> > any technical reason for making it a device).
>
> I don't understand what you mean by "symmetry".

The common attributes. or similarity

>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  8:47         ` Jerin Jacob
@ 2021-06-03  8:53           ` Thomas Monjalon
  2021-06-03  9:20             ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-03  8:53 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dpdk-dev, Elena Agostini

03/06/2021 10:47, Jerin Jacob:
> On Thu, Jun 3, 2021 at 2:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 03/06/2021 10:41, Jerin Jacob:
> > > On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >
> > > > 03/06/2021 09:47, Jerin Jacob:
> > > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > --- a/doc/api/doxy-api-index.md
> > > > > > +++ b/doc/api/doxy-api-index.md
> > > > > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > > > > >    [compressdev]        (@ref rte_compressdev.h),
> > > > > >    [compress]           (@ref rte_comp.h),
> > > > > >    [regexdev]           (@ref rte_regexdev.h),
> > > > > > +  [gpudev]             (@ref rte_gpudev.h),
> > > > >
> > > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > > library like mempool with vendor-defined ops?
> > > > > Any specific reason for making it a device? The reason why I am asking
> > > > > this is, as other DPDK devices as symmetry in queue(s), configure,
> > > > > start, stop operation etc.
> > > > >
> > > > >
> > > > > > +
> > > > > > +struct rte_gpu_dev {
> > > > > > +       /* Backing device. */
> > > > > > +       struct rte_device *device;
> > > > >
> > > > > See above?
> > > >
> > > > There is a PCI device probed.
> > > > I don't understand why it would not be represented as a device.
> > >
> > > All other DPDK device has symmetry in structures like queue and
> > > symmetry in operation like it has configure, start, stop etc.
> > > This one seems more like mempool to me all we want set of
> > > vendor-defined ops. So any justification on
> > > make it a device ? why not like mempool library?
> > > (driver/mempool/octeontx2 Mempool HW is also PCI device, but
> > > we don't take device path for mempool. So I would like to understand
> > > any technical reason for making it a device).
> >
> > I don't understand what you mean by "symmetry".
> 
> The common attributes. or similarity

The common attributes of a device are:
	- driver
	- bus
	- devargs
We have these attributes for a GPU.

About configure/start/stop usual functions,
I think we'll have something similar in the second step
for running tasks.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  8:26       ` Thomas Monjalon
@ 2021-06-03  8:57         ` Andrew Rybchenko
  0 siblings, 0 replies; 128+ messages in thread
From: Andrew Rybchenko @ 2021-06-03  8:57 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Elena Agostini, david.marchand

On 6/3/21 11:26 AM, Thomas Monjalon wrote:
> 03/06/2021 09:49, Andrew Rybchenko:
>> On 6/3/21 10:26 AM, Thomas Monjalon wrote:
>>> 03/06/2021 09:06, Andrew Rybchenko:
>>>> On 6/2/21 11:35 PM, Thomas Monjalon wrote:
>>>>> + * Allocate a chunk of memory on the GPU.
>>>>
>>>> Looking a below function it is required to clarify here if
>>>> the memory is visible or invisible to GPU (or both allowed).
>>>
>>> This function allocates on the GPU so it is visible by the GPU.
>>> I feel I misunderstand your question.
>>
>> Below function says rte_gpu_malloc_visible() and its
>> description highlights that allocated memory is visible to GPU.
>> My problem that I don't understand what's the difference
>> between these two functions.
> 
> One function allocates in GPU mem, the other allows the GPU to use CPU mem.

Ah, I see no. G looks like C and I've not noticed the
difference. Now it is clear. My bad.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  8:53           ` Thomas Monjalon
@ 2021-06-03  9:20             ` Jerin Jacob
  2021-06-03  9:36               ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-03  9:20 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dpdk-dev, Elena Agostini

On Thu, Jun 3, 2021 at 2:23 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 03/06/2021 10:47, Jerin Jacob:
> > On Thu, Jun 3, 2021 at 2:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 03/06/2021 10:41, Jerin Jacob:
> > > > On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >
> > > > > 03/06/2021 09:47, Jerin Jacob:
> > > > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > --- a/doc/api/doxy-api-index.md
> > > > > > > +++ b/doc/api/doxy-api-index.md
> > > > > > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > > > > > >    [compressdev]        (@ref rte_compressdev.h),
> > > > > > >    [compress]           (@ref rte_comp.h),
> > > > > > >    [regexdev]           (@ref rte_regexdev.h),
> > > > > > > +  [gpudev]             (@ref rte_gpudev.h),
> > > > > >
> > > > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > > > library like mempool with vendor-defined ops?
> > > > > > Any specific reason for making it a device? The reason why I am asking
> > > > > > this is, as other DPDK devices as symmetry in queue(s), configure,
> > > > > > start, stop operation etc.
> > > > > >
> > > > > >
> > > > > > > +
> > > > > > > +struct rte_gpu_dev {
> > > > > > > +       /* Backing device. */
> > > > > > > +       struct rte_device *device;
> > > > > >
> > > > > > See above?
> > > > >
> > > > > There is a PCI device probed.
> > > > > I don't understand why it would not be represented as a device.
> > > >
> > > > All other DPDK device has symmetry in structures like queue and
> > > > symmetry in operation like it has configure, start, stop etc.
> > > > This one seems more like mempool to me all we want set of
> > > > vendor-defined ops. So any justification on
> > > > make it a device ? why not like mempool library?
> > > > (driver/mempool/octeontx2 Mempool HW is also PCI device, but
> > > > we don't take device path for mempool. So I would like to understand
> > > > any technical reason for making it a device).
> > >
> > > I don't understand what you mean by "symmetry".
> >
> > The common attributes. or similarity
>
> The common attributes of a device are:
>         - driver
>         - bus
>         - devargs
> We have these attributes for a GPU.

Yes. Those are attributes of rte_device. That does not mean and
library can not use rte_device.(mempool library driver is using
rte_device which is backed by PCI)
In terms of similarity, all other device libraries(not devices) have
queue, enqueue() and dequeue() kind of scheme
in ethdev, cryptodev, compressdev, eventdev, bbdev, rawdev. regexdev.
i.e existing DPDK device libraries,
This one des not have have that, So question why to call it libgpudev vs libgpu.

The functions you have are memory allocation etc. That's more of a
library candidate.

>
> About configure/start/stop usual functions,
> I think we'll have something similar in the second step

Do you think or it will be there?. I think, it is import decision. The
device needs have a queue kind of structure
and it is mapping to core to have a notion of configure. queue_setup,
start and stop etc
Something similar to
http://code.dpdk.org/dpdk/v21.05/source/lib/regexdev/rte_regexdev.h#L27
Could you share how "running tasks" translates to the above scheme
like other her dpdk device libraries?



> for running tasks.
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  7:47 ` Jerin Jacob
  2021-06-03  8:28   ` Thomas Monjalon
@ 2021-06-03  9:33   ` Ferruh Yigit
  2021-06-04 10:28     ` Thomas Monjalon
  1 sibling, 1 reply; 128+ messages in thread
From: Ferruh Yigit @ 2021-06-03  9:33 UTC (permalink / raw)
  To: Jerin Jacob, Thomas Monjalon; +Cc: dpdk-dev, Elena Agostini

On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>>
>> From: Elena Agostini <eagostini@nvidia.com>
>>
>> The new library gpudev is for dealing with GPU from a DPDK application
>> in a vendor-agnostic way.
>>
>> As a first step, the features are focused on memory management.
>> A function allows to allocate memory inside the GPU,
>> while another one allows to use main (CPU) memory from the GPU.
>>
>> The infrastructure is prepared to welcome drivers in drivers/gpu/
>> as the upcoming NVIDIA one, implementing the gpudev API.
>> Other additions planned for next revisions:
>>   - C implementation file
>>   - guide documentation
>>   - unit tests
>>   - integration in testpmd to enable Rx/Tx to/from GPU memory.
>>
>> The next step should focus on GPU processing task control.
>>
>> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
>> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
>> ---
>>  .gitignore                           |   1 +
>>  MAINTAINERS                          |   6 +
>>  doc/api/doxy-api-index.md            |   1 +
>>  doc/api/doxy-api.conf.in             |   1 +
>>  doc/guides/conf.py                   |   8 ++
>>  doc/guides/gpus/features/default.ini |  13 ++
>>  doc/guides/gpus/index.rst            |  11 ++
>>  doc/guides/gpus/overview.rst         |   7 +
>>  doc/guides/index.rst                 |   1 +
>>  doc/guides/prog_guide/gpu.rst        |   5 +
>>  doc/guides/prog_guide/index.rst      |   1 +
>>  drivers/gpu/meson.build              |   4 +
>>  drivers/meson.build                  |   1 +
>>  lib/gpudev/gpu_driver.h              |  44 +++++++
>>  lib/gpudev/meson.build               |   9 ++
>>  lib/gpudev/rte_gpudev.h              | 183 +++++++++++++++++++++++++++
>>  lib/gpudev/version.map               |  11 ++
>>  lib/meson.build                      |   1 +
>>  18 files changed, 308 insertions(+)
>>  create mode 100644 doc/guides/gpus/features/default.ini
>>  create mode 100644 doc/guides/gpus/index.rst
>>  create mode 100644 doc/guides/gpus/overview.rst
>>  create mode 100644 doc/guides/prog_guide/gpu.rst
>>  create mode 100644 drivers/gpu/meson.build
>>  create mode 100644 lib/gpudev/gpu_driver.h
>>  create mode 100644 lib/gpudev/meson.build
>>  create mode 100644 lib/gpudev/rte_gpudev.h
>>  create mode 100644 lib/gpudev/version.map
>>
>> diff --git a/.gitignore b/.gitignore
>> index b19c0717e6..49494e0c6c 100644
>> --- a/.gitignore
>> +++ b/.gitignore
>> @@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
>>  doc/guides/regexdevs/overview_feature_table.txt
>>  doc/guides/vdpadevs/overview_feature_table.txt
>>  doc/guides/bbdevs/overview_feature_table.txt
>> +doc/guides/gpus/overview_feature_table.txt
>>
>>  # ignore generated ctags/cscope files
>>  cscope.out.po
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 5877a16971..c4755dfe9a 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -452,6 +452,12 @@ F: app/test-regex/
>>  F: doc/guides/prog_guide/regexdev.rst
>>  F: doc/guides/regexdevs/features/default.ini
>>
>> +GPU API - EXPERIMENTAL
>> +M: Elena Agostini <eagostini@nvidia.com>
>> +F: lib/gpudev/
>> +F: doc/guides/prog_guide/gpu.rst
>> +F: doc/guides/gpus/features/default.ini
>> +
>>  Eventdev API
>>  M: Jerin Jacob <jerinj@marvell.com>
>>  T: git://dpdk.org/next/dpdk-next-eventdev
>> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
>> index 1992107a03..bd10342ca2 100644
>> --- a/doc/api/doxy-api-index.md
>> +++ b/doc/api/doxy-api-index.md
>> @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
>>    [compressdev]        (@ref rte_compressdev.h),
>>    [compress]           (@ref rte_comp.h),
>>    [regexdev]           (@ref rte_regexdev.h),
>> +  [gpudev]             (@ref rte_gpudev.h),
> 
> Since this device does not have a queue etc? Shouldn't make it a
> library like mempool with vendor-defined ops?

+1

Current RFC announces additional memory allocation capabilities, which can suits
better as extension to existing memory related library instead of a new device
abstraction library.

> Any specific reason for making it a device? The reason why I am asking
> this is, as other DPDK devices as symmetry in queue(s), configure,
> start, stop operation etc.
> 
> 
>> +
>> +struct rte_gpu_dev {
>> +       /* Backing device. */
>> +       struct rte_device *device;
> 
> See above?
> 
>> +       /* GPU info structure. */


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  9:20             ` Jerin Jacob
@ 2021-06-03  9:36               ` Thomas Monjalon
  2021-06-03 10:04                 ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-03  9:36 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dpdk-dev, Elena Agostini

03/06/2021 11:20, Jerin Jacob:
> On Thu, Jun 3, 2021 at 2:23 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 03/06/2021 10:47, Jerin Jacob:
> > > On Thu, Jun 3, 2021 at 2:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >
> > > > 03/06/2021 10:41, Jerin Jacob:
> > > > > On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > >
> > > > > > 03/06/2021 09:47, Jerin Jacob:
> > > > > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > > --- a/doc/api/doxy-api-index.md
> > > > > > > > +++ b/doc/api/doxy-api-index.md
> > > > > > > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > > > > > > >    [compressdev]        (@ref rte_compressdev.h),
> > > > > > > >    [compress]           (@ref rte_comp.h),
> > > > > > > >    [regexdev]           (@ref rte_regexdev.h),
> > > > > > > > +  [gpudev]             (@ref rte_gpudev.h),
> > > > > > >
> > > > > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > > > > library like mempool with vendor-defined ops?
> > > > > > > Any specific reason for making it a device? The reason why I am asking
> > > > > > > this is, as other DPDK devices as symmetry in queue(s), configure,
> > > > > > > start, stop operation etc.
> > > > > > >
> > > > > > >
> > > > > > > > +
> > > > > > > > +struct rte_gpu_dev {
> > > > > > > > +       /* Backing device. */
> > > > > > > > +       struct rte_device *device;
> > > > > > >
> > > > > > > See above?
> > > > > >
> > > > > > There is a PCI device probed.
> > > > > > I don't understand why it would not be represented as a device.
> > > > >
> > > > > All other DPDK device has symmetry in structures like queue and
> > > > > symmetry in operation like it has configure, start, stop etc.
> > > > > This one seems more like mempool to me all we want set of
> > > > > vendor-defined ops. So any justification on
> > > > > make it a device ? why not like mempool library?
> > > > > (driver/mempool/octeontx2 Mempool HW is also PCI device, but
> > > > > we don't take device path for mempool. So I would like to understand
> > > > > any technical reason for making it a device).
> > > >
> > > > I don't understand what you mean by "symmetry".
> > >
> > > The common attributes. or similarity
> >
> > The common attributes of a device are:
> >         - driver
> >         - bus
> >         - devargs
> > We have these attributes for a GPU.
> 
> Yes. Those are attributes of rte_device. That does not mean and
> library can not use rte_device.(mempool library driver is using
> rte_device which is backed by PCI)
> In terms of similarity, all other device libraries(not devices) have
> queue, enqueue() and dequeue() kind of scheme
> in ethdev, cryptodev, compressdev, eventdev, bbdev, rawdev. regexdev.
> i.e existing DPDK device libraries,
> This one des not have have that, So question why to call it libgpudev vs libgpu.
> 
> The functions you have are memory allocation etc. That's more of a
> library candidate.
> 
> > About configure/start/stop usual functions,
> > I think we'll have something similar in the second step
> 
> Do you think or it will be there?. I think, it is import decision.

That's an important discussion we need to have.
We are preparing a proposal.

> The device needs have a queue kind of structure
> and it is mapping to core to have a notion of configure. queue_setup,
> start and stop etc

Why is it a requirement to call it a device API?

> Something similar to
> http://code.dpdk.org/dpdk/v21.05/source/lib/regexdev/rte_regexdev.h#L27
> Could you share how "running tasks" translates to the above scheme
> like other her dpdk device libraries?

We will share our view soon but what to control in GPU execution
must be a community discussed requirement.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  9:36               ` Thomas Monjalon
@ 2021-06-03 10:04                 ` Jerin Jacob
  2021-06-03 10:30                   ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-03 10:04 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dpdk-dev, Elena Agostini

On Thu, Jun 3, 2021 at 3:06 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 03/06/2021 11:20, Jerin Jacob:
> > On Thu, Jun 3, 2021 at 2:23 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 03/06/2021 10:47, Jerin Jacob:
> > > > On Thu, Jun 3, 2021 at 2:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >
> > > > > 03/06/2021 10:41, Jerin Jacob:
> > > > > > On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > >
> > > > > > > 03/06/2021 09:47, Jerin Jacob:
> > > > > > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > > > --- a/doc/api/doxy-api-index.md
> > > > > > > > > +++ b/doc/api/doxy-api-index.md
> > > > > > > > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > > > > > > > >    [compressdev]        (@ref rte_compressdev.h),
> > > > > > > > >    [compress]           (@ref rte_comp.h),
> > > > > > > > >    [regexdev]           (@ref rte_regexdev.h),
> > > > > > > > > +  [gpudev]             (@ref rte_gpudev.h),
> > > > > > > >
> > > > > > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > > > > > library like mempool with vendor-defined ops?
> > > > > > > > Any specific reason for making it a device? The reason why I am asking
> > > > > > > > this is, as other DPDK devices as symmetry in queue(s), configure,
> > > > > > > > start, stop operation etc.
> > > > > > > >
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +struct rte_gpu_dev {
> > > > > > > > > +       /* Backing device. */
> > > > > > > > > +       struct rte_device *device;
> > > > > > > >
> > > > > > > > See above?
> > > > > > >
> > > > > > > There is a PCI device probed.
> > > > > > > I don't understand why it would not be represented as a device.
> > > > > >
> > > > > > All other DPDK device has symmetry in structures like queue and
> > > > > > symmetry in operation like it has configure, start, stop etc.
> > > > > > This one seems more like mempool to me all we want set of
> > > > > > vendor-defined ops. So any justification on
> > > > > > make it a device ? why not like mempool library?
> > > > > > (driver/mempool/octeontx2 Mempool HW is also PCI device, but
> > > > > > we don't take device path for mempool. So I would like to understand
> > > > > > any technical reason for making it a device).
> > > > >
> > > > > I don't understand what you mean by "symmetry".
> > > >
> > > > The common attributes. or similarity
> > >
> > > The common attributes of a device are:
> > >         - driver
> > >         - bus
> > >         - devargs
> > > We have these attributes for a GPU.
> >
> > Yes. Those are attributes of rte_device. That does not mean and
> > library can not use rte_device.(mempool library driver is using
> > rte_device which is backed by PCI)
> > In terms of similarity, all other device libraries(not devices) have
> > queue, enqueue() and dequeue() kind of scheme
> > in ethdev, cryptodev, compressdev, eventdev, bbdev, rawdev. regexdev.
> > i.e existing DPDK device libraries,
> > This one des not have have that, So question why to call it libgpudev vs libgpu.

See below[1]

> >
> > The functions you have are memory allocation etc. That's more of a
> > library candidate.
> >
> > > About configure/start/stop usual functions,
> > > I think we'll have something similar in the second step
> >
> > Do you think or it will be there?. I think, it is import decision.
>
> That's an important discussion we need to have.
> We are preparing a proposal.

Ack.

>
> > The device needs have a queue kind of structure
> > and it is mapping to core to have a notion of configure. queue_setup,
> > start and stop etc
>
> Why is it a requirement to call it a device API?

Then we need to define what needs to call as device library vs library and how?
Why mempool is not called a  device library vs library?  and why all
other device library has a common structure like queues and
it binding core etc. I tried to explain above the similar attributes
for dpdk device libraries[1] which I think, it a requirement so
that the end user will have familiarity with device libraries rather
than each one has separate General guidelines and principles.

I think, it is more TB discussion topic and decides on this because I
don't see in technical issue in calling it a library.

>
> > Something similar to
> > http://code.dpdk.org/dpdk/v21.05/source/lib/regexdev/rte_regexdev.h#L27
> > Could you share how "running tasks" translates to the above scheme
> > like other her dpdk device libraries?
>
> We will share our view soon but what to control in GPU execution
> must be a community discussed requirement.

Makes sense.

>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03 10:04                 ` Jerin Jacob
@ 2021-06-03 10:30                   ` Thomas Monjalon
  2021-06-03 11:38                     ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-03 10:30 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dpdk-dev, Elena Agostini

03/06/2021 12:04, Jerin Jacob:
> On Thu, Jun 3, 2021 at 3:06 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 03/06/2021 11:20, Jerin Jacob:
> > > On Thu, Jun 3, 2021 at 2:23 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >
> > > > 03/06/2021 10:47, Jerin Jacob:
> > > > > On Thu, Jun 3, 2021 at 2:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > >
> > > > > > 03/06/2021 10:41, Jerin Jacob:
> > > > > > > On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > >
> > > > > > > > 03/06/2021 09:47, Jerin Jacob:
> > > > > > > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > > > > --- a/doc/api/doxy-api-index.md
> > > > > > > > > > +++ b/doc/api/doxy-api-index.md
> > > > > > > > > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > > > > > > > > >    [compressdev]        (@ref rte_compressdev.h),
> > > > > > > > > >    [compress]           (@ref rte_comp.h),
> > > > > > > > > >    [regexdev]           (@ref rte_regexdev.h),
> > > > > > > > > > +  [gpudev]             (@ref rte_gpudev.h),
> > > > > > > > >
> > > > > > > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > > > > > > library like mempool with vendor-defined ops?
> > > > > > > > > Any specific reason for making it a device? The reason why I am asking
> > > > > > > > > this is, as other DPDK devices as symmetry in queue(s), configure,
> > > > > > > > > start, stop operation etc.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +struct rte_gpu_dev {
> > > > > > > > > > +       /* Backing device. */
> > > > > > > > > > +       struct rte_device *device;
> > > > > > > > >
> > > > > > > > > See above?
> > > > > > > >
> > > > > > > > There is a PCI device probed.
> > > > > > > > I don't understand why it would not be represented as a device.
> > > > > > >
> > > > > > > All other DPDK device has symmetry in structures like queue and
> > > > > > > symmetry in operation like it has configure, start, stop etc.
> > > > > > > This one seems more like mempool to me all we want set of
> > > > > > > vendor-defined ops. So any justification on
> > > > > > > make it a device ? why not like mempool library?
> > > > > > > (driver/mempool/octeontx2 Mempool HW is also PCI device, but
> > > > > > > we don't take device path for mempool. So I would like to understand
> > > > > > > any technical reason for making it a device).
> > > > > >
> > > > > > I don't understand what you mean by "symmetry".
> > > > >
> > > > > The common attributes. or similarity
> > > >
> > > > The common attributes of a device are:
> > > >         - driver
> > > >         - bus
> > > >         - devargs
> > > > We have these attributes for a GPU.
> > >
> > > Yes. Those are attributes of rte_device. That does not mean and
> > > library can not use rte_device.(mempool library driver is using
> > > rte_device which is backed by PCI)
> > > In terms of similarity, all other device libraries(not devices) have
> > > queue, enqueue() and dequeue() kind of scheme
> > > in ethdev, cryptodev, compressdev, eventdev, bbdev, rawdev. regexdev.
> > > i.e existing DPDK device libraries,
> > > This one des not have have that, So question why to call it libgpudev vs libgpu.
> 
> See below[1]
> 
> > >
> > > The functions you have are memory allocation etc. That's more of a
> > > library candidate.
> > >
> > > > About configure/start/stop usual functions,
> > > > I think we'll have something similar in the second step
> > >
> > > Do you think or it will be there?. I think, it is import decision.
> >
> > That's an important discussion we need to have.
> > We are preparing a proposal.
> 
> Ack.
> 
> >
> > > The device needs have a queue kind of structure
> > > and it is mapping to core to have a notion of configure. queue_setup,
> > > start and stop etc
> >
> > Why is it a requirement to call it a device API?
> 
> Then we need to define what needs to call as device library vs library and how?
> Why mempool is not called a  device library vs library?

My view is simple:
if it has drivers, it is a device API, except bus and mempool libs.
About mempool, it started as a standard lib and got extended for HW support.

> and why all
> other device library has a common structure like queues and
> it binding core etc. I tried to explain above the similar attributes
> for dpdk device libraries[1] which I think, it a requirement so
> that the end user will have familiarity with device libraries rather
> than each one has separate General guidelines and principles.
> 
> I think, it is more TB discussion topic and decides on this because I
> don't see in technical issue in calling it a library.

The naming is just a choice.
Yesterday morning it was called lib/gpu/
and in the evening it was renamed lib/gpudev/
so no technical issue :)

But the design of the API with queues or other paradigm
is something I would like to discuss here.
Note: there was no intent to publish GPU processing control
in DPDK 21.08. We want to focus on GPU memory in 21.08,
but I understand it is a key decision in the big picture.
What would be your need and would you design such API?

> > > Something similar to
> > > http://code.dpdk.org/dpdk/v21.05/source/lib/regexdev/rte_regexdev.h#L27
> > > Could you share how "running tasks" translates to the above scheme
> > > like other her dpdk device libraries?
> >
> > We will share our view soon but what to control in GPU execution
> > must be a community discussed requirement.
> 
> Makes sense.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03 10:30                   ` Thomas Monjalon
@ 2021-06-03 11:38                     ` Jerin Jacob
  2021-06-04 12:55                       ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-03 11:38 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dpdk-dev, Elena Agostini

On Thu, Jun 3, 2021 at 4:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 03/06/2021 12:04, Jerin Jacob:
> > On Thu, Jun 3, 2021 at 3:06 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 03/06/2021 11:20, Jerin Jacob:
> > > > On Thu, Jun 3, 2021 at 2:23 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >
> > > > > 03/06/2021 10:47, Jerin Jacob:
> > > > > > On Thu, Jun 3, 2021 at 2:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > >
> > > > > > > 03/06/2021 10:41, Jerin Jacob:
> > > > > > > > On Thu, Jun 3, 2021 at 1:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > > >
> > > > > > > > > 03/06/2021 09:47, Jerin Jacob:
> > > > > > > > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > > > > > --- a/doc/api/doxy-api-index.md
> > > > > > > > > > > +++ b/doc/api/doxy-api-index.md
> > > > > > > > > > > @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
> > > > > > > > > > >    [compressdev]        (@ref rte_compressdev.h),
> > > > > > > > > > >    [compress]           (@ref rte_comp.h),
> > > > > > > > > > >    [regexdev]           (@ref rte_regexdev.h),
> > > > > > > > > > > +  [gpudev]             (@ref rte_gpudev.h),
> > > > > > > > > >
> > > > > > > > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > > > > > > > library like mempool with vendor-defined ops?
> > > > > > > > > > Any specific reason for making it a device? The reason why I am asking
> > > > > > > > > > this is, as other DPDK devices as symmetry in queue(s), configure,
> > > > > > > > > > start, stop operation etc.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +struct rte_gpu_dev {
> > > > > > > > > > > +       /* Backing device. */
> > > > > > > > > > > +       struct rte_device *device;
> > > > > > > > > >
> > > > > > > > > > See above?
> > > > > > > > >
> > > > > > > > > There is a PCI device probed.
> > > > > > > > > I don't understand why it would not be represented as a device.
> > > > > > > >
> > > > > > > > All other DPDK device has symmetry in structures like queue and
> > > > > > > > symmetry in operation like it has configure, start, stop etc.
> > > > > > > > This one seems more like mempool to me all we want set of
> > > > > > > > vendor-defined ops. So any justification on
> > > > > > > > make it a device ? why not like mempool library?
> > > > > > > > (driver/mempool/octeontx2 Mempool HW is also PCI device, but
> > > > > > > > we don't take device path for mempool. So I would like to understand
> > > > > > > > any technical reason for making it a device).
> > > > > > >
> > > > > > > I don't understand what you mean by "symmetry".
> > > > > >
> > > > > > The common attributes. or similarity
> > > > >
> > > > > The common attributes of a device are:
> > > > >         - driver
> > > > >         - bus
> > > > >         - devargs
> > > > > We have these attributes for a GPU.
> > > >
> > > > Yes. Those are attributes of rte_device. That does not mean and
> > > > library can not use rte_device.(mempool library driver is using
> > > > rte_device which is backed by PCI)
> > > > In terms of similarity, all other device libraries(not devices) have
> > > > queue, enqueue() and dequeue() kind of scheme
> > > > in ethdev, cryptodev, compressdev, eventdev, bbdev, rawdev. regexdev.
> > > > i.e existing DPDK device libraries,
> > > > This one des not have have that, So question why to call it libgpudev vs libgpu.
> >
> > See below[1]
> >
> > > >
> > > > The functions you have are memory allocation etc. That's more of a
> > > > library candidate.
> > > >
> > > > > About configure/start/stop usual functions,
> > > > > I think we'll have something similar in the second step
> > > >
> > > > Do you think or it will be there?. I think, it is import decision.
> > >
> > > That's an important discussion we need to have.
> > > We are preparing a proposal.
> >
> > Ack.
> >
> > >
> > > > The device needs have a queue kind of structure
> > > > and it is mapping to core to have a notion of configure. queue_setup,
> > > > start and stop etc
> > >
> > > Why is it a requirement to call it a device API?
> >
> > Then we need to define what needs to call as device library vs library and how?
> > Why mempool is not called a  device library vs library?
>
> My view is simple:
> if it has drivers, it is a device API, except bus and mempool libs.

rte_secuity has drivers but it is not called a device library.

> About mempool, it started as a standard lib and got extended for HW support.

Yes. We did not change to device library as it was fundamentally
different than another DPDK deices
when we added the device support.

>
> > and why all
> > other device library has a common structure like queues and
> > it binding core etc. I tried to explain above the similar attributes
> > for dpdk device libraries[1] which I think, it a requirement so
> > that the end user will have familiarity with device libraries rather
> > than each one has separate General guidelines and principles.
> >
> > I think, it is more TB discussion topic and decides on this because I
> > don't see in technical issue in calling it a library.
>
> The naming is just a choice.

Not sure.

> Yesterday morning it was called lib/gpu/
> and in the evening it was renamed lib/gpudev/
> so no technical issue :)
>
> But the design of the API with queues or other paradigm
> is something I would like to discuss here.

Yeah, That is important. IMO, That defines what needs to be a device library.

> Note: there was no intent to publish GPU processing control
> in DPDK 21.08. We want to focus on GPU memory in 21.08,
> but I understand it is a key decision in the big picture.

if the scope is only memory allocation, IMO, it is better to make a library.

> What would be your need and would you design such API?

For me, there is no need for gpu library(as of now). May GPU consumers
can define what
they need to control using the library.


>
> > > > Something similar to
> > > > http://code.dpdk.org/dpdk/v21.05/source/lib/regexdev/rte_regexdev.h#L27
> > > > Could you share how "running tasks" translates to the above scheme
> > > > like other her dpdk device libraries?
> > >
> > > We will share our view soon but what to control in GPU execution
> > > must be a community discussed requirement.
> >
> > Makes sense.
>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
                   ` (3 preceding siblings ...)
  2021-06-03  7:47 ` Jerin Jacob
@ 2021-06-04  5:51 ` Wang, Haiyue
  2021-06-04  8:15   ` Thomas Monjalon
  2021-06-04 11:07 ` Wang, Haiyue
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 128+ messages in thread
From: Wang, Haiyue @ 2021-06-04  5:51 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: Elena Agostini

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
> Sent: Thursday, June 3, 2021 04:36
> To: dev@dpdk.org
> Cc: Elena Agostini <eagostini@nvidia.com>
> Subject: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> From: Elena Agostini <eagostini@nvidia.com>
> 
> The new library gpudev is for dealing with GPU from a DPDK application
> in a vendor-agnostic way.
> 
> As a first step, the features are focused on memory management.
> A function allows to allocate memory inside the GPU,
> while another one allows to use main (CPU) memory from the GPU.
> 
> The infrastructure is prepared to welcome drivers in drivers/gpu/
> as the upcoming NVIDIA one, implementing the gpudev API.
> Other additions planned for next revisions:
>   - C implementation file
>   - guide documentation
>   - unit tests
>   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> 
> The next step should focus on GPU processing task control.
> 

Is this patch for 'L2FWD-NV Workload on GPU' on P26 ?
https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9730-packet-processing-on-gpu-at-100gbe-line-rate.pdf


> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  .gitignore                           |   1 +
>  MAINTAINERS                          |   6 +
>  doc/api/doxy-api-index.md            |   1 +
>  doc/api/doxy-api.conf.in             |   1 +
>  doc/guides/conf.py                   |   8 ++
>  doc/guides/gpus/features/default.ini |  13 ++
>  doc/guides/gpus/index.rst            |  11 ++
>  doc/guides/gpus/overview.rst         |   7 +
>  doc/guides/index.rst                 |   1 +
>  doc/guides/prog_guide/gpu.rst        |   5 +
>  doc/guides/prog_guide/index.rst      |   1 +
>  drivers/gpu/meson.build              |   4 +
>  drivers/meson.build                  |   1 +
>  lib/gpudev/gpu_driver.h              |  44 +++++++
>  lib/gpudev/meson.build               |   9 ++
>  lib/gpudev/rte_gpudev.h              | 183 +++++++++++++++++++++++++++
>  lib/gpudev/version.map               |  11 ++
>  lib/meson.build                      |   1 +
>  18 files changed, 308 insertions(+)
>  create mode 100644 doc/guides/gpus/features/default.ini
>  create mode 100644 doc/guides/gpus/index.rst
>  create mode 100644 doc/guides/gpus/overview.rst
>  create mode 100644 doc/guides/prog_guide/gpu.rst
>  create mode 100644 drivers/gpu/meson.build
>  create mode 100644 lib/gpudev/gpu_driver.h
>  create mode 100644 lib/gpudev/meson.build
>  create mode 100644 lib/gpudev/rte_gpudev.h
>  create mode 100644 lib/gpudev/version.map
> 


> --
> 2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04  5:51 ` Wang, Haiyue
@ 2021-06-04  8:15   ` Thomas Monjalon
  0 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04  8:15 UTC (permalink / raw)
  To: Wang, Haiyue; +Cc: dev, Elena Agostini

04/06/2021 07:51, Wang, Haiyue:
> > From: Elena Agostini <eagostini@nvidia.com>
> > 
> > The new library gpudev is for dealing with GPU from a DPDK application
> > in a vendor-agnostic way.
> > 
> > As a first step, the features are focused on memory management.
> > A function allows to allocate memory inside the GPU,
> > while another one allows to use main (CPU) memory from the GPU.
> > 
> > The infrastructure is prepared to welcome drivers in drivers/gpu/
> > as the upcoming NVIDIA one, implementing the gpudev API.
> > Other additions planned for next revisions:
> >   - C implementation file
> >   - guide documentation
> >   - unit tests
> >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > 
> > The next step should focus on GPU processing task control.
> > 
> 
> Is this patch for 'L2FWD-NV Workload on GPU' on P26 ?
> https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9730-packet-processing-on-gpu-at-100gbe-line-rate.pdf

Yes this is the same project: use GPU in DPDK workload.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03  9:33   ` Ferruh Yigit
@ 2021-06-04 10:28     ` Thomas Monjalon
  2021-06-04 11:09       ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04 10:28 UTC (permalink / raw)
  To: Jerin Jacob, Ferruh Yigit; +Cc: dev, Elena Agostini

03/06/2021 11:33, Ferruh Yigit:
> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> >> +  [gpudev]             (@ref rte_gpudev.h),
> > 
> > Since this device does not have a queue etc? Shouldn't make it a
> > library like mempool with vendor-defined ops?
> 
> +1
> 
> Current RFC announces additional memory allocation capabilities, which can suits
> better as extension to existing memory related library instead of a new device
> abstraction library.

It is not replacing mempool.
It is more at the same level as EAL memory management:
allocate simple buffer, but with the exception it is done
on a specific device, so it requires a device ID.

The other reason it needs to be a full library is that
it will start a workload on the GPU and get completion notification
so we can integrate the GPU workload in a packet processing pipeline.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
                   ` (4 preceding siblings ...)
  2021-06-04  5:51 ` Wang, Haiyue
@ 2021-06-04 11:07 ` Wang, Haiyue
  2021-06-04 12:43   ` Thomas Monjalon
  2021-06-06  1:10 ` Honnappa Nagarahalli
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 128+ messages in thread
From: Wang, Haiyue @ 2021-06-04 11:07 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: Elena Agostini

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
> Sent: Thursday, June 3, 2021 04:36
> To: dev@dpdk.org
> Cc: Elena Agostini <eagostini@nvidia.com>
> Subject: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> From: Elena Agostini <eagostini@nvidia.com>
> 
> The new library gpudev is for dealing with GPU from a DPDK application
> in a vendor-agnostic way.
> 
> As a first step, the features are focused on memory management.
> A function allows to allocate memory inside the GPU,
> while another one allows to use main (CPU) memory from the GPU.
> 
> The infrastructure is prepared to welcome drivers in drivers/gpu/
> as the upcoming NVIDIA one, implementing the gpudev API.
> Other additions planned for next revisions:
>   - C implementation file
>   - guide documentation
>   - unit tests
>   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> 
> The next step should focus on GPU processing task control.
> 
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  .gitignore                           |   1 +
>  MAINTAINERS                          |   6 +
>  doc/api/doxy-api-index.md            |   1 +
>  doc/api/doxy-api.conf.in             |   1 +
>  doc/guides/conf.py                   |   8 ++
>  doc/guides/gpus/features/default.ini |  13 ++
>  doc/guides/gpus/index.rst            |  11 ++
>  doc/guides/gpus/overview.rst         |   7 +
>  doc/guides/index.rst                 |   1 +
>  doc/guides/prog_guide/gpu.rst        |   5 +
>  doc/guides/prog_guide/index.rst      |   1 +
>  drivers/gpu/meson.build              |   4 +
>  drivers/meson.build                  |   1 +
>  lib/gpudev/gpu_driver.h              |  44 +++++++
>  lib/gpudev/meson.build               |   9 ++
>  lib/gpudev/rte_gpudev.h              | 183 +++++++++++++++++++++++++++
>  lib/gpudev/version.map               |  11 ++
>  lib/meson.build                      |   1 +
>  18 files changed, 308 insertions(+)
>  create mode 100644 doc/guides/gpus/features/default.ini
>  create mode 100644 doc/guides/gpus/index.rst
>  create mode 100644 doc/guides/gpus/overview.rst
>  create mode 100644 doc/guides/prog_guide/gpu.rst
>  create mode 100644 drivers/gpu/meson.build
>  create mode 100644 lib/gpudev/gpu_driver.h
>  create mode 100644 lib/gpudev/meson.build
>  create mode 100644 lib/gpudev/rte_gpudev.h
>  create mode 100644 lib/gpudev/version.map
> 


> +#include <stdint.h>
> +
> +#include <rte_common.h>
> +
> +#include "rte_gpudev.h"
> +
> +struct rte_gpu_dev;
> +
> +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
> +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> +
> +struct rte_gpu_dev {
> +	/* Backing device. */
> +	struct rte_device *device;
> +	/* GPU info structure. */
> +	struct rte_gpu_info info;
> +	/* Counter of processes using the device. */
> +	uint16_t process_cnt;
> +	/* If device is currently used or not. */
> +	enum rte_gpu_state state;
> +	/* FUNCTION: Allocate memory on the GPU. */
> +	gpu_malloc_t gpu_malloc;
> +	/* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> +	gpu_malloc_t gpu_malloc_visible;
> +	/* FUNCTION: Free allocated memory on the GPU. */
> +	gpu_free_t gpu_free;


I'm wondering that we can define the malloc type as:

typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr,
					unsigned int flags)

#define RTE_GPU_MALLOC_F_CPU_VISIBLE 0x01u --> gpu_malloc_visible

Then only one malloc function member is needed, paired with 'gpu_free'.

> +	/* Device interrupt handle. */
> +	struct rte_intr_handle *intr_handle;
> +	/* Driver-specific private data. */
> +	void *dev_private;
> +} __rte_cache_aligned;
> +


> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory on the GPU.
> + *
> + * @param gpu_id
> + *   GPU ID to allocate memory.
> + * @param size
> + *   Number of bytes to allocate.
> + * @param ptr
> + *   Pointer to store the address of the allocated memory.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory on the CPU that is visible from the GPU.
> + *
> + * @param gpu_id
> + *   Reference GPU ID.
> + * @param size
> + *   Number of bytes to allocate.
> + * @param ptr
> + *   Pointer to store the address of the allocated memory.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);

Then 'rte_gpu_malloc_visible' is no needed, and the new call is:

rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr, RTE_GPU_MALLOC_F_CPU_VISIBLE).

Also, we can define more flags for feature extension. ;-)

> +
> +#ifdef __cplusplus
> +}
> --
> 2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 10:28     ` Thomas Monjalon
@ 2021-06-04 11:09       ` Jerin Jacob
  2021-06-04 12:46         ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-04 11:09 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Ferruh Yigit, dpdk-dev, Elena Agostini

On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 03/06/2021 11:33, Ferruh Yigit:
> > On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >> +  [gpudev]             (@ref rte_gpudev.h),
> > >
> > > Since this device does not have a queue etc? Shouldn't make it a
> > > library like mempool with vendor-defined ops?
> >
> > +1
> >
> > Current RFC announces additional memory allocation capabilities, which can suits
> > better as extension to existing memory related library instead of a new device
> > abstraction library.
>
> It is not replacing mempool.
> It is more at the same level as EAL memory management:
> allocate simple buffer, but with the exception it is done
> on a specific device, so it requires a device ID.
>
> The other reason it needs to be a full library is that
> it will start a workload on the GPU and get completion notification
> so we can integrate the GPU workload in a packet processing pipeline.

I might have confused you. My intention is not to make to fit under mempool API.

I agree that we need a separate library for this. My objection is only
to not call libgpudev and
call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
it not like existing "device libraries" in DPDK and
it like other "libraries" in DPDK.



>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 11:07 ` Wang, Haiyue
@ 2021-06-04 12:43   ` Thomas Monjalon
  2021-06-04 13:25     ` Wang, Haiyue
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04 12:43 UTC (permalink / raw)
  To: Wang, Haiyue; +Cc: dev, Elena Agostini

04/06/2021 13:07, Wang, Haiyue:
> > From: Elena Agostini <eagostini@nvidia.com>
> > +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
> > +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> > +
[...]
> > +	/* FUNCTION: Allocate memory on the GPU. */
> > +	gpu_malloc_t gpu_malloc;
> > +	/* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> > +	gpu_malloc_t gpu_malloc_visible;
> > +	/* FUNCTION: Free allocated memory on the GPU. */
> > +	gpu_free_t gpu_free;
> 
> 
> I'm wondering that we can define the malloc type as:
> 
> typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr,
> 					unsigned int flags)
> 
> #define RTE_GPU_MALLOC_F_CPU_VISIBLE 0x01u --> gpu_malloc_visible
> 
> Then only one malloc function member is needed, paired with 'gpu_free'.
[...]
> > +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
[...]
> > +int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);
> 
> Then 'rte_gpu_malloc_visible' is no needed, and the new call is:
> 
> rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr, RTE_GPU_MALLOC_F_CPU_VISIBLE).
> 
> Also, we can define more flags for feature extension. ;-)

Yes it is a good idea.

Another question is about the function rte_gpu_free().
How do we recognize that a memory chunk is from the CPU and GPU visible,
or just from GPU?



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 11:09       ` Jerin Jacob
@ 2021-06-04 12:46         ` Thomas Monjalon
  2021-06-04 13:05           ` Andrew Rybchenko
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04 12:46 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Ferruh Yigit, dpdk-dev, Elena Agostini, david.marchand

04/06/2021 13:09, Jerin Jacob:
> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 03/06/2021 11:33, Ferruh Yigit:
> > > On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > > On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >> +  [gpudev]             (@ref rte_gpudev.h),
> > > >
> > > > Since this device does not have a queue etc? Shouldn't make it a
> > > > library like mempool with vendor-defined ops?
> > >
> > > +1
> > >
> > > Current RFC announces additional memory allocation capabilities, which can suits
> > > better as extension to existing memory related library instead of a new device
> > > abstraction library.
> >
> > It is not replacing mempool.
> > It is more at the same level as EAL memory management:
> > allocate simple buffer, but with the exception it is done
> > on a specific device, so it requires a device ID.
> >
> > The other reason it needs to be a full library is that
> > it will start a workload on the GPU and get completion notification
> > so we can integrate the GPU workload in a packet processing pipeline.
> 
> I might have confused you. My intention is not to make to fit under mempool API.
> 
> I agree that we need a separate library for this. My objection is only
> to not call libgpudev and
> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> it not like existing "device libraries" in DPDK and
> it like other "libraries" in DPDK.

I think we should define a queue of processing actions,
so it looks like other device libraries.
And anyway I think a library managing a device class,
and having some device drivers deserves the name of device library.

I would like to read more opinions.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-03 11:38                     ` Jerin Jacob
@ 2021-06-04 12:55                       ` Thomas Monjalon
  2021-06-04 15:05                         ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04 12:55 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, Elena Agostini, ferruh.yigit

03/06/2021 13:38, Jerin Jacob:
> On Thu, Jun 3, 2021 at 4:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 03/06/2021 12:04, Jerin Jacob:
> > > On Thu, Jun 3, 2021 at 3:06 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 03/06/2021 11:20, Jerin Jacob:
> > > > > The device needs have a queue kind of structure
> > > > > and it is mapping to core to have a notion of configure. queue_setup,
> > > > > start and stop etc
> > > >
> > > > Why is it a requirement to call it a device API?
> > >
> > > Then we need to define what needs to call as device library vs library and how?
> > > Why mempool is not called a  device library vs library?
> >
> > My view is simple:
> > if it has drivers, it is a device API, except bus and mempool libs.
> 
> rte_secuity has drivers but it is not called a device library.

rte_security is a monster beast :)
Yes it has rte_security_ops implemented in net and crypto drivers,
but it is an API extension only, there is no driver dedicated to security.

> > About mempool, it started as a standard lib and got extended for HW support.
> 
> Yes. We did not change to device library as it was fundamentally
> different than another DPDK deices
> when we added the device support.
> 
> > > and why all
> > > other device library has a common structure like queues and
> > > it binding core etc. I tried to explain above the similar attributes
> > > for dpdk device libraries[1] which I think, it a requirement so
> > > that the end user will have familiarity with device libraries rather
> > > than each one has separate General guidelines and principles.
> > >
> > > I think, it is more TB discussion topic and decides on this because I
> > > don't see in technical issue in calling it a library.
> >
> > The naming is just a choice.
> 
> Not sure.
> 
> > Yesterday morning it was called lib/gpu/
> > and in the evening it was renamed lib/gpudev/
> > so no technical issue :)
> >
> > But the design of the API with queues or other paradigm
> > is something I would like to discuss here.
> 
> Yeah, That is important. IMO, That defines what needs to be a device library.
> 
> > Note: there was no intent to publish GPU processing control
> > in DPDK 21.08. We want to focus on GPU memory in 21.08,
> > but I understand it is a key decision in the big picture.
> 
> if the scope is only memory allocation, IMO, it is better to make a library.

No it is only the first step.

> > What would be your need and would you design such API?
> 
> For me, there is no need for gpu library(as of now). May GPU consumers
> can define what they need to control using the library.

We need to integrate GPU processing workload in the DPDK workflow
as a generic API.
There could be 2 modes:
	- queue of tasks
	- tasks in an infinite loop
In both modes, we could get completion notifications
with an interrupt/callback or by polling a shared memory.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 12:46         ` Thomas Monjalon
@ 2021-06-04 13:05           ` Andrew Rybchenko
  2021-06-04 13:18             ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Andrew Rybchenko @ 2021-06-04 13:05 UTC (permalink / raw)
  To: Thomas Monjalon, Jerin Jacob
  Cc: Ferruh Yigit, dpdk-dev, Elena Agostini, david.marchand

On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> 04/06/2021 13:09, Jerin Jacob:
>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>>> 03/06/2021 11:33, Ferruh Yigit:
>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
>>>>>
>>>>> Since this device does not have a queue etc? Shouldn't make it a
>>>>> library like mempool with vendor-defined ops?
>>>>
>>>> +1
>>>>
>>>> Current RFC announces additional memory allocation capabilities, which can suits
>>>> better as extension to existing memory related library instead of a new device
>>>> abstraction library.
>>>
>>> It is not replacing mempool.
>>> It is more at the same level as EAL memory management:
>>> allocate simple buffer, but with the exception it is done
>>> on a specific device, so it requires a device ID.
>>>
>>> The other reason it needs to be a full library is that
>>> it will start a workload on the GPU and get completion notification
>>> so we can integrate the GPU workload in a packet processing pipeline.
>>
>> I might have confused you. My intention is not to make to fit under mempool API.
>>
>> I agree that we need a separate library for this. My objection is only
>> to not call libgpudev and
>> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
>> it not like existing "device libraries" in DPDK and
>> it like other "libraries" in DPDK.
> 
> I think we should define a queue of processing actions,
> so it looks like other device libraries.
> And anyway I think a library managing a device class,
> and having some device drivers deserves the name of device library.
> 
> I would like to read more opinions.

Since the library is an unified interface to GPU device drivers
I think it should be named as in the patch - gpudev.

Mempool looks like an exception here - initially it was pure SW
library, but not there are HW backends and corresponding device
drivers.

What I don't understand where is GPU specifics here?
I.e. why GPU? NIC can have own memory and provide
corresponding API.

What's the difference of "the memory on the CPU that is visible from the
GPU" from existing memzones which are DMA mapped?

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 13:05           ` Andrew Rybchenko
@ 2021-06-04 13:18             ` Thomas Monjalon
  2021-06-04 13:59               ` Andrew Rybchenko
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04 13:18 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Jerin Jacob, Ferruh Yigit, dpdk-dev, Elena Agostini, david.marchand

04/06/2021 15:05, Andrew Rybchenko:
> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > 04/06/2021 13:09, Jerin Jacob:
> >> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >>> 03/06/2021 11:33, Ferruh Yigit:
> >>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> >>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> >>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> >>>>>
> >>>>> Since this device does not have a queue etc? Shouldn't make it a
> >>>>> library like mempool with vendor-defined ops?
> >>>>
> >>>> +1
> >>>>
> >>>> Current RFC announces additional memory allocation capabilities, which can suits
> >>>> better as extension to existing memory related library instead of a new device
> >>>> abstraction library.
> >>>
> >>> It is not replacing mempool.
> >>> It is more at the same level as EAL memory management:
> >>> allocate simple buffer, but with the exception it is done
> >>> on a specific device, so it requires a device ID.
> >>>
> >>> The other reason it needs to be a full library is that
> >>> it will start a workload on the GPU and get completion notification
> >>> so we can integrate the GPU workload in a packet processing pipeline.
> >>
> >> I might have confused you. My intention is not to make to fit under mempool API.
> >>
> >> I agree that we need a separate library for this. My objection is only
> >> to not call libgpudev and
> >> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> >> it not like existing "device libraries" in DPDK and
> >> it like other "libraries" in DPDK.
> > 
> > I think we should define a queue of processing actions,
> > so it looks like other device libraries.
> > And anyway I think a library managing a device class,
> > and having some device drivers deserves the name of device library.
> > 
> > I would like to read more opinions.
> 
> Since the library is an unified interface to GPU device drivers
> I think it should be named as in the patch - gpudev.
> 
> Mempool looks like an exception here - initially it was pure SW
> library, but not there are HW backends and corresponding device
> drivers.
> 
> What I don't understand where is GPU specifics here?

That's an interesting question.
Let's ask first what is a GPU for DPDK?
I think it is like a sub-CPU with high parallel execution capabilities,
and it is controlled by the CPU.

> I.e. why GPU? NIC can have own memory and provide corresponding API.

So far we don't need to explicitly allocate memory on the NIC.
The packets are received or copied to the CPU memory.
In the GPU case, the NIC could save the packets directly
in the GPU memory, thus the need to manage the GPU memory.

Also, because the GPU program is dynamically loaded,
there is no fixed API to interact with the GPU workload except via memory.

> What's the difference of "the memory on the CPU that is visible from the
> GPU" from existing memzones which are DMA mapped?

The only difference is that the GPU must map the CPU memory
in its program logic.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 12:43   ` Thomas Monjalon
@ 2021-06-04 13:25     ` Wang, Haiyue
  2021-06-04 14:06       ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Wang, Haiyue @ 2021-06-04 13:25 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Elena Agostini

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, June 4, 2021 20:44
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: dev@dpdk.org; Elena Agostini <eagostini@nvidia.com>
> Subject: Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> 04/06/2021 13:07, Wang, Haiyue:
> > > From: Elena Agostini <eagostini@nvidia.com>
> > > +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
> > > +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> > > +
> [...]
> > > +	/* FUNCTION: Allocate memory on the GPU. */
> > > +	gpu_malloc_t gpu_malloc;
> > > +	/* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> > > +	gpu_malloc_t gpu_malloc_visible;
> > > +	/* FUNCTION: Free allocated memory on the GPU. */
> > > +	gpu_free_t gpu_free;
> >
> >
> > I'm wondering that we can define the malloc type as:
> >
> > typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr,
> > 					unsigned int flags)
> >
> > #define RTE_GPU_MALLOC_F_CPU_VISIBLE 0x01u --> gpu_malloc_visible
> >
> > Then only one malloc function member is needed, paired with 'gpu_free'.
> [...]
> > > +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
> [...]
> > > +int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);
> >
> > Then 'rte_gpu_malloc_visible' is no needed, and the new call is:
> >
> > rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr, RTE_GPU_MALLOC_F_CPU_VISIBLE).
> >
> > Also, we can define more flags for feature extension. ;-)
> 
> Yes it is a good idea.
> 
> Another question is about the function rte_gpu_free().
> How do we recognize that a memory chunk is from the CPU and GPU visible,
> or just from GPU?
> 

I didn't find the rte_gpu_free_visible definition, and the rte_gpu_free's
comment just says: deallocate a chunk of memory allocated with rte_gpu_malloc*

Looks like the rte_gpu_free can handle this case ?

And from the definition "rte_gpu_free(uint16_t gpu_id, void *ptr)", the
free needs to check whether this memory belong to the GPU or not, so it
also can recognize the memory type, I think.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 13:18             ` Thomas Monjalon
@ 2021-06-04 13:59               ` Andrew Rybchenko
  2021-06-04 14:09                 ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Andrew Rybchenko @ 2021-06-04 13:59 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Jerin Jacob, Ferruh Yigit, dpdk-dev, Elena Agostini, david.marchand

On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> 04/06/2021 15:05, Andrew Rybchenko:
>> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
>>> 04/06/2021 13:09, Jerin Jacob:
>>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>>>>> 03/06/2021 11:33, Ferruh Yigit:
>>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
>>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
>>>>>>>
>>>>>>> Since this device does not have a queue etc? Shouldn't make it a
>>>>>>> library like mempool with vendor-defined ops?
>>>>>>
>>>>>> +1
>>>>>>
>>>>>> Current RFC announces additional memory allocation capabilities, which can suits
>>>>>> better as extension to existing memory related library instead of a new device
>>>>>> abstraction library.
>>>>>
>>>>> It is not replacing mempool.
>>>>> It is more at the same level as EAL memory management:
>>>>> allocate simple buffer, but with the exception it is done
>>>>> on a specific device, so it requires a device ID.
>>>>>
>>>>> The other reason it needs to be a full library is that
>>>>> it will start a workload on the GPU and get completion notification
>>>>> so we can integrate the GPU workload in a packet processing pipeline.
>>>>
>>>> I might have confused you. My intention is not to make to fit under mempool API.
>>>>
>>>> I agree that we need a separate library for this. My objection is only
>>>> to not call libgpudev and
>>>> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
>>>> it not like existing "device libraries" in DPDK and
>>>> it like other "libraries" in DPDK.
>>>
>>> I think we should define a queue of processing actions,
>>> so it looks like other device libraries.
>>> And anyway I think a library managing a device class,
>>> and having some device drivers deserves the name of device library.
>>>
>>> I would like to read more opinions.
>>
>> Since the library is an unified interface to GPU device drivers
>> I think it should be named as in the patch - gpudev.
>>
>> Mempool looks like an exception here - initially it was pure SW
>> library, but not there are HW backends and corresponding device
>> drivers.
>>
>> What I don't understand where is GPU specifics here?
> 
> That's an interesting question.
> Let's ask first what is a GPU for DPDK?
> I think it is like a sub-CPU with high parallel execution capabilities,
> and it is controlled by the CPU.

I have no good ideas how to name it in accordance with
above description to avoid "G" which for "Graphics" if
understand correctly. However, may be it is not required.
No strong opinion on the topic, but unbinding from
"Graphics" would be nice.

>> I.e. why GPU? NIC can have own memory and provide corresponding API.
> 
> So far we don't need to explicitly allocate memory on the NIC.
> The packets are received or copied to the CPU memory.
> In the GPU case, the NIC could save the packets directly
> in the GPU memory, thus the need to manage the GPU memory.
> 
> Also, because the GPU program is dynamically loaded,
> there is no fixed API to interact with the GPU workload except via memory.
> 
>> What's the difference of "the memory on the CPU that is visible from the
>> GPU" from existing memzones which are DMA mapped?
> 
> The only difference is that the GPU must map the CPU memory
> in its program logic.

I see. Thanks for the explanations.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 13:25     ` Wang, Haiyue
@ 2021-06-04 14:06       ` Thomas Monjalon
  2021-06-04 18:04         ` Wang, Haiyue
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04 14:06 UTC (permalink / raw)
  To: Wang, Haiyue; +Cc: dev, Elena Agostini, andrew.rybchenko, ferruh.yigit, jerinj

04/06/2021 15:25, Wang, Haiyue:
> From: Thomas Monjalon <thomas@monjalon.net>
> > Another question is about the function rte_gpu_free().
> > How do we recognize that a memory chunk is from the CPU and GPU visible,
> > or just from GPU?
> > 
> 
> I didn't find the rte_gpu_free_visible definition, and the rte_gpu_free's
> comment just says: deallocate a chunk of memory allocated with rte_gpu_malloc*
> 
> Looks like the rte_gpu_free can handle this case ?

This is the proposal, yes.

> And from the definition "rte_gpu_free(uint16_t gpu_id, void *ptr)", the
> free needs to check whether this memory belong to the GPU or not, so it
> also can recognize the memory type, I think.

Yes that's the idea behind having a single free function.
We could have some metadata in front of the memory chunk.
My question is to confirm whether it is a good design or not,
and whether it should be driver specific or have a common struct in the lib.

Opinions?



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 13:59               ` Andrew Rybchenko
@ 2021-06-04 14:09                 ` Thomas Monjalon
  2021-06-04 15:20                   ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04 14:09 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Jerin Jacob, Ferruh Yigit, dpdk-dev, Elena Agostini, david.marchand

04/06/2021 15:59, Andrew Rybchenko:
> On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > 04/06/2021 15:05, Andrew Rybchenko:
> >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> >>> 04/06/2021 13:09, Jerin Jacob:
> >>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >>>>> 03/06/2021 11:33, Ferruh Yigit:
> >>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> >>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> >>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> >>>>>>>
> >>>>>>> Since this device does not have a queue etc? Shouldn't make it a
> >>>>>>> library like mempool with vendor-defined ops?
> >>>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Current RFC announces additional memory allocation capabilities, which can suits
> >>>>>> better as extension to existing memory related library instead of a new device
> >>>>>> abstraction library.
> >>>>>
> >>>>> It is not replacing mempool.
> >>>>> It is more at the same level as EAL memory management:
> >>>>> allocate simple buffer, but with the exception it is done
> >>>>> on a specific device, so it requires a device ID.
> >>>>>
> >>>>> The other reason it needs to be a full library is that
> >>>>> it will start a workload on the GPU and get completion notification
> >>>>> so we can integrate the GPU workload in a packet processing pipeline.
> >>>>
> >>>> I might have confused you. My intention is not to make to fit under mempool API.
> >>>>
> >>>> I agree that we need a separate library for this. My objection is only
> >>>> to not call libgpudev and
> >>>> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> >>>> it not like existing "device libraries" in DPDK and
> >>>> it like other "libraries" in DPDK.
> >>>
> >>> I think we should define a queue of processing actions,
> >>> so it looks like other device libraries.
> >>> And anyway I think a library managing a device class,
> >>> and having some device drivers deserves the name of device library.
> >>>
> >>> I would like to read more opinions.
> >>
> >> Since the library is an unified interface to GPU device drivers
> >> I think it should be named as in the patch - gpudev.
> >>
> >> Mempool looks like an exception here - initially it was pure SW
> >> library, but not there are HW backends and corresponding device
> >> drivers.
> >>
> >> What I don't understand where is GPU specifics here?
> > 
> > That's an interesting question.
> > Let's ask first what is a GPU for DPDK?
> > I think it is like a sub-CPU with high parallel execution capabilities,
> > and it is controlled by the CPU.
> 
> I have no good ideas how to name it in accordance with
> above description to avoid "G" which for "Graphics" if
> understand correctly. However, may be it is not required.
> No strong opinion on the topic, but unbinding from
> "Graphics" would be nice.

That's a question I ask myself for months now.
I am not able to find a better name,
and I start thinking that "GPU" is famous enough in high-load computing
to convey the idea of what we can expect.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 12:55                       ` Thomas Monjalon
@ 2021-06-04 15:05                         ` Jerin Jacob
  0 siblings, 0 replies; 128+ messages in thread
From: Jerin Jacob @ 2021-06-04 15:05 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dpdk-dev, Elena Agostini, Ferruh Yigit

On Fri, Jun 4, 2021 at 6:25 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 03/06/2021 13:38, Jerin Jacob:
> > On Thu, Jun 3, 2021 at 4:00 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 03/06/2021 12:04, Jerin Jacob:
> > > > On Thu, Jun 3, 2021 at 3:06 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > 03/06/2021 11:20, Jerin Jacob:
> > > > > > The device needs have a queue kind of structure
> > > > > > and it is mapping to core to have a notion of configure. queue_setup,
> > > > > > start and stop etc
> > > > >
> > > > > Why is it a requirement to call it a device API?
> > > >
> > > > Then we need to define what needs to call as device library vs library and how?
> > > > Why mempool is not called a  device library vs library?
> > >
> > > My view is simple:
> > > if it has drivers, it is a device API, except bus and mempool libs.
> >
> > rte_secuity has drivers but it is not called a device library.
>
> rte_security is a monster beast :)
> Yes it has rte_security_ops implemented in net and crypto drivers,
> but it is an API extension only, there is no driver dedicated to security.
>
> > > About mempool, it started as a standard lib and got extended for HW support.
> >
> > Yes. We did not change to device library as it was fundamentally
> > different than another DPDK deices
> > when we added the device support.
> >
> > > > and why all
> > > > other device library has a common structure like queues and
> > > > it binding core etc. I tried to explain above the similar attributes
> > > > for dpdk device libraries[1] which I think, it a requirement so
> > > > that the end user will have familiarity with device libraries rather
> > > > than each one has separate General guidelines and principles.
> > > >
> > > > I think, it is more TB discussion topic and decides on this because I
> > > > don't see in technical issue in calling it a library.
> > >
> > > The naming is just a choice.
> >
> > Not sure.
> >
> > > Yesterday morning it was called lib/gpu/
> > > and in the evening it was renamed lib/gpudev/
> > > so no technical issue :)
> > >
> > > But the design of the API with queues or other paradigm
> > > is something I would like to discuss here.
> >
> > Yeah, That is important. IMO, That defines what needs to be a device library.
> >
> > > Note: there was no intent to publish GPU processing control
> > > in DPDK 21.08. We want to focus on GPU memory in 21.08,
> > > but I understand it is a key decision in the big picture.
> >
> > if the scope is only memory allocation, IMO, it is better to make a library.
>
> No it is only the first step.
>
> > > What would be your need and would you design such API?
> >
> > For me, there is no need for gpu library(as of now). May GPU consumers
> > can define what they need to control using the library.
>
> We need to integrate GPU processing workload in the DPDK workflow
> as a generic API.
> There could be 2 modes:
>         - queue of tasks
>         - tasks in an infinite loop
> In both modes, we could get completion notifications
> with an interrupt/callback or by polling a shared memory.


OK. If we have enqeue/dequeue kind operation and with queue model then it
makes sense to have a device model. It was not there in your initial
patch, but if we are adding
in the future then it OK.



>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 14:09                 ` Thomas Monjalon
@ 2021-06-04 15:20                   ` Jerin Jacob
  2021-06-04 15:51                     ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-04 15:20 UTC (permalink / raw)
  To: Thomas Monjalon, Honnappa Nagarahalli
  Cc: Andrew Rybchenko, Ferruh Yigit, dpdk-dev, Elena Agostini, David Marchand

On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 04/06/2021 15:59, Andrew Rybchenko:
> > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > 04/06/2021 15:05, Andrew Rybchenko:
> > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > >>> 04/06/2021 13:09, Jerin Jacob:
> > >>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >>>>> 03/06/2021 11:33, Ferruh Yigit:
> > >>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > >>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> > >>>>>>>
> > >>>>>>> Since this device does not have a queue etc? Shouldn't make it a
> > >>>>>>> library like mempool with vendor-defined ops?
> > >>>>>>
> > >>>>>> +1
> > >>>>>>
> > >>>>>> Current RFC announces additional memory allocation capabilities, which can suits
> > >>>>>> better as extension to existing memory related library instead of a new device
> > >>>>>> abstraction library.
> > >>>>>
> > >>>>> It is not replacing mempool.
> > >>>>> It is more at the same level as EAL memory management:
> > >>>>> allocate simple buffer, but with the exception it is done
> > >>>>> on a specific device, so it requires a device ID.
> > >>>>>
> > >>>>> The other reason it needs to be a full library is that
> > >>>>> it will start a workload on the GPU and get completion notification
> > >>>>> so we can integrate the GPU workload in a packet processing pipeline.
> > >>>>
> > >>>> I might have confused you. My intention is not to make to fit under mempool API.
> > >>>>
> > >>>> I agree that we need a separate library for this. My objection is only
> > >>>> to not call libgpudev and
> > >>>> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> > >>>> it not like existing "device libraries" in DPDK and
> > >>>> it like other "libraries" in DPDK.
> > >>>
> > >>> I think we should define a queue of processing actions,
> > >>> so it looks like other device libraries.
> > >>> And anyway I think a library managing a device class,
> > >>> and having some device drivers deserves the name of device library.
> > >>>
> > >>> I would like to read more opinions.
> > >>
> > >> Since the library is an unified interface to GPU device drivers
> > >> I think it should be named as in the patch - gpudev.
> > >>
> > >> Mempool looks like an exception here - initially it was pure SW
> > >> library, but not there are HW backends and corresponding device
> > >> drivers.
> > >>
> > >> What I don't understand where is GPU specifics here?
> > >
> > > That's an interesting question.
> > > Let's ask first what is a GPU for DPDK?
> > > I think it is like a sub-CPU with high parallel execution capabilities,
> > > and it is controlled by the CPU.
> >
> > I have no good ideas how to name it in accordance with
> > above description to avoid "G" which for "Graphics" if
> > understand correctly. However, may be it is not required.
> > No strong opinion on the topic, but unbinding from
> > "Graphics" would be nice.
>
> That's a question I ask myself for months now.
> I am not able to find a better name,
> and I start thinking that "GPU" is famous enough in high-load computing
> to convey the idea of what we can expect.


The closest I can think of is big-little architecture in ARM SoC.
https://www.arm.com/why-arm/technologies/big-little

We do have similar architecture, Where the "coprocessor" is part of
the main CPU.
It is operations are:
- Download firmware
- Memory mapping for Main CPU memory by the co-processor
- Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.

If your scope is something similar and No Graphics involved here then
we can remove G.

Coincidentally, Yesterday, I had an interaction with Elena for the
same for BaseBand related work in ORAN where
GPU used as Baseband processing instead of Graphics.(So I can
understand the big picture of this library)

I can think of "coprocessor-dev" as one of the name. We do have
similar machine learning co-processors(for compute)
if we can keep a generic name and it is for the above functions we may
use this subsystem as well in the future.










>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 15:20                   ` Jerin Jacob
@ 2021-06-04 15:51                     ` Thomas Monjalon
  2021-06-04 18:20                       ` Wang, Haiyue
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-04 15:51 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Ferruh Yigit, dpdk-dev,
	Elena Agostini, David Marchand

04/06/2021 17:20, Jerin Jacob:
> On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 04/06/2021 15:59, Andrew Rybchenko:
> > > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > > 04/06/2021 15:05, Andrew Rybchenko:
> > > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > > >>> 04/06/2021 13:09, Jerin Jacob:
> > > >>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >>>>> 03/06/2021 11:33, Ferruh Yigit:
> > > >>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > >>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> > > >>>>>>>
> > > >>>>>>> Since this device does not have a queue etc? Shouldn't make it a
> > > >>>>>>> library like mempool with vendor-defined ops?
> > > >>>>>>
> > > >>>>>> +1
> > > >>>>>>
> > > >>>>>> Current RFC announces additional memory allocation capabilities, which can suits
> > > >>>>>> better as extension to existing memory related library instead of a new device
> > > >>>>>> abstraction library.
> > > >>>>>
> > > >>>>> It is not replacing mempool.
> > > >>>>> It is more at the same level as EAL memory management:
> > > >>>>> allocate simple buffer, but with the exception it is done
> > > >>>>> on a specific device, so it requires a device ID.
> > > >>>>>
> > > >>>>> The other reason it needs to be a full library is that
> > > >>>>> it will start a workload on the GPU and get completion notification
> > > >>>>> so we can integrate the GPU workload in a packet processing pipeline.
> > > >>>>
> > > >>>> I might have confused you. My intention is not to make to fit under mempool API.
> > > >>>>
> > > >>>> I agree that we need a separate library for this. My objection is only
> > > >>>> to not call libgpudev and
> > > >>>> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> > > >>>> it not like existing "device libraries" in DPDK and
> > > >>>> it like other "libraries" in DPDK.
> > > >>>
> > > >>> I think we should define a queue of processing actions,
> > > >>> so it looks like other device libraries.
> > > >>> And anyway I think a library managing a device class,
> > > >>> and having some device drivers deserves the name of device library.
> > > >>>
> > > >>> I would like to read more opinions.
> > > >>
> > > >> Since the library is an unified interface to GPU device drivers
> > > >> I think it should be named as in the patch - gpudev.
> > > >>
> > > >> Mempool looks like an exception here - initially it was pure SW
> > > >> library, but not there are HW backends and corresponding device
> > > >> drivers.
> > > >>
> > > >> What I don't understand where is GPU specifics here?
> > > >
> > > > That's an interesting question.
> > > > Let's ask first what is a GPU for DPDK?
> > > > I think it is like a sub-CPU with high parallel execution capabilities,
> > > > and it is controlled by the CPU.
> > >
> > > I have no good ideas how to name it in accordance with
> > > above description to avoid "G" which for "Graphics" if
> > > understand correctly. However, may be it is not required.
> > > No strong opinion on the topic, but unbinding from
> > > "Graphics" would be nice.
> >
> > That's a question I ask myself for months now.
> > I am not able to find a better name,
> > and I start thinking that "GPU" is famous enough in high-load computing
> > to convey the idea of what we can expect.
> 
> 
> The closest I can think of is big-little architecture in ARM SoC.
> https://www.arm.com/why-arm/technologies/big-little
> 
> We do have similar architecture, Where the "coprocessor" is part of
> the main CPU.
> It is operations are:
> - Download firmware
> - Memory mapping for Main CPU memory by the co-processor
> - Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.

Yes it looks like the exact same scope.
I like the word "co-processor" in this context.

> If your scope is something similar and No Graphics involved here then
> we can remove G.

Indeed no graphics in DPDK :)
By removing the G, you mean keeping only PU? like "pudev"?
We could also define the G as "General".

> Coincidentally, Yesterday, I had an interaction with Elena for the
> same for BaseBand related work in ORAN where
> GPU used as Baseband processing instead of Graphics.(So I can
> understand the big picture of this library)

Yes baseband processing is one possible usage of GPU with DPDK.
We could also imagine some security analysis, or any machine learning...

> I can think of "coprocessor-dev" as one of the name.

"coprocessor" looks too long as prefix of the functions.

> We do have similar machine learning co-processors(for compute)
> if we can keep a generic name and it is for the above functions we may
> use this subsystem as well in the future.

Yes that's the idea to share a common synchronization mechanism
with different HW.

That's cool to have such a big interest in the community for this patch.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 14:06       ` Thomas Monjalon
@ 2021-06-04 18:04         ` Wang, Haiyue
  2021-06-05  7:49           ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Wang, Haiyue @ 2021-06-04 18:04 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Elena Agostini, andrew.rybchenko, Yigit, Ferruh, jerinj

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, June 4, 2021 22:06
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: dev@dpdk.org; Elena Agostini <eagostini@nvidia.com>; andrew.rybchenko@oktetlabs.ru; Yigit, Ferruh
> <ferruh.yigit@intel.com>; jerinj@marvell.com
> Subject: Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> 04/06/2021 15:25, Wang, Haiyue:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > Another question is about the function rte_gpu_free().
> > > How do we recognize that a memory chunk is from the CPU and GPU visible,
> > > or just from GPU?
> > >
> >
> > I didn't find the rte_gpu_free_visible definition, and the rte_gpu_free's
> > comment just says: deallocate a chunk of memory allocated with rte_gpu_malloc*
> >
> > Looks like the rte_gpu_free can handle this case ?
> 
> This is the proposal, yes.
> 
> > And from the definition "rte_gpu_free(uint16_t gpu_id, void *ptr)", the
> > free needs to check whether this memory belong to the GPU or not, so it
> > also can recognize the memory type, I think.
> 
> Yes that's the idea behind having a single free function.
> We could have some metadata in front of the memory chunk.
> My question is to confirm whether it is a good design or not,
> and whether it should be driver specific or have a common struct in the lib.
> 
> Opinions?
> 

Make the GPU memory to be registered into the common lib API with the metadata
like address, size etc, and also some GPU specific callbacks like to handle how
to make GPU memory visible to CPU ?

And the memory register can be like the exist external memory function:

int
rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
		unsigned int n_pages, size_t page_sz)


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 15:51                     ` Thomas Monjalon
@ 2021-06-04 18:20                       ` Wang, Haiyue
  2021-06-05  5:09                         ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Wang, Haiyue @ 2021-06-04 18:20 UTC (permalink / raw)
  To: Thomas Monjalon, Jerin Jacob
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
> Sent: Friday, June 4, 2021 23:51
> To: Jerin Jacob <jerinjacobk@gmail.com>
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Yigit, Ferruh <ferruh.yigit@intel.com>; dpdk-dev <dev@dpdk.org>;
> Elena Agostini <eagostini@nvidia.com>; David Marchand <david.marchand@redhat.com>
> Subject: Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> 04/06/2021 17:20, Jerin Jacob:
> > On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 04/06/2021 15:59, Andrew Rybchenko:
> > > > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > > > 04/06/2021 15:05, Andrew Rybchenko:
> > > > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > > > >>> 04/06/2021 13:09, Jerin Jacob:
> > > > >>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >>>>> 03/06/2021 11:33, Ferruh Yigit:
> > > > >>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > > >>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > >>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> > > > >>>>>>>
> > > > >>>>>>> Since this device does not have a queue etc? Shouldn't make it a
> > > > >>>>>>> library like mempool with vendor-defined ops?
> > > > >>>>>>
> > > > >>>>>> +1
> > > > >>>>>>
> > > > >>>>>> Current RFC announces additional memory allocation capabilities, which can suits
> > > > >>>>>> better as extension to existing memory related library instead of a new device
> > > > >>>>>> abstraction library.
> > > > >>>>>
> > > > >>>>> It is not replacing mempool.
> > > > >>>>> It is more at the same level as EAL memory management:
> > > > >>>>> allocate simple buffer, but with the exception it is done
> > > > >>>>> on a specific device, so it requires a device ID.
> > > > >>>>>
> > > > >>>>> The other reason it needs to be a full library is that
> > > > >>>>> it will start a workload on the GPU and get completion notification
> > > > >>>>> so we can integrate the GPU workload in a packet processing pipeline.
> > > > >>>>
> > > > >>>> I might have confused you. My intention is not to make to fit under mempool API.
> > > > >>>>
> > > > >>>> I agree that we need a separate library for this. My objection is only
> > > > >>>> to not call libgpudev and
> > > > >>>> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> > > > >>>> it not like existing "device libraries" in DPDK and
> > > > >>>> it like other "libraries" in DPDK.
> > > > >>>
> > > > >>> I think we should define a queue of processing actions,
> > > > >>> so it looks like other device libraries.
> > > > >>> And anyway I think a library managing a device class,
> > > > >>> and having some device drivers deserves the name of device library.
> > > > >>>
> > > > >>> I would like to read more opinions.
> > > > >>
> > > > >> Since the library is an unified interface to GPU device drivers
> > > > >> I think it should be named as in the patch - gpudev.
> > > > >>
> > > > >> Mempool looks like an exception here - initially it was pure SW
> > > > >> library, but not there are HW backends and corresponding device
> > > > >> drivers.
> > > > >>
> > > > >> What I don't understand where is GPU specifics here?
> > > > >
> > > > > That's an interesting question.
> > > > > Let's ask first what is a GPU for DPDK?
> > > > > I think it is like a sub-CPU with high parallel execution capabilities,
> > > > > and it is controlled by the CPU.
> > > >
> > > > I have no good ideas how to name it in accordance with
> > > > above description to avoid "G" which for "Graphics" if
> > > > understand correctly. However, may be it is not required.
> > > > No strong opinion on the topic, but unbinding from
> > > > "Graphics" would be nice.
> > >
> > > That's a question I ask myself for months now.
> > > I am not able to find a better name,
> > > and I start thinking that "GPU" is famous enough in high-load computing
> > > to convey the idea of what we can expect.
> >
> >
> > The closest I can think of is big-little architecture in ARM SoC.
> > https://www.arm.com/why-arm/technologies/big-little
> >
> > We do have similar architecture, Where the "coprocessor" is part of
> > the main CPU.
> > It is operations are:
> > - Download firmware
> > - Memory mapping for Main CPU memory by the co-processor
> > - Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.
> 
> Yes it looks like the exact same scope.
> I like the word "co-processor" in this context.
> 
> > If your scope is something similar and No Graphics involved here then
> > we can remove G.
> 
> Indeed no graphics in DPDK :)
> By removing the G, you mean keeping only PU? like "pudev"?
> We could also define the G as "General".
> 
> > Coincidentally, Yesterday, I had an interaction with Elena for the
> > same for BaseBand related work in ORAN where
> > GPU used as Baseband processing instead of Graphics.(So I can
> > understand the big picture of this library)
> 
> Yes baseband processing is one possible usage of GPU with DPDK.
> We could also imagine some security analysis, or any machine learning...
> 
> > I can think of "coprocessor-dev" as one of the name.
> 
> "coprocessor" looks too long as prefix of the functions.
> 
> > We do have similar machine learning co-processors(for compute)
> > if we can keep a generic name and it is for the above functions we may
> > use this subsystem as well in the future.
> 

Accelerator, 'acce_dev' ? ;-)

> Yes that's the idea to share a common synchronization mechanism
> with different HW.
> 
> That's cool to have such a big interest in the community for this patch.
> 


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 18:20                       ` Wang, Haiyue
@ 2021-06-05  5:09                         ` Jerin Jacob
  2021-06-06  1:13                           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-05  5:09 UTC (permalink / raw)
  To: Wang, Haiyue
  Cc: Thomas Monjalon, Honnappa Nagarahalli, Andrew Rybchenko, Yigit,
	Ferruh, dpdk-dev, Elena Agostini, David Marchand

On Fri, Jun 4, 2021 at 11:50 PM Wang, Haiyue <haiyue.wang@intel.com> wrote:
>
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
> > Sent: Friday, June 4, 2021 23:51
> > To: Jerin Jacob <jerinjacobk@gmail.com>
> > Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; Yigit, Ferruh <ferruh.yigit@intel.com>; dpdk-dev <dev@dpdk.org>;
> > Elena Agostini <eagostini@nvidia.com>; David Marchand <david.marchand@redhat.com>
> > Subject: Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
> >
> > 04/06/2021 17:20, Jerin Jacob:
> > > On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 04/06/2021 15:59, Andrew Rybchenko:
> > > > > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > > > > 04/06/2021 15:05, Andrew Rybchenko:
> > > > > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > > > > >>> 04/06/2021 13:09, Jerin Jacob:
> > > > > >>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > >>>>> 03/06/2021 11:33, Ferruh Yigit:
> > > > > >>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > > > >>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > >>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> > > > > >>>>>>>
> > > > > >>>>>>> Since this device does not have a queue etc? Shouldn't make it a
> > > > > >>>>>>> library like mempool with vendor-defined ops?
> > > > > >>>>>>
> > > > > >>>>>> +1
> > > > > >>>>>>
> > > > > >>>>>> Current RFC announces additional memory allocation capabilities, which can suits
> > > > > >>>>>> better as extension to existing memory related library instead of a new device
> > > > > >>>>>> abstraction library.
> > > > > >>>>>
> > > > > >>>>> It is not replacing mempool.
> > > > > >>>>> It is more at the same level as EAL memory management:
> > > > > >>>>> allocate simple buffer, but with the exception it is done
> > > > > >>>>> on a specific device, so it requires a device ID.
> > > > > >>>>>
> > > > > >>>>> The other reason it needs to be a full library is that
> > > > > >>>>> it will start a workload on the GPU and get completion notification
> > > > > >>>>> so we can integrate the GPU workload in a packet processing pipeline.
> > > > > >>>>
> > > > > >>>> I might have confused you. My intention is not to make to fit under mempool API.
> > > > > >>>>
> > > > > >>>> I agree that we need a separate library for this. My objection is only
> > > > > >>>> to not call libgpudev and
> > > > > >>>> call it libgpu. And have APIs with rte_gpu_ instead of rte_gpu_dev as
> > > > > >>>> it not like existing "device libraries" in DPDK and
> > > > > >>>> it like other "libraries" in DPDK.
> > > > > >>>
> > > > > >>> I think we should define a queue of processing actions,
> > > > > >>> so it looks like other device libraries.
> > > > > >>> And anyway I think a library managing a device class,
> > > > > >>> and having some device drivers deserves the name of device library.
> > > > > >>>
> > > > > >>> I would like to read more opinions.
> > > > > >>
> > > > > >> Since the library is an unified interface to GPU device drivers
> > > > > >> I think it should be named as in the patch - gpudev.
> > > > > >>
> > > > > >> Mempool looks like an exception here - initially it was pure SW
> > > > > >> library, but not there are HW backends and corresponding device
> > > > > >> drivers.
> > > > > >>
> > > > > >> What I don't understand where is GPU specifics here?
> > > > > >
> > > > > > That's an interesting question.
> > > > > > Let's ask first what is a GPU for DPDK?
> > > > > > I think it is like a sub-CPU with high parallel execution capabilities,
> > > > > > and it is controlled by the CPU.
> > > > >
> > > > > I have no good ideas how to name it in accordance with
> > > > > above description to avoid "G" which for "Graphics" if
> > > > > understand correctly. However, may be it is not required.
> > > > > No strong opinion on the topic, but unbinding from
> > > > > "Graphics" would be nice.
> > > >
> > > > That's a question I ask myself for months now.
> > > > I am not able to find a better name,
> > > > and I start thinking that "GPU" is famous enough in high-load computing
> > > > to convey the idea of what we can expect.
> > >
> > >
> > > The closest I can think of is big-little architecture in ARM SoC.
> > > https://www.arm.com/why-arm/technologies/big-little
> > >
> > > We do have similar architecture, Where the "coprocessor" is part of
> > > the main CPU.
> > > It is operations are:
> > > - Download firmware
> > > - Memory mapping for Main CPU memory by the co-processor
> > > - Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.
> >
> > Yes it looks like the exact same scope.
> > I like the word "co-processor" in this context.
> >
> > > If your scope is something similar and No Graphics involved here then
> > > we can remove G.
> >
> > Indeed no graphics in DPDK :)
> > By removing the G, you mean keeping only PU? like "pudev"?
> > We could also define the G as "General".
> >
> > > Coincidentally, Yesterday, I had an interaction with Elena for the
> > > same for BaseBand related work in ORAN where
> > > GPU used as Baseband processing instead of Graphics.(So I can
> > > understand the big picture of this library)
> >
> > Yes baseband processing is one possible usage of GPU with DPDK.
> > We could also imagine some security analysis, or any machine learning...
> >
> > > I can think of "coprocessor-dev" as one of the name.
> >
> > "coprocessor" looks too long as prefix of the functions.

Yes. Libray name can be lengthy, but API prefix should be 3 letters
kind short form will be required.


> >
> > > We do have similar machine learning co-processors(for compute)
> > > if we can keep a generic name and it is for the above functions we may
> > > use this subsystem as well in the future.
> >
>
> Accelerator, 'acce_dev' ? ;-)

It may get confused with HW accelerators.


Some of the options I can think of. Sorting in my preference.

library name, API prefix
1) libhpc-dev, rte_hpc_ (hpc-> Heterogeneous processor compute)
2) libhc-dev, rte_hc_
(https://en.wikipedia.org/wiki/Heterogeneous_computing see: Example
hardware)
3) libpu-dev, rte_pu_ (pu -> processing unit)
4) libhp-dev, rte_hp_ (hp->heterogeneous processor)
5) libcoprocessor-dev, rte_cps_ ?
6) libcompute-dev, rte_cpt_ ?
7) libgpu-dev, rte_gpu_




>
> > Yes that's the idea to share a common synchronization mechanism
> > with different HW.
> >
> > That's cool to have such a big interest in the community for this patch.
> >
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-04 18:04         ` Wang, Haiyue
@ 2021-06-05  7:49           ` Thomas Monjalon
  2021-06-05 11:09             ` Wang, Haiyue
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-05  7:49 UTC (permalink / raw)
  To: Wang, Haiyue; +Cc: dev, Elena Agostini, andrew.rybchenko, Yigit, Ferruh, jerinj

04/06/2021 20:04, Wang, Haiyue:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 04/06/2021 15:25, Wang, Haiyue:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > Another question is about the function rte_gpu_free().
> > > > How do we recognize that a memory chunk is from the CPU and GPU visible,
> > > > or just from GPU?
> > > >
> > >
> > > I didn't find the rte_gpu_free_visible definition, and the rte_gpu_free's
> > > comment just says: deallocate a chunk of memory allocated with rte_gpu_malloc*
> > >
> > > Looks like the rte_gpu_free can handle this case ?
> > 
> > This is the proposal, yes.
> > 
> > > And from the definition "rte_gpu_free(uint16_t gpu_id, void *ptr)", the
> > > free needs to check whether this memory belong to the GPU or not, so it
> > > also can recognize the memory type, I think.
> > 
> > Yes that's the idea behind having a single free function.
> > We could have some metadata in front of the memory chunk.
> > My question is to confirm whether it is a good design or not,
> > and whether it should be driver specific or have a common struct in the lib.
> > 
> > Opinions?
> > 
> 
> Make the GPU memory to be registered into the common lib API with the metadata
> like address, size etc, and also some GPU specific callbacks like to handle how
> to make GPU memory visible to CPU ?
> 
> And the memory register can be like the exist external memory function:
> 
> int
> rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> 		unsigned int n_pages, size_t page_sz)

How do you specify the device ID?
I may have missed something.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-05  7:49           ` Thomas Monjalon
@ 2021-06-05 11:09             ` Wang, Haiyue
  0 siblings, 0 replies; 128+ messages in thread
From: Wang, Haiyue @ 2021-06-05 11:09 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Elena Agostini, andrew.rybchenko, Yigit, Ferruh, jerinj

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Saturday, June 5, 2021 15:49
> To: Wang, Haiyue <haiyue.wang@intel.com>
> Cc: dev@dpdk.org; Elena Agostini <eagostini@nvidia.com>; andrew.rybchenko@oktetlabs.ru; Yigit, Ferruh
> <ferruh.yigit@intel.com>; jerinj@marvell.com
> Subject: Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> 04/06/2021 20:04, Wang, Haiyue:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 04/06/2021 15:25, Wang, Haiyue:
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > Another question is about the function rte_gpu_free().
> > > > > How do we recognize that a memory chunk is from the CPU and GPU visible,
> > > > > or just from GPU?
> > > > >
> > > >
> > > > I didn't find the rte_gpu_free_visible definition, and the rte_gpu_free's
> > > > comment just says: deallocate a chunk of memory allocated with rte_gpu_malloc*
> > > >
> > > > Looks like the rte_gpu_free can handle this case ?
> > >
> > > This is the proposal, yes.
> > >
> > > > And from the definition "rte_gpu_free(uint16_t gpu_id, void *ptr)", the
> > > > free needs to check whether this memory belong to the GPU or not, so it
> > > > also can recognize the memory type, I think.
> > >
> > > Yes that's the idea behind having a single free function.
> > > We could have some metadata in front of the memory chunk.
> > > My question is to confirm whether it is a good design or not,
> > > and whether it should be driver specific or have a common struct in the lib.
> > >
> > > Opinions?
> > >
> >
> > Make the GPU memory to be registered into the common lib API with the metadata
> > like address, size etc, and also some GPU specific callbacks like to handle how
> > to make GPU memory visible to CPU ?
> >
> > And the memory register can be like the exist external memory function:
> >
> > int
> > rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
> > 		unsigned int n_pages, size_t page_sz)
> 
> How do you specify the device ID

I mean that take the current external memory register as an example, it is not
a real proto-type.

The GPU memory management library can provide the this kind of API for GPU driver
to register its memory at probe time or start time ?  

> I may have missed something.
> 



> 


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
                   ` (5 preceding siblings ...)
  2021-06-04 11:07 ` Wang, Haiyue
@ 2021-06-06  1:10 ` Honnappa Nagarahalli
  2021-06-07 10:50   ` Thomas Monjalon
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 128+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-06  1:10 UTC (permalink / raw)
  To: thomas, dev; +Cc: Elena Agostini, nd, Honnappa Nagarahalli, nd

<snip>

> 
> From: Elena Agostini <eagostini@nvidia.com>
> 
> The new library gpudev is for dealing with GPU from a DPDK application in a
> vendor-agnostic way.
It would be good to explain how the application using GPU+DPDK would look like.

Which parts of the workload need DPDK's support?

Any requirements on co-existence of GPU with other accelerators?

> 
> As a first step, the features are focused on memory management.
> A function allows to allocate memory inside the GPU, while another one
> allows to use main (CPU) memory from the GPU.
Is this memory for packet buffers or something else?

> 
> The infrastructure is prepared to welcome drivers in drivers/gpu/ as the
> upcoming NVIDIA one, implementing the gpudev API.
> Other additions planned for next revisions:
>   - C implementation file
>   - guide documentation
>   - unit tests
>   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> 
> The next step should focus on GPU processing task control.
> 
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  .gitignore                           |   1 +
>  MAINTAINERS                          |   6 +
>  doc/api/doxy-api-index.md            |   1 +
>  doc/api/doxy-api.conf.in             |   1 +
>  doc/guides/conf.py                   |   8 ++
>  doc/guides/gpus/features/default.ini |  13 ++
>  doc/guides/gpus/index.rst            |  11 ++
>  doc/guides/gpus/overview.rst         |   7 +
>  doc/guides/index.rst                 |   1 +
>  doc/guides/prog_guide/gpu.rst        |   5 +
>  doc/guides/prog_guide/index.rst      |   1 +
>  drivers/gpu/meson.build              |   4 +
>  drivers/meson.build                  |   1 +
>  lib/gpudev/gpu_driver.h              |  44 +++++++
>  lib/gpudev/meson.build               |   9 ++
>  lib/gpudev/rte_gpudev.h              | 183 +++++++++++++++++++++++++++
>  lib/gpudev/version.map               |  11 ++
>  lib/meson.build                      |   1 +
>  18 files changed, 308 insertions(+)
>  create mode 100644 doc/guides/gpus/features/default.ini
>  create mode 100644 doc/guides/gpus/index.rst  create mode 100644
> doc/guides/gpus/overview.rst  create mode 100644
> doc/guides/prog_guide/gpu.rst  create mode 100644
> drivers/gpu/meson.build  create mode 100644 lib/gpudev/gpu_driver.h
> create mode 100644 lib/gpudev/meson.build  create mode 100644
> lib/gpudev/rte_gpudev.h  create mode 100644 lib/gpudev/version.map
> 
> diff --git a/.gitignore b/.gitignore
> index b19c0717e6..49494e0c6c 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
>  doc/guides/regexdevs/overview_feature_table.txt
>  doc/guides/vdpadevs/overview_feature_table.txt
>  doc/guides/bbdevs/overview_feature_table.txt
> +doc/guides/gpus/overview_feature_table.txt
> 
>  # ignore generated ctags/cscope files
>  cscope.out.po
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16971..c4755dfe9a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -452,6 +452,12 @@ F: app/test-regex/
>  F: doc/guides/prog_guide/regexdev.rst
>  F: doc/guides/regexdevs/features/default.ini
> 
> +GPU API - EXPERIMENTAL
> +M: Elena Agostini <eagostini@nvidia.com>
> +F: lib/gpudev/
> +F: doc/guides/prog_guide/gpu.rst
> +F: doc/guides/gpus/features/default.ini
> +
>  Eventdev API
>  M: Jerin Jacob <jerinj@marvell.com>
>  T: git://dpdk.org/next/dpdk-next-eventdev
> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md index
> 1992107a03..bd10342ca2 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -21,6 +21,7 @@ The public API headers are grouped by topics:
>    [compressdev]        (@ref rte_compressdev.h),
>    [compress]           (@ref rte_comp.h),
>    [regexdev]           (@ref rte_regexdev.h),
> +  [gpudev]             (@ref rte_gpudev.h),
>    [eventdev]           (@ref rte_eventdev.h),
>    [event_eth_rx_adapter]   (@ref rte_event_eth_rx_adapter.h),
>    [event_eth_tx_adapter]   (@ref rte_event_eth_tx_adapter.h),
> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in index
> 325a0195c6..831b9a6b33 100644
> --- a/doc/api/doxy-api.conf.in
> +++ b/doc/api/doxy-api.conf.in
> @@ -40,6 +40,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-
> index.md \
>                            @TOPDIR@/lib/eventdev \
>                            @TOPDIR@/lib/fib \
>                            @TOPDIR@/lib/flow_classify \
> +                          @TOPDIR@/lib/gpudev \
>                            @TOPDIR@/lib/graph \
>                            @TOPDIR@/lib/gro \
>                            @TOPDIR@/lib/gso \ diff --git a/doc/guides/conf.py
> b/doc/guides/conf.py index 67d2dd62c7..7930da9ceb 100644
> --- a/doc/guides/conf.py
> +++ b/doc/guides/conf.py
> @@ -152,6 +152,9 @@ def generate_overview_table(output_filename,
> table_id, section, table_name, titl
>          name = ini_filename[:-4]
>          name = name.replace('_vf', 'vf')
>          pmd_names.append(name)
> +    if not pmd_names:
> +        # Add an empty column if table is empty (required by RST syntax)
> +        pmd_names.append(' ')
> 
>      # Pad the table header names.
>      max_header_len = len(max(pmd_names, key=len)) @@ -388,6 +391,11
> @@ def setup(app):
>                              'Features',
>                              'Features availability in bbdev drivers',
>                              'Feature')
> +    table_file = dirname(__file__) + '/gpus/overview_feature_table.txt'
> +    generate_overview_table(table_file, 1,
> +                            'Features',
> +                            'Features availability in GPU drivers',
> +                            'Feature')
> 
>      if LooseVersion(sphinx_version) < LooseVersion('1.3.1'):
>          print('Upgrade sphinx to version >= 1.3.1 for '
> diff --git a/doc/guides/gpus/features/default.ini
> b/doc/guides/gpus/features/default.ini
> new file mode 100644
> index 0000000000..c363447b0d
> --- /dev/null
> +++ b/doc/guides/gpus/features/default.ini
> @@ -0,0 +1,13 @@
> +;
> +; Features of a GPU driver.
> +;
> +; This file defines the features that are valid for inclusion in ; the
> +other driver files and also the order that they appear in ; the
> +features table in the documentation. The feature description ; string
> +should not exceed feature_str_len defined in conf.py.
> +;
> +[Features]
> +Get device info                =
> +Share CPU memory with GPU      =
> +Allocate GPU memory            =
> +Free memory                    =
> diff --git a/doc/guides/gpus/index.rst b/doc/guides/gpus/index.rst new file
> mode 100644 index 0000000000..f9c62aeb36
> --- /dev/null
> +++ b/doc/guides/gpus/index.rst
> @@ -0,0 +1,11 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright 2021 NVIDIA Corporation & Affiliates
> +
> +GPU Drivers
> +===========
> +
> +.. toctree::
> +   :maxdepth: 2
> +   :numbered:
> +
> +   overview
> diff --git a/doc/guides/gpus/overview.rst b/doc/guides/gpus/overview.rst
> new file mode 100644 index 0000000000..e7f985e98b
> --- /dev/null
> +++ b/doc/guides/gpus/overview.rst
> @@ -0,0 +1,7 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright 2021 NVIDIA Corporation & Affiliates
> +
> +Overview of GPU Drivers
> +=======================
> +
> +.. include:: overview_feature_table.txt
> diff --git a/doc/guides/index.rst b/doc/guides/index.rst index
> 857f0363d3..ee4d79a4eb 100644
> --- a/doc/guides/index.rst
> +++ b/doc/guides/index.rst
> @@ -21,6 +21,7 @@ DPDK documentation
>     compressdevs/index
>     vdpadevs/index
>     regexdevs/index
> +   gpus/index
>     eventdevs/index
>     rawdevs/index
>     mempool/index
> diff --git a/doc/guides/prog_guide/gpu.rst b/doc/guides/prog_guide/gpu.rst
> new file mode 100644 index 0000000000..54f9fa8300
> --- /dev/null
> +++ b/doc/guides/prog_guide/gpu.rst
> @@ -0,0 +1,5 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> +   Copyright 2021 NVIDIA Corporation & Affiliates
> +
> +GPU Library
> +===========
> diff --git a/doc/guides/prog_guide/index.rst
> b/doc/guides/prog_guide/index.rst index 2dce507f46..dfddf90b51 100644
> --- a/doc/guides/prog_guide/index.rst
> +++ b/doc/guides/prog_guide/index.rst
> @@ -27,6 +27,7 @@ Programmer's Guide
>      cryptodev_lib
>      compressdev
>      regexdev
> +    gpu
>      rte_security
>      rawdev
>      link_bonding_poll_mode_drv_lib
> diff --git a/drivers/gpu/meson.build b/drivers/gpu/meson.build new file mode
> 100644 index 0000000000..5189950616
> --- /dev/null
> +++ b/drivers/gpu/meson.build
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: BSD-3-Clause # Copyright 2021 NVIDIA
> +Corporation & Affiliates
> +
> +drivers = []
> diff --git a/drivers/meson.build b/drivers/meson.build index
> bc6f4f567f..f607040d79 100644
> --- a/drivers/meson.build
> +++ b/drivers/meson.build
> @@ -18,6 +18,7 @@ subdirs = [
>          'vdpa',           # depends on common, bus and mempool.
>          'event',          # depends on common, bus, mempool and net.
>          'baseband',       # depends on common and bus.
> +        'gpu',            # depends on common and bus.
>  ]
> 
>  if meson.is_cross_build()
> diff --git a/lib/gpudev/gpu_driver.h b/lib/gpudev/gpu_driver.h new file mode
> 100644 index 0000000000..5ff609e49d
> --- /dev/null
> +++ b/lib/gpudev/gpu_driver.h
> @@ -0,0 +1,44 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 NVIDIA Corporation & Affiliates  */
> +
> +#ifndef GPU_DRIVER_H
> +#define GPU_DRIVER_H
> +
> +#include <stdint.h>
> +
> +#include <rte_common.h>
> +
> +#include "rte_gpudev.h"
> +
> +struct rte_gpu_dev;
> +
> +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void
> +**ptr); typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> +
> +struct rte_gpu_dev {
> +	/* Backing device. */
> +	struct rte_device *device;
> +	/* GPU info structure. */
> +	struct rte_gpu_info info;
> +	/* Counter of processes using the device. */
> +	uint16_t process_cnt;
> +	/* If device is currently used or not. */
> +	enum rte_gpu_state state;
> +	/* FUNCTION: Allocate memory on the GPU. */
> +	gpu_malloc_t gpu_malloc;
> +	/* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> +	gpu_malloc_t gpu_malloc_visible;
> +	/* FUNCTION: Free allocated memory on the GPU. */
> +	gpu_free_t gpu_free;
> +	/* Device interrupt handle. */
> +	struct rte_intr_handle *intr_handle;
> +	/* Driver-specific private data. */
> +	void *dev_private;
> +} __rte_cache_aligned;
> +
> +struct rte_gpu_dev *rte_gpu_dev_allocate(const char *name); struct
> +rte_gpu_dev *rte_gpu_dev_get_by_name(const char *name); int
> +rte_gpu_dev_release(struct rte_gpu_dev *gpudev);
> +
> +#endif /* GPU_DRIVER_H */
> diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build new file mode
> 100644 index 0000000000..f05459e18d
> --- /dev/null
> +++ b/lib/gpudev/meson.build
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: BSD-3-Clause # Copyright 2021 NVIDIA
> +Corporation & Affiliates
> +
> +headers = files(
> +        'rte_gpudev.h',
> +)
> +
> +sources = files(
> +)
> diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h new file mode
> 100644 index 0000000000..b12f35c17e
> --- /dev/null
> +++ b/lib/gpudev/rte_gpudev.h
> @@ -0,0 +1,183 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 NVIDIA Corporation & Affiliates  */
> +
> +#ifndef RTE_GPUDEV_H
> +#define RTE_GPUDEV_H
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include <rte_common.h>
> +
> +/**
> + * @file
> + * Generic library to interact with a GPU.
> + *
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** Maximum number of GPU engines. */
> +#define RTE_GPU_MAX_DEVS UINT16_C(32)
> +/** Maximum length of device name. */
> +#define RTE_GPU_NAME_MAX_LEN 128
> +
> +/** Flags indicate current state of GPU device. */ enum rte_gpu_state {
> +	RTE_GPU_STATE_UNUSED,        /**< not initialized */
> +	RTE_GPU_STATE_INITIALIZED,   /**< initialized */
> +};
> +
> +/** Store a list of info for a given GPU. */ struct rte_gpu_info {
> +	/** GPU device ID. */
> +	uint16_t gpu_id;
> +	/** Unique identifier name. */
> +	char name[RTE_GPU_NAME_MAX_LEN];
> +	/** Total memory available on device. */
> +	size_t total_memory;
> +	/** Total processors available on device. */
> +	int processor_count;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of GPUs detected and associated to DPDK.
> + *
> + * @return
> + *   The number of available GPUs.
> + */
> +__rte_experimental
> +uint16_t rte_gpu_dev_count_avail(void);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Check if the device is valid and initialized in DPDK.
> + *
> + * @param gpu_id
> + *   The input GPU ID.
> + *
> + * @return
> + *   - True if gpu_id is a valid and initialized GPU.
> + *   - False otherwise.
> + */
> +__rte_experimental
> +bool rte_gpu_dev_is_valid(uint16_t gpu_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the GPU ID of the next valid GPU initialized in DPDK.
> + *
> + * @param gpu_id
> + *   The initial GPU ID to start the research.
> + *
> + * @return
> + *   Next GPU ID corresponding to a valid and initialized GPU device.
> + */
> +__rte_experimental
> +uint16_t rte_gpu_dev_find_next(uint16_t gpu_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Macro to iterate over all valid GPUs.
> + *
> + * @param gpu_id
> + *   The ID of the next possible valid GPU.
> + * @return
> + *   Next valid GPU ID, RTE_GPU_MAX_DEVS if there is none.
> + */
> +#define RTE_GPU_FOREACH_DEV(gpu_id) \
> +	for (gpu_id = rte_gpu_find_next(0); \
> +	     gpu_id < RTE_GPU_MAX_DEVS; \
> +	     gpu_id = rte_gpu_find_next(gpu_id + 1))
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Return GPU specific info.
> + *
> + * @param gpu_id
> + *   GPU ID to get info.
> + * @param info
> + *   Memory structure to fill with the info.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_dev_info_get(uint16_t gpu_id, struct rte_gpu_info **info);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory on the GPU.
> + *
> + * @param gpu_id
> + *   GPU ID to allocate memory.
> + * @param size
> + *   Number of bytes to allocate.
> + * @param ptr
> + *   Pointer to store the address of the allocated memory.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory on the CPU that is visible from the GPU.
> + *
> + * @param gpu_id
> + *   Reference GPU ID.
> + * @param size
> + *   Number of bytes to allocate.
> + * @param ptr
> + *   Pointer to store the address of the allocated memory.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_malloc_visible(uint16_t gpu_id, size_t size, void **ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Deallocate a chunk of memory allocated with rte_gpu_malloc*.
> + *
> + * @param gpu_id
> + *   Reference GPU ID.
> + * @param ptr
> + *   Pointer to the memory area to be deallocated.
> + *
> + * @return
> + *   0 on success, -1 otherwise.
> + */
> +__rte_experimental
> +int rte_gpu_free(uint16_t gpu_id, void *ptr);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_GPUDEV_H */
> diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map new file mode
> 100644 index 0000000000..9e0f218e8b
> --- /dev/null
> +++ b/lib/gpudev/version.map
> @@ -0,0 +1,11 @@
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_gpu_dev_count_avail;
> +	rte_gpu_dev_find_next;
> +	rte_gpu_dev_info_get;
> +	rte_gpu_dev_is_valid;
> +	rte_gpu_free;
> +	rte_gpu_malloc;
> +	rte_gpu_malloc_visible;
> +};
> diff --git a/lib/meson.build b/lib/meson.build index 4a64756a68..ffefc64c69
> 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -33,6 +33,7 @@ libraries = [
>          'distributor',
>          'efd',
>          'eventdev',
> +        'gpudev',
>          'gro',
>          'gso',
>          'ip_frag',
> --
> 2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-05  5:09                         ` Jerin Jacob
@ 2021-06-06  1:13                           ` Honnappa Nagarahalli
  2021-06-06  5:28                             ` Jerin Jacob
  2021-06-07  7:20                             ` Wang, Haiyue
  0 siblings, 2 replies; 128+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-06  1:13 UTC (permalink / raw)
  To: Jerin Jacob, Wang, Haiyue
  Cc: thomas, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd, Honnappa Nagarahalli, nd

<snip>

> > >
> > > 04/06/2021 17:20, Jerin Jacob:
> > > > On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > > > 04/06/2021 15:59, Andrew Rybchenko:
> > > > > > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > > > > > 04/06/2021 15:05, Andrew Rybchenko:
> > > > > > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > > > > > >>> 04/06/2021 13:09, Jerin Jacob:
> > > > > > >>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > > > > >>>>> 03/06/2021 11:33, Ferruh Yigit:
> > > > > > >>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > > > > >>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > > > > >>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> > > > > > >>>>>>>
> > > > > > >>>>>>> Since this device does not have a queue etc? Shouldn't
> > > > > > >>>>>>> make it a library like mempool with vendor-defined ops?
> > > > > > >>>>>>
> > > > > > >>>>>> +1
> > > > > > >>>>>>
> > > > > > >>>>>> Current RFC announces additional memory allocation
> > > > > > >>>>>> capabilities, which can suits better as extension to
> > > > > > >>>>>> existing memory related library instead of a new device
> abstraction library.
> > > > > > >>>>>
> > > > > > >>>>> It is not replacing mempool.
> > > > > > >>>>> It is more at the same level as EAL memory management:
> > > > > > >>>>> allocate simple buffer, but with the exception it is
> > > > > > >>>>> done on a specific device, so it requires a device ID.
> > > > > > >>>>>
> > > > > > >>>>> The other reason it needs to be a full library is that
> > > > > > >>>>> it will start a workload on the GPU and get completion
> > > > > > >>>>> notification so we can integrate the GPU workload in a packet
> processing pipeline.
> > > > > > >>>>
> > > > > > >>>> I might have confused you. My intention is not to make to fit
> under mempool API.
> > > > > > >>>>
> > > > > > >>>> I agree that we need a separate library for this. My
> > > > > > >>>> objection is only to not call libgpudev and call it
> > > > > > >>>> libgpu. And have APIs with rte_gpu_ instead of
> > > > > > >>>> rte_gpu_dev as it not like existing "device libraries" in
> > > > > > >>>> DPDK and it like other "libraries" in DPDK.
> > > > > > >>>
> > > > > > >>> I think we should define a queue of processing actions, so
> > > > > > >>> it looks like other device libraries.
> > > > > > >>> And anyway I think a library managing a device class, and
> > > > > > >>> having some device drivers deserves the name of device library.
> > > > > > >>>
> > > > > > >>> I would like to read more opinions.
> > > > > > >>
> > > > > > >> Since the library is an unified interface to GPU device
> > > > > > >> drivers I think it should be named as in the patch - gpudev.
> > > > > > >>
> > > > > > >> Mempool looks like an exception here - initially it was
> > > > > > >> pure SW library, but not there are HW backends and
> > > > > > >> corresponding device drivers.
> > > > > > >>
> > > > > > >> What I don't understand where is GPU specifics here?
> > > > > > >
> > > > > > > That's an interesting question.
> > > > > > > Let's ask first what is a GPU for DPDK?
> > > > > > > I think it is like a sub-CPU with high parallel execution
> > > > > > > capabilities, and it is controlled by the CPU.
> > > > > >
> > > > > > I have no good ideas how to name it in accordance with above
> > > > > > description to avoid "G" which for "Graphics" if understand
> > > > > > correctly. However, may be it is not required.
> > > > > > No strong opinion on the topic, but unbinding from "Graphics"
> > > > > > would be nice.
> > > > >
> > > > > That's a question I ask myself for months now.
> > > > > I am not able to find a better name, and I start thinking that
> > > > > "GPU" is famous enough in high-load computing to convey the idea
> > > > > of what we can expect.
> > > >
> > > >
> > > > The closest I can think of is big-little architecture in ARM SoC.
> > > > https://www.arm.com/why-arm/technologies/big-little
From the application pov, big-little arch is nothing but SMT. Not sure how it is similar to another device on PCIe.

> > > >
> > > > We do have similar architecture, Where the "coprocessor" is part
> > > > of the main CPU.
> > > > It is operations are:
> > > > - Download firmware
> > > > - Memory mapping for Main CPU memory by the co-processor
> > > > - Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.
> > >
> > > Yes it looks like the exact same scope.
> > > I like the word "co-processor" in this context.
> > >
> > > > If your scope is something similar and No Graphics involved here
> > > > then we can remove G.
> > >
> > > Indeed no graphics in DPDK :)
> > > By removing the G, you mean keeping only PU? like "pudev"?
> > > We could also define the G as "General".
> > >
> > > > Coincidentally, Yesterday, I had an interaction with Elena for the
> > > > same for BaseBand related work in ORAN where GPU used as Baseband
> > > > processing instead of Graphics.(So I can understand the big
> > > > picture of this library)
This patch does not provide the big picture view of what the processing looks like using GPU. It would be good to explain that.
For ex:
1) Will the notion of GPU hidden from the application? i.e. is the application allowed to launch kernels?
	1a) Will DPDK provide abstract APIs to launch kernels?
     This would require us to have the notion of GPU in DPDK and the application would depend on the availability of GPU in the system.
2) Is launching kernels hidden? i.e. the application still calls DPDK abstract APIs (such as encryption/decryption APIs) without knowing that the encryption/decryption is happening on GPU.
     This does not require us to have a notion of GPU in DPDK at the API level

If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own local memory. May be some of the APIs could use generic names. For ex: instead of calling it as "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts its own memory that need to be managed by the application, can use these APIs.
	

> > >
> > > Yes baseband processing is one possible usage of GPU with DPDK.
> > > We could also imagine some security analysis, or any machine learning...
> > >
> > > > I can think of "coprocessor-dev" as one of the name.
> > >
> > > "coprocessor" looks too long as prefix of the functions.
> 
> Yes. Libray name can be lengthy, but API prefix should be 3 letters kind short
> form will be required.
> 
> 
> > >
> > > > We do have similar machine learning co-processors(for compute) if
> > > > we can keep a generic name and it is for the above functions we
> > > > may use this subsystem as well in the future.
> > >
> >
> > Accelerator, 'acce_dev' ? ;-)
> 
> It may get confused with HW accelerators.
> 
> 
> Some of the options I can think of. Sorting in my preference.
> 
> library name, API prefix
> 1) libhpc-dev, rte_hpc_ (hpc-> Heterogeneous processor compute)
> 2) libhc-dev, rte_hc_
> (https://en.wikipedia.org/wiki/Heterogeneous_computing see: Example
> hardware)
> 3) libpu-dev, rte_pu_ (pu -> processing unit)
> 4) libhp-dev, rte_hp_ (hp->heterogeneous processor)
> 5) libcoprocessor-dev, rte_cps_ ?
> 6) libcompute-dev, rte_cpt_ ?
> 7) libgpu-dev, rte_gpu_
These seem to assume that the application can launch its own workload on the device? Does DPDK need to provide abstract APIs for launching work on a device?


> 
> 
> 
> 
> >
> > > Yes that's the idea to share a common synchronization mechanism with
> > > different HW.
> > >
> > > That's cool to have such a big interest in the community for this patch.
> > >
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-06  1:13                           ` Honnappa Nagarahalli
@ 2021-06-06  5:28                             ` Jerin Jacob
  2021-06-07 10:29                               ` Thomas Monjalon
  2021-06-07  7:20                             ` Wang, Haiyue
  1 sibling, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-06  5:28 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Wang, Haiyue, thomas, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd

On Sun, Jun 6, 2021 at 6:44 AM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> <snip>
>
> > > >
> > > > 04/06/2021 17:20, Jerin Jacob:
> > > > > On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > > > > > 04/06/2021 15:59, Andrew Rybchenko:
> > > > > > > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > > > > > > 04/06/2021 15:05, Andrew Rybchenko:
> > > > > > > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > > > > > > >>> 04/06/2021 13:09, Jerin Jacob:
> > > > > > > >>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > > > > > > >>>>> 03/06/2021 11:33, Ferruh Yigit:
> > > > > > > >>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > > > > > >>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > > > > > > >>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Since this device does not have a queue etc? Shouldn't
> > > > > > > >>>>>>> make it a library like mempool with vendor-defined ops?
> > > > > > > >>>>>>
> > > > > > > >>>>>> +1
> > > > > > > >>>>>>
> > > > > > > >>>>>> Current RFC announces additional memory allocation
> > > > > > > >>>>>> capabilities, which can suits better as extension to
> > > > > > > >>>>>> existing memory related library instead of a new device
> > abstraction library.
> > > > > > > >>>>>
> > > > > > > >>>>> It is not replacing mempool.
> > > > > > > >>>>> It is more at the same level as EAL memory management:
> > > > > > > >>>>> allocate simple buffer, but with the exception it is
> > > > > > > >>>>> done on a specific device, so it requires a device ID.
> > > > > > > >>>>>
> > > > > > > >>>>> The other reason it needs to be a full library is that
> > > > > > > >>>>> it will start a workload on the GPU and get completion
> > > > > > > >>>>> notification so we can integrate the GPU workload in a packet
> > processing pipeline.
> > > > > > > >>>>
> > > > > > > >>>> I might have confused you. My intention is not to make to fit
> > under mempool API.
> > > > > > > >>>>
> > > > > > > >>>> I agree that we need a separate library for this. My
> > > > > > > >>>> objection is only to not call libgpudev and call it
> > > > > > > >>>> libgpu. And have APIs with rte_gpu_ instead of
> > > > > > > >>>> rte_gpu_dev as it not like existing "device libraries" in
> > > > > > > >>>> DPDK and it like other "libraries" in DPDK.
> > > > > > > >>>
> > > > > > > >>> I think we should define a queue of processing actions, so
> > > > > > > >>> it looks like other device libraries.
> > > > > > > >>> And anyway I think a library managing a device class, and
> > > > > > > >>> having some device drivers deserves the name of device library.
> > > > > > > >>>
> > > > > > > >>> I would like to read more opinions.
> > > > > > > >>
> > > > > > > >> Since the library is an unified interface to GPU device
> > > > > > > >> drivers I think it should be named as in the patch - gpudev.
> > > > > > > >>
> > > > > > > >> Mempool looks like an exception here - initially it was
> > > > > > > >> pure SW library, but not there are HW backends and
> > > > > > > >> corresponding device drivers.
> > > > > > > >>
> > > > > > > >> What I don't understand where is GPU specifics here?
> > > > > > > >
> > > > > > > > That's an interesting question.
> > > > > > > > Let's ask first what is a GPU for DPDK?
> > > > > > > > I think it is like a sub-CPU with high parallel execution
> > > > > > > > capabilities, and it is controlled by the CPU.
> > > > > > >
> > > > > > > I have no good ideas how to name it in accordance with above
> > > > > > > description to avoid "G" which for "Graphics" if understand
> > > > > > > correctly. However, may be it is not required.
> > > > > > > No strong opinion on the topic, but unbinding from "Graphics"
> > > > > > > would be nice.
> > > > > >
> > > > > > That's a question I ask myself for months now.
> > > > > > I am not able to find a better name, and I start thinking that
> > > > > > "GPU" is famous enough in high-load computing to convey the idea
> > > > > > of what we can expect.
> > > > >
> > > > >
> > > > > The closest I can think of is big-little architecture in ARM SoC.
> > > > > https://www.arm.com/why-arm/technologies/big-little
> From the application pov, big-little arch is nothing but SMT. Not sure how it is similar to another device on PCIe.


Yes. It may not be a device sitting on a PCIe bus, However,  It can
access it via some bus from the main CPU.


>
> > > > >
> > > > > We do have similar architecture, Where the "coprocessor" is part
> > > > > of the main CPU.
> > > > > It is operations are:
> > > > > - Download firmware
> > > > > - Memory mapping for Main CPU memory by the co-processor
> > > > > - Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.
> > > >
> > > > Yes it looks like the exact same scope.
> > > > I like the word "co-processor" in this context.
> > > >
> > > > > If your scope is something similar and No Graphics involved here
> > > > > then we can remove G.
> > > >
> > > > Indeed no graphics in DPDK :)
> > > > By removing the G, you mean keeping only PU? like "pudev"?
> > > > We could also define the G as "General".
> > > >
> > > > > Coincidentally, Yesterday, I had an interaction with Elena for the
> > > > > same for BaseBand related work in ORAN where GPU used as Baseband
> > > > > processing instead of Graphics.(So I can understand the big
> > > > > picture of this library)
> This patch does not provide the big picture view of what the processing looks like using GPU. It would be good to explain that.
> For ex:
> 1) Will the notion of GPU hidden from the application? i.e. is the application allowed to launch kernels?
>         1a) Will DPDK provide abstract APIs to launch kernels?
>      This would require us to have the notion of GPU in DPDK and the application would depend on the availability of GPU in the system.
> 2) Is launching kernels hidden? i.e. the application still calls DPDK abstract APIs (such as encryption/decryption APIs) without knowing that the encryption/decryption is happening on GPU.
>      This does not require us to have a notion of GPU in DPDK at the API level

I will leave this to Thomas.

>
> If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own local memory. May be some of the APIs could use generic names. For ex: instead of calling it as "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts its own memory that need to be managed by the application, can use these APIs.

That is a good thought. it is possible to hook the download firmware,
memory management, Job management(as messages to/from device) to
rte_device itself.
I think, one needs to consider, how to integrate with the existing
DPDK subsystem, for example: If one decided to implement bbdev or
regexdev with such computing device,
Need to consider, Is it better to have bbdev driver has depended
gpudev or rte_device has this callback and use with bbdev driver.




>
>
> > > >
> > > > Yes baseband processing is one possible usage of GPU with DPDK.
> > > > We could also imagine some security analysis, or any machine learning...
> > > >
> > > > > I can think of "coprocessor-dev" as one of the name.
> > > >
> > > > "coprocessor" looks too long as prefix of the functions.
> >
> > Yes. Libray name can be lengthy, but API prefix should be 3 letters kind short
> > form will be required.
> >
> >
> > > >
> > > > > We do have similar machine learning co-processors(for compute) if
> > > > > we can keep a generic name and it is for the above functions we
> > > > > may use this subsystem as well in the future.
> > > >
> > >
> > > Accelerator, 'acce_dev' ? ;-)
> >
> > It may get confused with HW accelerators.
> >
> >
> > Some of the options I can think of. Sorting in my preference.
> >
> > library name, API prefix
> > 1) libhpc-dev, rte_hpc_ (hpc-> Heterogeneous processor compute)
> > 2) libhc-dev, rte_hc_
> > (https://en.wikipedia.org/wiki/Heterogeneous_computing see: Example
> > hardware)
> > 3) libpu-dev, rte_pu_ (pu -> processing unit)
> > 4) libhp-dev, rte_hp_ (hp->heterogeneous processor)
> > 5) libcoprocessor-dev, rte_cps_ ?
> > 6) libcompute-dev, rte_cpt_ ?
> > 7) libgpu-dev, rte_gpu_
> These seem to assume that the application can launch its own workload on the device? Does DPDK need to provide abstract APIs for launching work on a device?
>
>
> >
> >
> >
> >
> > >
> > > > Yes that's the idea to share a common synchronization mechanism with
> > > > different HW.
> > > >
> > > > That's cool to have such a big interest in the community for this patch.
> > > >
> > >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-06  1:13                           ` Honnappa Nagarahalli
  2021-06-06  5:28                             ` Jerin Jacob
@ 2021-06-07  7:20                             ` Wang, Haiyue
  2021-06-07 10:43                               ` Thomas Monjalon
  1 sibling, 1 reply; 128+ messages in thread
From: Wang, Haiyue @ 2021-06-07  7:20 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Jerin Jacob
  Cc: thomas, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd, nd

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Sunday, June 6, 2021 09:14
> To: Jerin Jacob <jerinjacobk@gmail.com>; Wang, Haiyue <haiyue.wang@intel.com>
> Cc: thomas@monjalon.net; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; dpdk-dev <dev@dpdk.org>; Elena Agostini <eagostini@nvidia.com>; David
> Marchand <david.marchand@redhat.com>; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH] gpudev: introduce memory API
> 
> <snip>
> 
> > > >
> > > > 04/06/2021 17:20, Jerin Jacob:
> > > > > On Fri, Jun 4, 2021 at 7:39 PM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > > > > > 04/06/2021 15:59, Andrew Rybchenko:
> > > > > > > On 6/4/21 4:18 PM, Thomas Monjalon wrote:
> > > > > > > > 04/06/2021 15:05, Andrew Rybchenko:
> > > > > > > >> On 6/4/21 3:46 PM, Thomas Monjalon wrote:
> > > > > > > >>> 04/06/2021 13:09, Jerin Jacob:
> > > > > > > >>>> On Fri, Jun 4, 2021 at 3:58 PM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > > > > > > >>>>> 03/06/2021 11:33, Ferruh Yigit:
> > > > > > > >>>>>> On 6/3/2021 8:47 AM, Jerin Jacob wrote:
> > > > > > > >>>>>>> On Thu, Jun 3, 2021 at 2:05 AM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > > > > > > >>>>>>>> +  [gpudev]             (@ref rte_gpudev.h),
> > > > > > > >>>>>>>
> > > > > > > >>>>>>> Since this device does not have a queue etc? Shouldn't
> > > > > > > >>>>>>> make it a library like mempool with vendor-defined ops?
> > > > > > > >>>>>>
> > > > > > > >>>>>> +1
> > > > > > > >>>>>>
> > > > > > > >>>>>> Current RFC announces additional memory allocation
> > > > > > > >>>>>> capabilities, which can suits better as extension to
> > > > > > > >>>>>> existing memory related library instead of a new device
> > abstraction library.
> > > > > > > >>>>>
> > > > > > > >>>>> It is not replacing mempool.
> > > > > > > >>>>> It is more at the same level as EAL memory management:
> > > > > > > >>>>> allocate simple buffer, but with the exception it is
> > > > > > > >>>>> done on a specific device, so it requires a device ID.
> > > > > > > >>>>>
> > > > > > > >>>>> The other reason it needs to be a full library is that
> > > > > > > >>>>> it will start a workload on the GPU and get completion
> > > > > > > >>>>> notification so we can integrate the GPU workload in a packet
> > processing pipeline.
> > > > > > > >>>>
> > > > > > > >>>> I might have confused you. My intention is not to make to fit
> > under mempool API.
> > > > > > > >>>>
> > > > > > > >>>> I agree that we need a separate library for this. My
> > > > > > > >>>> objection is only to not call libgpudev and call it
> > > > > > > >>>> libgpu. And have APIs with rte_gpu_ instead of
> > > > > > > >>>> rte_gpu_dev as it not like existing "device libraries" in
> > > > > > > >>>> DPDK and it like other "libraries" in DPDK.
> > > > > > > >>>
> > > > > > > >>> I think we should define a queue of processing actions, so
> > > > > > > >>> it looks like other device libraries.
> > > > > > > >>> And anyway I think a library managing a device class, and
> > > > > > > >>> having some device drivers deserves the name of device library.
> > > > > > > >>>
> > > > > > > >>> I would like to read more opinions.
> > > > > > > >>
> > > > > > > >> Since the library is an unified interface to GPU device
> > > > > > > >> drivers I think it should be named as in the patch - gpudev.
> > > > > > > >>
> > > > > > > >> Mempool looks like an exception here - initially it was
> > > > > > > >> pure SW library, but not there are HW backends and
> > > > > > > >> corresponding device drivers.
> > > > > > > >>
> > > > > > > >> What I don't understand where is GPU specifics here?
> > > > > > > >
> > > > > > > > That's an interesting question.
> > > > > > > > Let's ask first what is a GPU for DPDK?
> > > > > > > > I think it is like a sub-CPU with high parallel execution
> > > > > > > > capabilities, and it is controlled by the CPU.
> > > > > > >
> > > > > > > I have no good ideas how to name it in accordance with above
> > > > > > > description to avoid "G" which for "Graphics" if understand
> > > > > > > correctly. However, may be it is not required.
> > > > > > > No strong opinion on the topic, but unbinding from "Graphics"
> > > > > > > would be nice.
> > > > > >
> > > > > > That's a question I ask myself for months now.
> > > > > > I am not able to find a better name, and I start thinking that
> > > > > > "GPU" is famous enough in high-load computing to convey the idea
> > > > > > of what we can expect.
> > > > >
> > > > >
> > > > > The closest I can think of is big-little architecture in ARM SoC.
> > > > > https://www.arm.com/why-arm/technologies/big-little
> From the application pov, big-little arch is nothing but SMT. Not sure how it is similar to another
> device on PCIe.
> 
> > > > >
> > > > > We do have similar architecture, Where the "coprocessor" is part
> > > > > of the main CPU.
> > > > > It is operations are:
> > > > > - Download firmware
> > > > > - Memory mapping for Main CPU memory by the co-processor
> > > > > - Enq/Deq Jobs from/to Main CPU/Coprocessor CPU.
> > > >
> > > > Yes it looks like the exact same scope.
> > > > I like the word "co-processor" in this context.
> > > >
> > > > > If your scope is something similar and No Graphics involved here
> > > > > then we can remove G.
> > > >
> > > > Indeed no graphics in DPDK :)
> > > > By removing the G, you mean keeping only PU? like "pudev"?
> > > > We could also define the G as "General".
> > > >
> > > > > Coincidentally, Yesterday, I had an interaction with Elena for the
> > > > > same for BaseBand related work in ORAN where GPU used as Baseband
> > > > > processing instead of Graphics.(So I can understand the big
> > > > > picture of this library)
> This patch does not provide the big picture view of what the processing looks like using GPU. It would
> be good to explain that.
> For ex:
> 1) Will the notion of GPU hidden from the application? i.e. is the application allowed to launch
> kernels?
> 	1a) Will DPDK provide abstract APIs to launch kernels?
>      This would require us to have the notion of GPU in DPDK and the application would depend on the
> availability of GPU in the system.
> 2) Is launching kernels hidden? i.e. the application still calls DPDK abstract APIs (such as
> encryption/decryption APIs) without knowing that the encryption/decryption is happening on GPU.
>      This does not require us to have a notion of GPU in DPDK at the API level
> 
> If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> its own memory that need to be managed by the application, can use these APIs.
> 

"rte_dev_malloc" sounds a good name, then looks like we need to enhance the
'struct rte_device' with some new ops as:

eal: move DMA mapping from bus-specific to generic driver

https://patchwork.dpdk.org/project/dpdk/patch/20210331224547.2217759-1-thomas@monjalon.net/

> 
> > > >
> > > > Yes baseband processing is one possible usage of GPU with DPDK.
> > > > We could also imagine some security analysis, or any machine learning...
> > > >
> > > > > I can think of "coprocessor-dev" as one of the name.
> > > >
> > > > "coprocessor" looks too long as prefix of the functions.
> >
> > Yes. Libray name can be lengthy, but API prefix should be 3 letters kind short
> > form will be required.
> >
> >
> > > >
> > > > > We do have similar machine learning co-processors(for compute) if
> > > > > we can keep a generic name and it is for the above functions we
> > > > > may use this subsystem as well in the future.
> > > >
> > >
> > > Accelerator, 'acce_dev' ? ;-)
> >
> > It may get confused with HW accelerators.
> >
> >
> > Some of the options I can think of. Sorting in my preference.
> >
> > library name, API prefix
> > 1) libhpc-dev, rte_hpc_ (hpc-> Heterogeneous processor compute)
> > 2) libhc-dev, rte_hc_
> > (https://en.wikipedia.org/wiki/Heterogeneous_computing see: Example
> > hardware)
> > 3) libpu-dev, rte_pu_ (pu -> processing unit)
> > 4) libhp-dev, rte_hp_ (hp->heterogeneous processor)
> > 5) libcoprocessor-dev, rte_cps_ ?
> > 6) libcompute-dev, rte_cpt_ ?
> > 7) libgpu-dev, rte_gpu_
> These seem to assume that the application can launch its own workload on the device? Does DPDK need to
> provide abstract APIs for launching work on a device?
> 
> 
> >
> >
> >
> >
> > >
> > > > Yes that's the idea to share a common synchronization mechanism with
> > > > different HW.
> > > >
> > > > That's cool to have such a big interest in the community for this patch.
> > > >
> > >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-06  5:28                             ` Jerin Jacob
@ 2021-06-07 10:29                               ` Thomas Monjalon
  0 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-07 10:29 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: dev, Wang, Haiyue, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd, Jerin Jacob

06/06/2021 07:28, Jerin Jacob:
> On Sun, Jun 6, 2021 at 6:44 AM Honnappa Nagarahalli
> > This patch does not provide the big picture view of what the processing looks like using GPU. It would be good to explain that.
> > For ex:
> > 1) Will the notion of GPU hidden from the application? i.e. is the application allowed to launch kernels?
> >         1a) Will DPDK provide abstract APIs to launch kernels?
> >      This would require us to have the notion of GPU in DPDK and the application would depend on the availability of GPU in the system.

Not sure "kernels" is a well known word in this context.
I propose talking about computing tasks.
The DPDK application running on the CPU must be synchronized
with the tasks running on devices, so yes we need a way
to decide what to launch and when from the DPDK application.

> > 2) Is launching kernels hidden? i.e. the application still calls DPDK abstract APIs (such as encryption/decryption APIs) without knowing that the encryption/decryption is happening on GPU.
> >      This does not require us to have a notion of GPU in DPDK at the API level
> 
> I will leave this to Thomas.

The general need is to allow running any kind of processing on devices.
Some processing may be very specific, others could fit in the existing
class API like crypto and regex.
I think implementing such specific class drivers based on tasks
dynamically loaded on the device may be done as a second step.

Thank you for the questions, it helps defining the big picture
for the next revision of the patch.

> > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own local memory. May be some of the APIs could use generic names. For ex: instead of calling it as "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts its own memory that need to be managed by the application, can use these APIs.
> 
> That is a good thought. it is possible to hook the download firmware,
> memory management, Job management(as messages to/from device) to
> rte_device itself.
> I think, one needs to consider, how to integrate with the existing
> DPDK subsystem, for example: If one decided to implement bbdev or
> regexdev with such computing device,
> Need to consider, Is it better to have bbdev driver has depended
> gpudev or rte_device has this callback and use with bbdev driver.

Absolutely. If a specialized driver class fits with a workload,
it is best handled with a driver in its specific class.

> > > > > Yes baseband processing is one possible usage of GPU with DPDK.
> > > > > We could also imagine some security analysis, or any machine learning...
> > > > >
> > > > > > I can think of "coprocessor-dev" as one of the name.
> > > > >
> > > > > "coprocessor" looks too long as prefix of the functions.
> > >
> > > Yes. Libray name can be lengthy, but API prefix should be 3 letters kind short
> > > form will be required.
> > >
> > >
> > > > >
> > > > > > We do have similar machine learning co-processors(for compute) if
> > > > > > we can keep a generic name and it is for the above functions we
> > > > > > may use this subsystem as well in the future.
> > > > >
> > > >
> > > > Accelerator, 'acce_dev' ? ;-)
> > >
> > > It may get confused with HW accelerators.
> > >
> > >
> > > Some of the options I can think of. Sorting in my preference.
> > >
> > > library name, API prefix
> > > 1) libhpc-dev, rte_hpc_ (hpc-> Heterogeneous processor compute)
> > > 2) libhc-dev, rte_hc_
> > > (https://en.wikipedia.org/wiki/Heterogeneous_computing see: Example
> > > hardware)
> > > 3) libpu-dev, rte_pu_ (pu -> processing unit)
> > > 4) libhp-dev, rte_hp_ (hp->heterogeneous processor)
> > > 5) libcoprocessor-dev, rte_cps_ ?
> > > 6) libcompute-dev, rte_cpt_ ?
> > > 7) libgpu-dev, rte_gpu_
> > 
> > These seem to assume that the application can launch its own workload on the device? Does DPDK need to provide abstract APIs for launching work on a device?

That's the difficult part.
We should not try to re-invent CUDA or OpenCL.
I think this part should not be in DPDK.
We only need to synchronize with dynamic nature of the device workload.
We will be more specific in the v2.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-07  7:20                             ` Wang, Haiyue
@ 2021-06-07 10:43                               ` Thomas Monjalon
  2021-06-07 13:54                                 ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-07 10:43 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Jerin Jacob
  Cc: dev, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev, Elena Agostini,
	David Marchand, nd, Wang, Haiyue

07/06/2021 09:20, Wang, Haiyue:
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> > local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> > "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> > its own memory that need to be managed by the application, can use these APIs.
> > 
> 
> "rte_dev_malloc" sounds a good name,

Yes I like the idea.
2 concerns:

1/ Device memory allocation requires a device handle.
So far we avoided exposing rte_device to the application.
How should we get a device handle from a DPDK application?

2/ Implementation must be done in a driver.
Should it be a callback defined at rte_device level?

> then looks like we need to enhance the
> 'struct rte_device' with some new ops as:
> 
> eal: move DMA mapping from bus-specific to generic driver
> 
> https://patchwork.dpdk.org/project/dpdk/patch/20210331224547.2217759-1-thomas@monjalon.net/

Not sure the above patch is a good idea.
Let's discuss this DMA detail later :)




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-06  1:10 ` Honnappa Nagarahalli
@ 2021-06-07 10:50   ` Thomas Monjalon
  0 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-07 10:50 UTC (permalink / raw)
  To: Honnappa Nagarahalli; +Cc: dev, Elena Agostini, nd

06/06/2021 03:10, Honnappa Nagarahalli:
> > The new library gpudev is for dealing with GPU from a DPDK application in a
> > vendor-agnostic way.
> 
> It would be good to explain how the application using GPU+DPDK would look like.

It can be anything to invent.
We can give few more words saying it is a computing done in parallel,
being either a loop or sequence of tasks. In both cases,
the DPDK application running on the CPU must be in control.

> Which parts of the workload need DPDK's support?

Control:
- start task (by function or shared memory variable)
- get completion notification (most probably by shared memory)

We will add such requirements in v2.

> Any requirements on co-existence of GPU with other accelerators?

It must be generic to allow any workload with any device.

> > As a first step, the features are focused on memory management.
> > A function allows to allocate memory inside the GPU, while another one
> > allows to use main (CPU) memory from the GPU.
> 
> Is this memory for packet buffers or something else?

Can be for packets or for control variables.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-07 10:43                               ` Thomas Monjalon
@ 2021-06-07 13:54                                 ` Jerin Jacob
  2021-06-07 16:47                                   ` Thomas Monjalon
  2021-06-07 23:31                                   ` Honnappa Nagarahalli
  0 siblings, 2 replies; 128+ messages in thread
From: Jerin Jacob @ 2021-06-07 13:54 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd, Wang, Haiyue

On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 07/06/2021 09:20, Wang, Haiyue:
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> > > local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> > > "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> > > its own memory that need to be managed by the application, can use these APIs.
> > >
> >
> > "rte_dev_malloc" sounds a good name,
>
> Yes I like the idea.
> 2 concerns:
>
> 1/ Device memory allocation requires a device handle.
> So far we avoided exposing rte_device to the application.
> How should we get a device handle from a DPDK application?

Each device behaves differently at this level. In the view of the
generic application, the architecture should like

< Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function >
^
|
< DPDK driver>
^
|
<rte_device with this new callbacks >

An implementation may decide to have "in tree" or "out of tree"
drivers or rte_device implementaion.
But generic DPDK applications should not use devices directly. i.e
rte_device need to have this callback and
mlx ethdev/crypto driver use this driver to implement public API.
Otherwise, it is the same as rawdev in DPDK.
So not sure what it brings other than raw dev here if we are not
taking the above architecture.

>
> 2/ Implementation must be done in a driver.
> Should it be a callback defined at rte_device level?

IMO, Yes and DPDK subsystem drivers to use it.

>
> > then looks like we need to enhance the
> > 'struct rte_device' with some new ops as:
> >
> > eal: move DMA mapping from bus-specific to generic driver
> >
> > https://patchwork.dpdk.org/project/dpdk/patch/20210331224547.2217759-1-thomas@monjalon.net/
>
> Not sure the above patch is a good idea.
> Let's discuss this DMA detail later :)
>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-07 13:54                                 ` Jerin Jacob
@ 2021-06-07 16:47                                   ` Thomas Monjalon
  2021-06-08  4:10                                     ` Jerin Jacob
  2021-06-07 23:31                                   ` Honnappa Nagarahalli
  1 sibling, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-07 16:47 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Honnappa Nagarahalli, Andrew Rybchenko, Yigit, Ferruh,
	dpdk-dev, Elena Agostini, David Marchand, nd, Wang, Haiyue

07/06/2021 15:54, Jerin Jacob:
> On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 07/06/2021 09:20, Wang, Haiyue:
> > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> > > > local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> > > > "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> > > > its own memory that need to be managed by the application, can use these APIs.
> > > >
> > >
> > > "rte_dev_malloc" sounds a good name,
> >
> > Yes I like the idea.
> > 2 concerns:
> >
> > 1/ Device memory allocation requires a device handle.
> > So far we avoided exposing rte_device to the application.
> > How should we get a device handle from a DPDK application?
> 
> Each device behaves differently at this level. In the view of the
> generic application, the architecture should like
> 
> < Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function >
> ^
> |
> < DPDK driver>
> ^
> |
> <rte_device with this new callbacks >

I think the formatting went wrong above.

I would add more to the block diagram:

class device API      - computing device API
        |            |              |
class device driver -   computing device driver
        |                           |
       EAL device with memory callback

The idea above is that the class device driver can use services
of the new computing device library.
One basic API service is to provide a device ID for the memory callback.
Other services are for execution control.

> An implementation may decide to have "in tree" or "out of tree"
> drivers or rte_device implementaion.
> But generic DPDK applications should not use devices directly. i.e
> rte_device need to have this callback and
> mlx ethdev/crypto driver use this driver to implement public API.
> Otherwise, it is the same as rawdev in DPDK.
> So not sure what it brings other than raw dev here if we are not
> taking the above architecture.
> 
> >
> > 2/ Implementation must be done in a driver.
> > Should it be a callback defined at rte_device level?
> 
> IMO, Yes and DPDK subsystem drivers to use it.

I'm not sure subsystems should bypass the API for device memory.
We could do some generic work in the API function and call
the driver callback only for device-specific stuff.
In such case the callback and the API would be
in the library computing device library.
On the other hand, having the callback and API in EAL would allow
having a common function for memory allocation in EAL.

Another thought: I would like to unify memory allocation in DPDK
with the same set of flags in an unique function.
A flag could be used to target devices instead of the running CPU,
and the same parameter could be shared for the device ID or NUMA node.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-07 13:54                                 ` Jerin Jacob
  2021-06-07 16:47                                   ` Thomas Monjalon
@ 2021-06-07 23:31                                   ` Honnappa Nagarahalli
  1 sibling, 0 replies; 128+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-07 23:31 UTC (permalink / raw)
  To: Jerin Jacob, thomas
  Cc: Andrew Rybchenko, Yigit, Ferruh, dpdk-dev, Elena Agostini,
	David Marchand, nd, Wang, Haiyue, Honnappa Nagarahalli, nd



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net>
> wrote:
> >
> > 07/06/2021 09:20, Wang, Haiyue:
> > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > If we keep CXL in mind, I would imagine that in the future the
> > > > devices on PCIe could have their own local memory. May be some of
> > > > the APIs could use generic names. For ex: instead of calling it as
> > > > "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way
> any future device which hosts its own memory that need to be managed by
> the application, can use these APIs.
> > > >
> > >
> > > "rte_dev_malloc" sounds a good name,
> >
> > Yes I like the idea.
> > 2 concerns:
> >
> > 1/ Device memory allocation requires a device handle.
> > So far we avoided exposing rte_device to the application.
> > How should we get a device handle from a DPDK application?
> 
> Each device behaves differently at this level. In the view of the generic
> application, the architecture should like
> 
> < Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function > ^
> |
> < DPDK driver>
> ^
> |
> <rte_device with this new callbacks >
> 
> An implementation may decide to have "in tree" or "out of tree"
> drivers or rte_device implementaion.
> But generic DPDK applications should not use devices directly. i.e rte_device
> need to have this callback and mlx ethdev/crypto driver use this driver to
> implement public API.
> Otherwise, it is the same as rawdev in DPDK.
> So not sure what it brings other than raw dev here if we are not taking the
> above architecture.
Agree, I think it is important to hide the device under the APIs for the application to benefit.

> 
> >
> > 2/ Implementation must be done in a driver.
> > Should it be a callback defined at rte_device level?
> 
> IMO, Yes and DPDK subsystem drivers to use it.
> 
> >
> > > then looks like we need to enhance the 'struct rte_device' with some
> > > new ops as:
> > >
> > > eal: move DMA mapping from bus-specific to generic driver
> > >
> > >
> https://patchwork.dpdk.org/project/dpdk/patch/20210331224547.2217759
> > > -1-thomas@monjalon.net/
> >
> > Not sure the above patch is a good idea.
> > Let's discuss this DMA detail later :)
> >
> >
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-07 16:47                                   ` Thomas Monjalon
@ 2021-06-08  4:10                                     ` Jerin Jacob
  2021-06-08  6:34                                       ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-08  4:10 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd, Wang, Haiyue

On Mon, Jun 7, 2021 at 10:17 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 07/06/2021 15:54, Jerin Jacob:
> > On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 07/06/2021 09:20, Wang, Haiyue:
> > > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> > > > > local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> > > > > "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> > > > > its own memory that need to be managed by the application, can use these APIs.
> > > > >
> > > >
> > > > "rte_dev_malloc" sounds a good name,
> > >
> > > Yes I like the idea.
> > > 2 concerns:
> > >
> > > 1/ Device memory allocation requires a device handle.
> > > So far we avoided exposing rte_device to the application.
> > > How should we get a device handle from a DPDK application?
> >
> > Each device behaves differently at this level. In the view of the
> > generic application, the architecture should like
> >
> > < Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function >
> > ^
> > |
> > < DPDK driver>
> > ^
> > |
> > <rte_device with this new callbacks >
>
> I think the formatting went wrong above.
>
> I would add more to the block diagram:
>
> class device API      - computing device API
>         |            |              |
> class device driver -   computing device driver
>         |                           |
>        EAL device with memory callback
>
> The idea above is that the class device driver can use services
> of the new computing device library.

Yes. The question is, do we need any public DPDK _application_ APIs for that?
If it is public API then the scope is much bigger than that as the application
can use it directly and it makes it non portable.

if the scope is only, the class driver consumption then the existing
"bus"  _kind of_
abstraction/API makes sense to me.

Where it abstracts,
-FW download of device
-Memory management of device
-Opaque way to enq/deque jobs to the device.

And above should be consumed by "class driver" not "application".

If the application doing do that, we are in rte_raw device territory.


> One basic API service is to provide a device ID for the memory callback.
> Other services are for execution control.
>
> > An implementation may decide to have "in tree" or "out of tree"
> > drivers or rte_device implementaion.
> > But generic DPDK applications should not use devices directly. i.e
> > rte_device need to have this callback and
> > mlx ethdev/crypto driver use this driver to implement public API.
> > Otherwise, it is the same as rawdev in DPDK.
> > So not sure what it brings other than raw dev here if we are not
> > taking the above architecture.
> >
> > >
> > > 2/ Implementation must be done in a driver.
> > > Should it be a callback defined at rte_device level?
> >
> > IMO, Yes and DPDK subsystem drivers to use it.
>
> I'm not sure subsystems should bypass the API for device memory.
> We could do some generic work in the API function and call
> the driver callback only for device-specific stuff.
> In such case the callback and the API would be
> in the library computing device library.
> On the other hand, having the callback and API in EAL would allow
> having a common function for memory allocation in EAL.
>
> Another thought: I would like to unify memory allocation in DPDK
> with the same set of flags in an unique function.
> A flag could be used to target devices instead of the running CPU,
> and the same parameter could be shared for the device ID or NUMA node.
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-08  4:10                                     ` Jerin Jacob
@ 2021-06-08  6:34                                       ` Thomas Monjalon
  2021-06-08  7:09                                         ` Jerin Jacob
  2021-06-15 18:24                                         ` Ferruh Yigit
  0 siblings, 2 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-08  6:34 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd, Wang, Haiyue

08/06/2021 06:10, Jerin Jacob:
> On Mon, Jun 7, 2021 at 10:17 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 07/06/2021 15:54, Jerin Jacob:
> > > On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 07/06/2021 09:20, Wang, Haiyue:
> > > > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > > > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> > > > > > local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> > > > > > "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> > > > > > its own memory that need to be managed by the application, can use these APIs.
> > > > > >
> > > > >
> > > > > "rte_dev_malloc" sounds a good name,
> > > >
> > > > Yes I like the idea.
> > > > 2 concerns:
> > > >
> > > > 1/ Device memory allocation requires a device handle.
> > > > So far we avoided exposing rte_device to the application.
> > > > How should we get a device handle from a DPDK application?
> > >
> > > Each device behaves differently at this level. In the view of the
> > > generic application, the architecture should like
> > >
> > > < Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function >
> > > ^
> > > |
> > > < DPDK driver>
> > > ^
> > > |
> > > <rte_device with this new callbacks >
> >
> > I think the formatting went wrong above.
> >
> > I would add more to the block diagram:
> >
> > class device API      - computing device API
> >         |            |              |
> > class device driver -   computing device driver
> >         |                           |
> >        EAL device with memory callback
> >
> > The idea above is that the class device driver can use services
> > of the new computing device library.
> 
> Yes. The question is, do we need any public DPDK _application_ APIs for that?

To have something generic!

> If it is public API then the scope is much bigger than that as the application
> can use it directly and it makes it non portable.

It is a non-sense. If we make an API, it will be better portable.
The only part which is non-portable is the program on the device
which may be different per computing device.
The synchronization with the DPDK application should be portable
if we define some good API.

> if the scope is only, the class driver consumption then the existing
> "bus"  _kind of_
> abstraction/API makes sense to me.
> 
> Where it abstracts,
> -FW download of device
> -Memory management of device
> -Opaque way to enq/deque jobs to the device.
> 
> And above should be consumed by "class driver" not "application".
> 
> If the application doing do that, we are in rte_raw device territory.

I'm sorry I don't understand what you make such assertion.
It seems you don't want generic API (which is the purpose of DPDK).



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-08  6:34                                       ` Thomas Monjalon
@ 2021-06-08  7:09                                         ` Jerin Jacob
  2021-06-08  7:32                                           ` Thomas Monjalon
  2021-06-15 18:24                                         ` Ferruh Yigit
  1 sibling, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-06-08  7:09 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd, Wang, Haiyue

On Tue, Jun 8, 2021 at 12:05 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 08/06/2021 06:10, Jerin Jacob:
> > On Mon, Jun 7, 2021 at 10:17 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > 07/06/2021 15:54, Jerin Jacob:
> > > > On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > 07/06/2021 09:20, Wang, Haiyue:
> > > > > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > > > > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> > > > > > > local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> > > > > > > "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> > > > > > > its own memory that need to be managed by the application, can use these APIs.
> > > > > > >
> > > > > >
> > > > > > "rte_dev_malloc" sounds a good name,
> > > > >
> > > > > Yes I like the idea.
> > > > > 2 concerns:
> > > > >
> > > > > 1/ Device memory allocation requires a device handle.
> > > > > So far we avoided exposing rte_device to the application.
> > > > > How should we get a device handle from a DPDK application?
> > > >
> > > > Each device behaves differently at this level. In the view of the
> > > > generic application, the architecture should like
> > > >
> > > > < Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function >
> > > > ^
> > > > |
> > > > < DPDK driver>
> > > > ^
> > > > |
> > > > <rte_device with this new callbacks >
> > >
> > > I think the formatting went wrong above.
> > >
> > > I would add more to the block diagram:
> > >
> > > class device API      - computing device API
> > >         |            |              |
> > > class device driver -   computing device driver
> > >         |                           |
> > >        EAL device with memory callback
> > >
> > > The idea above is that the class device driver can use services
> > > of the new computing device library.
> >
> > Yes. The question is, do we need any public DPDK _application_ APIs for that?
>
> To have something generic!
>
> > If it is public API then the scope is much bigger than that as the application
> > can use it directly and it makes it non portable.
>
> It is a non-sense. If we make an API, it will be better portable.

The portal application will be using class device API.
For example, when application needs to call rte_gpu_malloc() vs rte_malloc() ?
Is it better the use of drivers specific functions used in "class
device driver" not exposed?



> The only part which is non-portable is the program on the device
> which may be different per computing device.
> The synchronization with the DPDK application should be portable
> if we define some good API.
>
> > if the scope is only, the class driver consumption then the existing
> > "bus"  _kind of_
> > abstraction/API makes sense to me.
> >
> > Where it abstracts,
> > -FW download of device
> > -Memory management of device
> > -Opaque way to enq/deque jobs to the device.
> >
> > And above should be consumed by "class driver" not "application".
> >
> > If the application doing do that, we are in rte_raw device territory.
>
> I'm sorry I don't understand what you make such assertion.
> It seems you don't want generic API (which is the purpose of DPDK).

I would like to have a generic _application_ API if the application
_needs_ to use it.

The v1 nowhere close to any compute device description.

It has a memory allocation API. It is the device attribute, not
strictly tied to ONLY TO computing device.

So at least, I am asking to have concrete
proposal on "compute device" schematic rather than start with memory API
and rubber stamp as new device adds anything in future.

When we added any all the class devices to DPDK, Everyone had a complete view
of it is function(at RFC of each subsystem had enough API to express
the "basic" usage)
and purpose from the _application_ PoV. I see that is missing here.


>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-08  7:09                                         ` Jerin Jacob
@ 2021-06-08  7:32                                           ` Thomas Monjalon
  0 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-08  7:32 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, Yigit, Ferruh, dpdk-dev,
	Elena Agostini, David Marchand, nd, Wang, Haiyue

08/06/2021 09:09, Jerin Jacob:
> On Tue, Jun 8, 2021 at 12:05 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 08/06/2021 06:10, Jerin Jacob:
> > > On Mon, Jun 7, 2021 at 10:17 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > >
> > > > 07/06/2021 15:54, Jerin Jacob:
> > > > > On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > 07/06/2021 09:20, Wang, Haiyue:
> > > > > > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > > > > > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> > > > > > > > local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> > > > > > > > "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> > > > > > > > its own memory that need to be managed by the application, can use these APIs.
> > > > > > > >
> > > > > > >
> > > > > > > "rte_dev_malloc" sounds a good name,
> > > > > >
> > > > > > Yes I like the idea.
> > > > > > 2 concerns:
> > > > > >
> > > > > > 1/ Device memory allocation requires a device handle.
> > > > > > So far we avoided exposing rte_device to the application.
> > > > > > How should we get a device handle from a DPDK application?
> > > > >
> > > > > Each device behaves differently at this level. In the view of the
> > > > > generic application, the architecture should like
> > > > >
> > > > > < Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function >
> > > > > ^
> > > > > |
> > > > > < DPDK driver>
> > > > > ^
> > > > > |
> > > > > <rte_device with this new callbacks >
> > > >
> > > > I think the formatting went wrong above.
> > > >
> > > > I would add more to the block diagram:
> > > >
> > > > class device API      - computing device API
> > > >         |            |              |
> > > > class device driver -   computing device driver
> > > >         |                           |
> > > >        EAL device with memory callback
> > > >
> > > > The idea above is that the class device driver can use services
> > > > of the new computing device library.
> > >
> > > Yes. The question is, do we need any public DPDK _application_ APIs for that?
> >
> > To have something generic!
> >
> > > If it is public API then the scope is much bigger than that as the application
> > > can use it directly and it makes it non portable.
> >
> > It is a non-sense. If we make an API, it will be better portable.
> 
> The portal application will be using class device API.
> For example, when application needs to call rte_gpu_malloc() vs rte_malloc() ?
> Is it better the use of drivers specific functions used in "class
> device driver" not exposed?
> 
> 
> 
> > The only part which is non-portable is the program on the device
> > which may be different per computing device.
> > The synchronization with the DPDK application should be portable
> > if we define some good API.
> >
> > > if the scope is only, the class driver consumption then the existing
> > > "bus"  _kind of_
> > > abstraction/API makes sense to me.
> > >
> > > Where it abstracts,
> > > -FW download of device
> > > -Memory management of device
> > > -Opaque way to enq/deque jobs to the device.
> > >
> > > And above should be consumed by "class driver" not "application".
> > >
> > > If the application doing do that, we are in rte_raw device territory.
> >
> > I'm sorry I don't understand what you make such assertion.
> > It seems you don't want generic API (which is the purpose of DPDK).
> 
> I would like to have a generic _application_ API if the application
> _needs_ to use it.
> 
> The v1 nowhere close to any compute device description.

As I said, I forgot the RFC tag.
I just wanted to start the discussion and it was fruitful, no regret.

> It has a memory allocation API. It is the device attribute, not
> strictly tied to ONLY TO computing device.
> 
> So at least, I am asking to have concrete
> proposal on "compute device" schematic rather than start with memory API
> and rubber stamp as new device adds anything in future.
> 
> When we added any all the class devices to DPDK, Everyone had a complete view
> of it is function(at RFC of each subsystem had enough API to express
> the "basic" usage)
> and purpose from the _application_ PoV. I see that is missing here.

I keep explaining in emails while preparing a v2.
Now that we go into circles, let's wait the v2 which will address
a lot of comments.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-08  6:34                                       ` Thomas Monjalon
  2021-06-08  7:09                                         ` Jerin Jacob
@ 2021-06-15 18:24                                         ` Ferruh Yigit
  2021-06-15 18:54                                           ` Thomas Monjalon
  1 sibling, 1 reply; 128+ messages in thread
From: Ferruh Yigit @ 2021-06-15 18:24 UTC (permalink / raw)
  To: Thomas Monjalon, Jerin Jacob
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, dpdk-dev, Elena Agostini,
	David Marchand, nd, Wang, Haiyue

On 6/8/2021 7:34 AM, Thomas Monjalon wrote:
> 08/06/2021 06:10, Jerin Jacob:
>> On Mon, Jun 7, 2021 at 10:17 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>>>
>>> 07/06/2021 15:54, Jerin Jacob:
>>>> On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>>>>> 07/06/2021 09:20, Wang, Haiyue:
>>>>>> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>>>>>> If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
>>>>>>> local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
>>>>>>> "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
>>>>>>> its own memory that need to be managed by the application, can use these APIs.
>>>>>>>
>>>>>>
>>>>>> "rte_dev_malloc" sounds a good name,
>>>>>
>>>>> Yes I like the idea.
>>>>> 2 concerns:
>>>>>
>>>>> 1/ Device memory allocation requires a device handle.
>>>>> So far we avoided exposing rte_device to the application.
>>>>> How should we get a device handle from a DPDK application?
>>>>
>>>> Each device behaves differently at this level. In the view of the
>>>> generic application, the architecture should like
>>>>
>>>> < Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function >
>>>> ^
>>>> |
>>>> < DPDK driver>
>>>> ^
>>>> |
>>>> <rte_device with this new callbacks >
>>>
>>> I think the formatting went wrong above.
>>>
>>> I would add more to the block diagram:
>>>
>>> class device API      - computing device API
>>>         |            |              |
>>> class device driver -   computing device driver
>>>         |                           |
>>>        EAL device with memory callback
>>>
>>> The idea above is that the class device driver can use services
>>> of the new computing device library.
>>
>> Yes. The question is, do we need any public DPDK _application_ APIs for that?
> 
> To have something generic!
> 
>> If it is public API then the scope is much bigger than that as the application
>> can use it directly and it makes it non portable.
> 
> It is a non-sense. If we make an API, it will be better portable.
> The only part which is non-portable is the program on the device
> which may be different per computing device.
> The synchronization with the DPDK application should be portable
> if we define some good API.
> 
>> if the scope is only, the class driver consumption then the existing
>> "bus"  _kind of_
>> abstraction/API makes sense to me.
>>
>> Where it abstracts,
>> -FW download of device
>> -Memory management of device
>> -Opaque way to enq/deque jobs to the device.
>>
>> And above should be consumed by "class driver" not "application".
>>
>> If the application doing do that, we are in rte_raw device territory.
> 
> I'm sorry I don't understand what you make such assertion.
> It seems you don't want generic API (which is the purpose of DPDK).
> 

The FW/kernel/"computing tasks" in the co-processor can be doing anything, as it
has been in FPGA/rawdev.

If there is no defined input & output of that computing task, an application
developed using it will be specific to that computing task, this is not portable
and feels like how rawdev works.

It is possible to have a generic API for control, to start the task and get
completion notification, but not having common input/output interface with
computing task still has same problem I think.

If the application is strictly depends to what computing task does, why not
extending rawdev to have the control APIs? Instead of new library.
And as you already said for memory, generic APIs can be used with additional
flags and using rawdev handler.

Or another option can be defining computing task a little more, have a common
interface, like mbuf, and add some capabilities/flags to let application know
more about computing task and give decision based on it, is this the intention?


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH] gpudev: introduce memory API
  2021-06-15 18:24                                         ` Ferruh Yigit
@ 2021-06-15 18:54                                           ` Thomas Monjalon
  0 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-06-15 18:54 UTC (permalink / raw)
  To: Jerin Jacob, Ferruh Yigit
  Cc: Honnappa Nagarahalli, Andrew Rybchenko, dpdk-dev, Elena Agostini,
	David Marchand, nd, Wang, Haiyue

15/06/2021 20:24, Ferruh Yigit:
> On 6/8/2021 7:34 AM, Thomas Monjalon wrote:
> > 08/06/2021 06:10, Jerin Jacob:
> >> On Mon, Jun 7, 2021 at 10:17 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >>>
> >>> 07/06/2021 15:54, Jerin Jacob:
> >>>> On Mon, Jun 7, 2021 at 4:13 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >>>>> 07/06/2021 09:20, Wang, Haiyue:
> >>>>>> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>>>>>> If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own
> >>>>>>> local memory. May be some of the APIs could use generic names. For ex: instead of calling it as
> >>>>>>> "rte_gpu_malloc" may be we could call it as "rte_dev_malloc". This way any future device which hosts
> >>>>>>> its own memory that need to be managed by the application, can use these APIs.
> >>>>>>>
> >>>>>>
> >>>>>> "rte_dev_malloc" sounds a good name,
> >>>>>
> >>>>> Yes I like the idea.
> >>>>> 2 concerns:
> >>>>>
> >>>>> 1/ Device memory allocation requires a device handle.
> >>>>> So far we avoided exposing rte_device to the application.
> >>>>> How should we get a device handle from a DPDK application?
> >>>>
> >>>> Each device behaves differently at this level. In the view of the
> >>>> generic application, the architecture should like
> >>>>
> >>>> < Use DPDK subsystem as rte_ethdev, rte_bbdev etc for SPECIFIC function >
> >>>> ^
> >>>> |
> >>>> < DPDK driver>
> >>>> ^
> >>>> |
> >>>> <rte_device with this new callbacks >
> >>>
> >>> I think the formatting went wrong above.
> >>>
> >>> I would add more to the block diagram:
> >>>
> >>> class device API      - computing device API
> >>>         |            |              |
> >>> class device driver -   computing device driver
> >>>         |                           |
> >>>        EAL device with memory callback
> >>>
> >>> The idea above is that the class device driver can use services
> >>> of the new computing device library.
> >>
> >> Yes. The question is, do we need any public DPDK _application_ APIs for that?
> > 
> > To have something generic!
> > 
> >> If it is public API then the scope is much bigger than that as the application
> >> can use it directly and it makes it non portable.
> > 
> > It is a non-sense. If we make an API, it will be better portable.
> > The only part which is non-portable is the program on the device
> > which may be different per computing device.
> > The synchronization with the DPDK application should be portable
> > if we define some good API.
> > 
> >> if the scope is only, the class driver consumption then the existing
> >> "bus"  _kind of_
> >> abstraction/API makes sense to me.
> >>
> >> Where it abstracts,
> >> -FW download of device
> >> -Memory management of device
> >> -Opaque way to enq/deque jobs to the device.
> >>
> >> And above should be consumed by "class driver" not "application".
> >>
> >> If the application doing do that, we are in rte_raw device territory.
> > 
> > I'm sorry I don't understand what you make such assertion.
> > It seems you don't want generic API (which is the purpose of DPDK).
> > 
> 
> The FW/kernel/"computing tasks" in the co-processor can be doing anything, as it
> has been in FPGA/rawdev.
> 
> If there is no defined input & output of that computing task, an application
> developed using it will be specific to that computing task, this is not portable
> and feels like how rawdev works.
> 
> It is possible to have a generic API for control, to start the task and get
> completion notification, but not having common input/output interface with
> computing task still has same problem I think.
> 
> If the application is strictly depends to what computing task does, why not
> extending rawdev to have the control APIs? Instead of new library.
> And as you already said for memory, generic APIs can be used with additional
> flags and using rawdev handler.
> 
> Or another option can be defining computing task a little more, have a common
> interface, like mbuf, and add some capabilities/flags to let application know
> more about computing task and give decision based on it, is this the intention?

I think we'll propose a thin layer to allow device memory management with
generic API in EAL and mbuf.
The task should be defined and controlled by the application,
and there is not much DPDK can do generically.

Stay tuned, and thanks for all the feedbacks.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
                   ` (6 preceding siblings ...)
  2021-06-06  1:10 ` Honnappa Nagarahalli
@ 2021-07-30 13:55 ` Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 1/7] hcdev: introduce heterogeneous computing device library Thomas Monjalon
                     ` (7 more replies)
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
                   ` (2 subsequent siblings)
  10 siblings, 8 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-30 13:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, David Marchand, Andrew Rybchenko, Haiyue Wang,
	Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The goal of this new library is to enhance the collaboration between
DPDK, that's primarily a CPU framework, and other type of devices like GPUs.

When mixing network activity with task processing on a non-CPU device,
there may be the need to put in communication the CPU with the device
in order to manage the memory, synchronize operations, exchange info, etc..

This library provides a number of new features:
- Interoperability with device specific library with generic handlers
- Possibility to allocate and free memory on the device
- Possibility to allocate and free memory on the CPU but visible from the device
- Communication functions to enhance the dialog between the CPU and the device

The infrastructure is prepared to welcome drivers in drivers/hc/
as the upcoming NVIDIA one, implementing the hcdev API.

Some parts are not complete:
  - locks
  - memory allocation table
  - memory freeing
  - guide documentation
  - integration in devtools/check-doc-vs-code.sh
  - unit tests
  - integration in testpmd to enable Rx/Tx to/from GPU memory.

Below is a pseudo-code to give an example about how to use functions
in this library in case of a CUDA application.


Elena Agostini (4):
  hcdev: introduce heterogeneous computing device library
  hcdev: add memory API
  hcdev: add communication flag
  hcdev: add communication list

Thomas Monjalon (3):
  hcdev: add event notification
  hcdev: add child device representing a device context
  hcdev: support multi-process

 .gitignore                             |   1 +
 MAINTAINERS                            |   6 +
 doc/api/doxy-api-index.md              |   1 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/conf.py                     |   8 +
 doc/guides/hcdevs/features/default.ini |  13 +
 doc/guides/hcdevs/index.rst            |  11 +
 doc/guides/hcdevs/overview.rst         |  11 +
 doc/guides/index.rst                   |   1 +
 doc/guides/prog_guide/hcdev.rst        |   5 +
 doc/guides/prog_guide/index.rst        |   1 +
 doc/guides/rel_notes/release_21_08.rst |   5 +
 drivers/hc/meson.build                 |   4 +
 drivers/meson.build                    |   1 +
 lib/hcdev/hcdev.c                      | 789 +++++++++++++++++++++++++
 lib/hcdev/hcdev_driver.h               |  96 +++
 lib/hcdev/meson.build                  |  12 +
 lib/hcdev/rte_hcdev.h                  | 592 +++++++++++++++++++
 lib/hcdev/version.map                  |  35 ++
 lib/meson.build                        |   1 +
 20 files changed, 1594 insertions(+)
 create mode 100644 doc/guides/hcdevs/features/default.ini
 create mode 100644 doc/guides/hcdevs/index.rst
 create mode 100644 doc/guides/hcdevs/overview.rst
 create mode 100644 doc/guides/prog_guide/hcdev.rst
 create mode 100644 drivers/hc/meson.build
 create mode 100644 lib/hcdev/hcdev.c
 create mode 100644 lib/hcdev/hcdev_driver.h
 create mode 100644 lib/hcdev/meson.build
 create mode 100644 lib/hcdev/rte_hcdev.h
 create mode 100644 lib/hcdev/version.map



////////////////////////////////////////////////////////////////////////
///// HCDEV library + CUDA functions
////////////////////////////////////////////////////////////////////////
#define GPU_PAGE_SHIFT 16
#define GPU_PAGE_SIZE (1UL << GPU_PAGE_SHIFT)

int main() {
    struct rte_hcdev_flag quit_flag;
    struct rte_hcdev_comm_list *comm_list;
    int nb_rx = 0;
    int comm_list_entry = 0;
    struct rte_mbuf * rx_mbufs[max_rx_mbufs];
    cudaStream_t cstream;
    struct rte_mempool *mpool_payload, *mpool_header;
    struct rte_pktmbuf_extmem ext_mem;
    int16_t dev_id;

    /* Initialize CUDA objects (cstream, context, etc..). */
    /* Use hcdev library to register a new CUDA context if any */
    /* Let's assume the application wants to use the default context of the GPU device 0 */
    dev_id = 0;

    /* Create an external memory mempool using memory allocated on the GPU. */
    ext_mem.elt_size = mbufs_headroom_size;
                ext_mem.buf_len = RTE_ALIGN_CEIL(mbufs_num * ext_mem.elt_size, GPU_PAGE_SIZE);
    ext_mem.buf_iova = RTE_BAD_IOVA;
    ext_mem.buf_ptr = rte_hcdev_malloc(dev_id, ext_mem.buf_len, 0);
    rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
    rte_dev_dma_map(rte_eth_devices[l2fwd_port_id].device, ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
    mpool_payload = rte_pktmbuf_pool_create_extbuf("gpu_mempool", mbufs_num,
                                                    0, 0, ext_mem.elt_size,
                                                    rte_socket_id(), &ext_mem, 1);

    /*
     * Create CPU - device communication flag. With this flag, the CPU can tell to the CUDA kernel
     * to exit from the main loop.
     */
    rte_hcdev_comm_create_flag(dev_id, &quit_flag, RTE_HCDEV_COMM_FLAG_CPU);
    rte_hcdev_comm_set_flag(&quit_flag, 0);

    /*
     * Create CPU - device communication list. Each entry of this list will be populated by the CPU
     * with a new set of received mbufs that the CUDA kernel has to process.
     */
    comm_list = rte_hcdev_comm_create_list(dev_id, num_entries);

    /* A very simple CUDA kernel with just 1 CUDA block and RTE_HCDEV_COMM_LIST_PKTS_MAX CUDA threads. */
    cuda_kernel_packet_processing<<<1, RTE_HCDEV_COMM_LIST_PKTS_MAX, 0, cstream>>>(quit_flag->ptr, comm_list, num_entries, ...);

    /*
     * For simplicity, the CPU here receives only 2 bursts of mbufs.
     * In a real application, network activity and device processing should overlap.
     */
    nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
    rte_hcdev_comm_populate_list_pkts(comm_list[0], rx_mbufs, nb_rx);
    nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
    rte_hcdev_comm_populate_list_pkts(comm_list[1], rx_mbufs, nb_rx);

    /*
     * CPU waits for the completion of the packets' processing on the CUDA kernel
     * and then it does a cleanup of the received mbufs.
     */
    while (rte_hcdev_comm_cleanup_list(comm_list[0]));
    while (rte_hcdev_comm_cleanup_list(comm_list[1]));

    /* CPU notifies the CUDA kernel that it has to terminate */
    rte_hcdev_comm_set_flag(&quit_flag, 1);

    /* hcdev objects cleanup/destruction */
    /* CUDA cleanup */
    /* DPDK cleanup */

    return 0;
}

////////////////////////////////////////////////////////////////////////
///// CUDA kernel
////////////////////////////////////////////////////////////////////////

void cuda_kernel(uint32_t * quit_flag_ptr, struct rte_hcdev_comm_list *comm_list, int comm_list_entries) {
    int comm_list_index = 0;
    struct rte_hcdev_comm_pkt *pkt_list = NULL;

    /* Do some pre-processing operations. */

    /* GPU kernel keeps checking this flag to know if it has to quit or wait for more packets. */
    while (*quit_flag_ptr == 0)
    {
        if (comm_list[comm_list_index]->status != RTE_HCDEV_COMM_LIST_READY)
            continue;

        if (threadIdx.x < comm_list[comm_list_index]->num_pkts)
        {
            /* Each CUDA thread processes a different packet. */
            packet_processing(comm_list[comm_list_index]->addr, comm_list[comm_list_index]->size, ..);
        }
        __threadfence();
        __syncthreads();

        /* Wait for new packets on the next communication list entry. */
        comm_list_index = (comm_list_index+1) % comm_list_entries;
    }

    /* Do some post-processing operations. */
}


-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [RFC PATCH v2 1/7] hcdev: introduce heterogeneous computing device library
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
@ 2021-07-30 13:55   ` Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 2/7] hcdev: add event notification Thomas Monjalon
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-30 13:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, David Marchand, Andrew Rybchenko, Haiyue Wang,
	Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit, Elena Agostini,
	Ray Kinsella, Anatoly Burakov

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The new library hcdev is for dealing with computing devices
from a DPDK application running on the CPU.

The infrastructure is prepared to welcome drivers in drivers/hc/.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 .gitignore                             |   1 +
 MAINTAINERS                            |   6 +
 doc/api/doxy-api-index.md              |   1 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/conf.py                     |   8 +
 doc/guides/hcdevs/features/default.ini |  10 +
 doc/guides/hcdevs/index.rst            |  11 ++
 doc/guides/hcdevs/overview.rst         |  11 ++
 doc/guides/index.rst                   |   1 +
 doc/guides/prog_guide/hcdev.rst        |   5 +
 doc/guides/prog_guide/index.rst        |   1 +
 doc/guides/rel_notes/release_21_08.rst |   4 +
 drivers/hc/meson.build                 |   4 +
 drivers/meson.build                    |   1 +
 lib/hcdev/hcdev.c                      | 249 +++++++++++++++++++++++++
 lib/hcdev/hcdev_driver.h               |  67 +++++++
 lib/hcdev/meson.build                  |  10 +
 lib/hcdev/rte_hcdev.h                  | 169 +++++++++++++++++
 lib/hcdev/version.map                  |  20 ++
 lib/meson.build                        |   1 +
 20 files changed, 581 insertions(+)
 create mode 100644 doc/guides/hcdevs/features/default.ini
 create mode 100644 doc/guides/hcdevs/index.rst
 create mode 100644 doc/guides/hcdevs/overview.rst
 create mode 100644 doc/guides/prog_guide/hcdev.rst
 create mode 100644 drivers/hc/meson.build
 create mode 100644 lib/hcdev/hcdev.c
 create mode 100644 lib/hcdev/hcdev_driver.h
 create mode 100644 lib/hcdev/meson.build
 create mode 100644 lib/hcdev/rte_hcdev.h
 create mode 100644 lib/hcdev/version.map

diff --git a/.gitignore b/.gitignore
index b19c0717e6..97e57e5897 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
 doc/guides/regexdevs/overview_feature_table.txt
 doc/guides/vdpadevs/overview_feature_table.txt
 doc/guides/bbdevs/overview_feature_table.txt
+doc/guides/hcdevs/overview_feature_table.txt
 
 # ignore generated ctags/cscope files
 cscope.out.po
diff --git a/MAINTAINERS b/MAINTAINERS
index 8013ba1f14..71e850ae44 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -452,6 +452,12 @@ F: app/test-regex/
 F: doc/guides/prog_guide/regexdev.rst
 F: doc/guides/regexdevs/features/default.ini
 
+Heterogeneous Computing API - EXPERIMENTAL
+M: Elena Agostini <eagostini@nvidia.com>
+F: lib/hcdev/
+F: doc/guides/prog_guide/hcdev.rst
+F: doc/guides/hcdevs/features/default.ini
+
 Eventdev API
 M: Jerin Jacob <jerinj@marvell.com>
 T: git://dpdk.org/next/dpdk-next-eventdev
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..2e5256ccc1 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -21,6 +21,7 @@ The public API headers are grouped by topics:
   [compressdev]        (@ref rte_compressdev.h),
   [compress]           (@ref rte_comp.h),
   [regexdev]           (@ref rte_regexdev.h),
+  [hcdev]              (@ref rte_hcdev.h),
   [eventdev]           (@ref rte_eventdev.h),
   [event_eth_rx_adapter]   (@ref rte_event_eth_rx_adapter.h),
   [event_eth_tx_adapter]   (@ref rte_event_eth_tx_adapter.h),
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6..549f373b8a 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -44,6 +44,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/gro \
                           @TOPDIR@/lib/gso \
                           @TOPDIR@/lib/hash \
+                          @TOPDIR@/lib/hcdev \
                           @TOPDIR@/lib/ip_frag \
                           @TOPDIR@/lib/ipsec \
                           @TOPDIR@/lib/jobstats \
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 67d2dd62c7..67ad2c8090 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -152,6 +152,9 @@ def generate_overview_table(output_filename, table_id, section, table_name, titl
         name = ini_filename[:-4]
         name = name.replace('_vf', 'vf')
         pmd_names.append(name)
+    if not pmd_names:
+        # Add an empty column if table is empty (required by RST syntax)
+        pmd_names.append(' ')
 
     # Pad the table header names.
     max_header_len = len(max(pmd_names, key=len))
@@ -388,6 +391,11 @@ def setup(app):
                             'Features',
                             'Features availability in bbdev drivers',
                             'Feature')
+    table_file = dirname(__file__) + '/hcdevs/overview_feature_table.txt'
+    generate_overview_table(table_file, 1,
+                            'Features',
+                            'Features availability in hcdev drivers',
+                            'Feature')
 
     if LooseVersion(sphinx_version) < LooseVersion('1.3.1'):
         print('Upgrade sphinx to version >= 1.3.1 for '
diff --git a/doc/guides/hcdevs/features/default.ini b/doc/guides/hcdevs/features/default.ini
new file mode 100644
index 0000000000..f988ee73d4
--- /dev/null
+++ b/doc/guides/hcdevs/features/default.ini
@@ -0,0 +1,10 @@
+;
+; Features of heterogeneous device driver.
+;
+; This file defines the features that are valid for inclusion in
+; the other driver files and also the order that they appear in
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
+;
+[Features]
+Get device info                =
diff --git a/doc/guides/hcdevs/index.rst b/doc/guides/hcdevs/index.rst
new file mode 100644
index 0000000000..4c217ec0c2
--- /dev/null
+++ b/doc/guides/hcdevs/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+Heterogeneous Computing Device Drivers
+======================================
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   overview
diff --git a/doc/guides/hcdevs/overview.rst b/doc/guides/hcdevs/overview.rst
new file mode 100644
index 0000000000..aedce33792
--- /dev/null
+++ b/doc/guides/hcdevs/overview.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+Overview of Heterogeneous Computing Drivers
+===========================================
+
+Heterogeneous computing device may refer to any computing unit
+able to process data and to share some memory with the CPU.
+Examples are GPU or specialized processor in a SoC.
+
+.. include:: overview_feature_table.txt
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 857f0363d3..643c52d8f9 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -21,6 +21,7 @@ DPDK documentation
    compressdevs/index
    vdpadevs/index
    regexdevs/index
+   hcdevs/index
    eventdevs/index
    rawdevs/index
    mempool/index
diff --git a/doc/guides/prog_guide/hcdev.rst b/doc/guides/prog_guide/hcdev.rst
new file mode 100644
index 0000000000..0b5bd3cb1c
--- /dev/null
+++ b/doc/guides/prog_guide/hcdev.rst
@@ -0,0 +1,5 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+Heterogeneous Computing Device Library
+======================================
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46..12e7ea3e20 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -27,6 +27,7 @@ Programmer's Guide
     cryptodev_lib
     compressdev
     regexdev
+    hcdev
     rte_security
     rawdev
     link_bonding_poll_mode_drv_lib
diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 16bb9ce19e..fb350b4706 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Introduced Heterogeneous Computing Device library with first features:**
+
+  * Device information
+
 * **Added auxiliary bus support.**
 
   Auxiliary bus provides a way to split function into child-devices
diff --git a/drivers/hc/meson.build b/drivers/hc/meson.build
new file mode 100644
index 0000000000..e51ad3381b
--- /dev/null
+++ b/drivers/hc/meson.build
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+drivers = []
diff --git a/drivers/meson.build b/drivers/meson.build
index bc6f4f567f..b0dbee1b54 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -18,6 +18,7 @@ subdirs = [
         'vdpa',           # depends on common, bus and mempool.
         'event',          # depends on common, bus, mempool and net.
         'baseband',       # depends on common and bus.
+        'hc',             # depends on common and bus.
 ]
 
 if meson.is_cross_build()
diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
new file mode 100644
index 0000000000..ea587b3713
--- /dev/null
+++ b/lib/hcdev/hcdev.c
@@ -0,0 +1,249 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_eal.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+
+#include "rte_hcdev.h"
+#include "hcdev_driver.h"
+
+/* Logging */
+RTE_LOG_REGISTER_DEFAULT(hcdev_logtype, NOTICE);
+#define HCDEV_LOG(level, ...) \
+	rte_log(RTE_LOG_ ## level, hcdev_logtype, RTE_FMT("hcdev: " \
+		RTE_FMT_HEAD(__VA_ARGS__,) "\n", RTE_FMT_TAIL(__VA_ARGS__,)))
+
+/* Set any driver error as EPERM */
+#define HCDEV_DRV_RET(function) \
+	((function != 0) ? -(rte_errno = EPERM) : (rte_errno = 0))
+
+/* Array of devices */
+static struct rte_hcdev *hcdevs;
+/* Number of currently valid devices */
+static int16_t hcdev_max;
+/* Number of currently valid devices */
+static int16_t hcdev_count;
+
+int
+rte_hcdev_init(size_t dev_max)
+{
+	if (dev_max == 0 || dev_max > INT16_MAX) {
+		HCDEV_LOG(ERR, "invalid array size");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	/* No lock, it must be called before or during first probing. */
+	if (hcdevs != NULL) {
+		HCDEV_LOG(ERR, "already initialized");
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+
+	hcdevs = calloc(dev_max, sizeof(struct rte_hcdev));
+	if (hcdevs == NULL) {
+		HCDEV_LOG(ERR, "cannot initialize library");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	hcdev_max = dev_max;
+	return 0;
+}
+
+uint16_t
+rte_hcdev_count_avail(void)
+{
+	return hcdev_count;
+}
+
+bool
+rte_hcdev_is_valid(int16_t dev_id)
+{
+	if (dev_id >= 0 && dev_id < hcdev_max &&
+		hcdevs[dev_id].state == RTE_HCDEV_STATE_INITIALIZED)
+		return true;
+	return false;
+}
+
+int16_t
+rte_hcdev_find_next(int16_t dev_id)
+{
+	if (dev_id < 0)
+		dev_id = 0;
+	while (dev_id < hcdev_max &&
+			hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED)
+		dev_id++;
+
+	if (dev_id >= hcdev_max)
+		return RTE_HCDEV_ID_NONE;
+	return dev_id;
+}
+
+static int16_t
+hcdev_find_free_id(void)
+{
+	int16_t dev_id;
+
+	for (dev_id = 0; dev_id < hcdev_max; dev_id++) {
+		if (hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED)
+			return dev_id;
+	}
+	return RTE_HCDEV_ID_NONE;
+}
+
+static struct rte_hcdev *
+hcdev_get_by_id(int16_t dev_id)
+{
+	if (!rte_hcdev_is_valid(dev_id))
+		return NULL;
+	return &hcdevs[dev_id];
+}
+
+struct rte_hcdev *
+rte_hcdev_get_by_name(const char *name)
+{
+	int16_t dev_id;
+	struct rte_hcdev *dev;
+
+	if (name == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	RTE_HCDEV_FOREACH(dev_id) {
+		dev = &hcdevs[dev_id];
+		if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+			return dev;
+	}
+	return NULL;
+}
+
+struct rte_hcdev *
+rte_hcdev_allocate(const char *name)
+{
+	int16_t dev_id;
+	struct rte_hcdev *dev;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		HCDEV_LOG(ERR, "only primary process can allocate device");
+		rte_errno = EPERM;
+		return NULL;
+	}
+	if (name == NULL) {
+		HCDEV_LOG(ERR, "allocate device without a name");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* implicit initialization of library before adding first device */
+	if (hcdevs == NULL && rte_hcdev_init(RTE_HCDEV_DEFAULT_MAX) < 0)
+		return NULL;
+
+	if (rte_hcdev_get_by_name(name) != NULL) {
+		HCDEV_LOG(ERR, "device with name %s already exists", name);
+		rte_errno = EEXIST;
+		return NULL;
+	}
+	dev_id = hcdev_find_free_id();
+	if (dev_id == RTE_HCDEV_ID_NONE) {
+		HCDEV_LOG(ERR, "reached maximum number of devices");
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	dev = &hcdevs[dev_id];
+	memset(dev, 0, sizeof(*dev));
+
+	if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+		HCDEV_LOG(ERR, "device name too long: %s", name);
+		rte_errno = ENAMETOOLONG;
+		return NULL;
+	}
+	dev->info.name = dev->name;
+	dev->info.dev_id = dev_id;
+	dev->info.numa_node = -1;
+
+	hcdev_count++;
+	HCDEV_LOG(DEBUG, "new device %s (id %d) of total %d",
+			name, dev_id, hcdev_count);
+	return dev;
+}
+
+void
+rte_hcdev_complete_new(struct rte_hcdev *dev)
+{
+	if (dev == NULL)
+		return;
+
+	dev->state = RTE_HCDEV_STATE_INITIALIZED;
+}
+
+int
+rte_hcdev_release(struct rte_hcdev *dev)
+{
+	if (dev == NULL) {
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	HCDEV_LOG(DEBUG, "free device %s (id %d)",
+			dev->info.name, dev->info.dev_id);
+	dev->state = RTE_HCDEV_STATE_UNUSED;
+	hcdev_count--;
+
+	return 0;
+}
+
+int
+rte_hcdev_close(int16_t dev_id)
+{
+	int firsterr, binerr;
+	int *lasterr = &firsterr;
+	struct rte_hcdev *dev;
+
+	dev = hcdev_get_by_id(dev_id);
+	if (dev == NULL) {
+		HCDEV_LOG(ERR, "close invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.dev_close != NULL) {
+		*lasterr = HCDEV_DRV_RET(dev->ops.dev_close(dev));
+		if (*lasterr != 0)
+			lasterr = &binerr;
+	}
+
+	*lasterr = rte_hcdev_release(dev);
+
+	rte_errno = -firsterr;
+	return firsterr;
+}
+
+int
+rte_hcdev_info_get(int16_t dev_id, struct rte_hcdev_info *info)
+{
+	struct rte_hcdev *dev;
+
+	dev = hcdev_get_by_id(dev_id);
+	if (dev == NULL) {
+		HCDEV_LOG(ERR, "query invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (info == NULL) {
+		HCDEV_LOG(ERR, "query without storage");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev->ops.dev_info_get == NULL) {
+		*info = dev->info;
+		return 0;
+	}
+	return HCDEV_DRV_RET(dev->ops.dev_info_get(dev, info));
+}
diff --git a/lib/hcdev/hcdev_driver.h b/lib/hcdev/hcdev_driver.h
new file mode 100644
index 0000000000..ca23cb9b9f
--- /dev/null
+++ b/lib/hcdev/hcdev_driver.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+/*
+ * This header file must be included only by drivers.
+ * It is considered internal, i.e. hidden for the application.
+ * The prefix rte_ is used to avoid namespace clash in drivers.
+ */
+
+#ifndef RTE_HCDEV_DRIVER_H
+#define RTE_HCDEV_DRIVER_H
+
+#include <stdint.h>
+
+#include <rte_dev.h>
+
+#include "rte_hcdev.h"
+
+/* Flags indicate current state of device. */
+enum rte_hcdev_state {
+	RTE_HCDEV_STATE_UNUSED,        /* not initialized */
+	RTE_HCDEV_STATE_INITIALIZED,   /* initialized */
+};
+
+struct rte_hcdev;
+typedef int (rte_hcdev_close_t)(struct rte_hcdev *dev);
+typedef int (rte_hcdev_info_get_t)(struct rte_hcdev *dev, struct rte_hcdev_info *info);
+
+struct rte_hcdev_ops {
+	/* Get device info. If NULL, info is just copied. */
+	rte_hcdev_info_get_t *dev_info_get;
+	/* Close device. */
+	rte_hcdev_close_t *dev_close;
+};
+
+struct rte_hcdev {
+	/* Backing device. */
+	struct rte_device *device;
+	/* Unique identifier name. */
+	char name[RTE_DEV_NAME_MAX_LEN]; /* Updated by this library. */
+	/* Device info structure. */
+	struct rte_hcdev_info info;
+	/* Driver functions. */
+	struct rte_hcdev_ops ops;
+	/* Current state (used or not) in the running process. */
+	enum rte_hcdev_state state; /* Updated by this library. */
+	/* Driver-specific private data for the running process. */
+	void *process_private;
+} __rte_cache_aligned;
+
+__rte_internal
+struct rte_hcdev *rte_hcdev_get_by_name(const char *name);
+
+/* First step of initialization */
+__rte_internal
+struct rte_hcdev *rte_hcdev_allocate(const char *name);
+
+/* Last step of initialization. */
+__rte_internal
+void rte_hcdev_complete_new(struct rte_hcdev *dev);
+
+/* Last step of removal. */
+__rte_internal
+int rte_hcdev_release(struct rte_hcdev *dev);
+
+#endif /* RTE_HCDEV_DRIVER_H */
diff --git a/lib/hcdev/meson.build b/lib/hcdev/meson.build
new file mode 100644
index 0000000000..565c3cb623
--- /dev/null
+++ b/lib/hcdev/meson.build
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+headers = files(
+        'rte_hcdev.h',
+)
+
+sources = files(
+        'hcdev.c',
+)
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
new file mode 100644
index 0000000000..83f58193c1
--- /dev/null
+++ b/lib/hcdev/rte_hcdev.h
@@ -0,0 +1,169 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_HCDEV_H
+#define RTE_HCDEV_H
+
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_compat.h>
+
+/**
+ * @file
+ * Generic library to interact with heterogeneous computing device.
+ *
+ * The API is not thread-safe.
+ * Device management must be done by a single thread.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Maximum number of devices if rte_hcdev_init() is not called. */
+#define RTE_HCDEV_DEFAULT_MAX 32
+
+/** Empty device ID. */
+#define RTE_HCDEV_ID_NONE -1
+
+/** Store device info. */
+struct rte_hcdev_info {
+	/** Unique identifier name. */
+	const char *name;
+	/** Device ID. */
+	int16_t dev_id;
+	/** Total processors available on device. */
+	uint32_t processor_count;
+	/** Total memory available on device. */
+	size_t total_memory;
+	/* Local NUMA memory ID. -1 if unknown. */
+	int16_t numa_node;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the device array before probing devices.
+ * If not called, the maximum of probed devices is RTE_HCDEV_DEFAULT_MAX.
+ *
+ * @param dev_max
+ *   Maximum number of devices.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENOMEM if out of memory
+ *   - EINVAL if 0 size
+ *   - EBUSY if already initialized
+ */
+__rte_experimental
+int rte_hcdev_init(size_t dev_max);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return the number of heterogeneous computing devices detected
+ * and associated to DPDK.
+ *
+ * @return
+ *   The number of available computing devices.
+ */
+__rte_experimental
+uint16_t rte_hcdev_count_avail(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if the device is valid and initialized in DPDK.
+ *
+ * @param dev_id
+ *   The input device ID.
+ *
+ * @return
+ *   - True if dev_id is a valid and initialized computing device.
+ *   - False otherwise.
+ */
+__rte_experimental
+bool rte_hcdev_is_valid(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the ID of the next valid computing device initialized in DPDK.
+ *
+ * @param dev_id
+ *   The initial device ID to start the research.
+ *
+ * @return
+ *   Next device ID corresponding to a valid and initialized computing device,
+ *   RTE_HCDEV_ID_NONE if there is none.
+ */
+__rte_experimental
+int16_t rte_hcdev_find_next(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid computing devices.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ */
+#define RTE_HCDEV_FOREACH(dev_id) \
+	for (dev_id = rte_hcdev_find_next(0); \
+	     dev_id > 0; \
+	     dev_id = rte_hcdev_find_next(dev_id + 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Close device.
+ * All resources are released.
+ *
+ * @param dev_id
+ *   Device ID to close.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_hcdev_close(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return device specific info.
+ *
+ * @param dev_id
+ *   Device ID to get info.
+ * @param info
+ *   Memory structure to fill with the info.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL info
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_hcdev_info_get(int16_t dev_id, struct rte_hcdev_info *info);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_HCDEV_H */
diff --git a/lib/hcdev/version.map b/lib/hcdev/version.map
new file mode 100644
index 0000000000..bc6dae6de7
--- /dev/null
+++ b/lib/hcdev/version.map
@@ -0,0 +1,20 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 21.11
+	rte_hcdev_close;
+	rte_hcdev_count_avail;
+	rte_hcdev_find_next;
+	rte_hcdev_info_get;
+	rte_hcdev_init;
+	rte_hcdev_is_valid;
+};
+
+INTERNAL {
+	global:
+
+	rte_hcdev_allocate;
+	rte_hcdev_complete_new;
+	rte_hcdev_get_by_name;
+	rte_hcdev_release;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323..3239182c03 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -35,6 +35,7 @@ libraries = [
         'eventdev',
         'gro',
         'gso',
+        'hcdev',
         'ip_frag',
         'jobstats',
         'kni',
-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [RFC PATCH v2 2/7] hcdev: add event notification
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 1/7] hcdev: introduce heterogeneous computing device library Thomas Monjalon
@ 2021-07-30 13:55   ` Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 3/7] hcdev: add child device representing a device context Thomas Monjalon
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-30 13:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, David Marchand, Andrew Rybchenko, Haiyue Wang,
	Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit, Elena Agostini,
	Ray Kinsella

Callback functions may be registered for a device event.
Callback management is per-process and not thread-safe.

The events RTE_HCDEV_EVENT_NEW and RTE_HCDEV_EVENT_DEL
are notified respectively after creation and before removal
of a device, as part of the library functions.
Some future events may be emitted from drivers.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/hcdev/hcdev.c        | 137 +++++++++++++++++++++++++++++++++++++++
 lib/hcdev/hcdev_driver.h |   7 ++
 lib/hcdev/rte_hcdev.h    |  71 ++++++++++++++++++++
 lib/hcdev/version.map    |   3 +
 4 files changed, 218 insertions(+)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index ea587b3713..2a7ce1ccd8 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -3,6 +3,7 @@
  */
 
 #include <rte_eal.h>
+#include <rte_tailq.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_log.h>
@@ -27,6 +28,15 @@ static int16_t hcdev_max;
 /* Number of currently valid devices */
 static int16_t hcdev_count;
 
+/* Event callback object */
+struct rte_hcdev_callback {
+	TAILQ_ENTRY(rte_hcdev_callback) next;
+	rte_hcdev_callback_t *function;
+	void *user_data;
+	enum rte_hcdev_event event;
+};
+static void hcdev_free_callbacks(struct rte_hcdev *dev);
+
 int
 rte_hcdev_init(size_t dev_max)
 {
@@ -166,6 +176,7 @@ rte_hcdev_allocate(const char *name)
 	dev->info.name = dev->name;
 	dev->info.dev_id = dev_id;
 	dev->info.numa_node = -1;
+	TAILQ_INIT(&dev->callbacks);
 
 	hcdev_count++;
 	HCDEV_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -180,6 +191,7 @@ rte_hcdev_complete_new(struct rte_hcdev *dev)
 		return;
 
 	dev->state = RTE_HCDEV_STATE_INITIALIZED;
+	rte_hcdev_notify(dev, RTE_HCDEV_EVENT_NEW);
 }
 
 int
@@ -192,6 +204,9 @@ rte_hcdev_release(struct rte_hcdev *dev)
 
 	HCDEV_LOG(DEBUG, "free device %s (id %d)",
 			dev->info.name, dev->info.dev_id);
+	rte_hcdev_notify(dev, RTE_HCDEV_EVENT_DEL);
+
+	hcdev_free_callbacks(dev);
 	dev->state = RTE_HCDEV_STATE_UNUSED;
 	hcdev_count--;
 
@@ -224,6 +239,128 @@ rte_hcdev_close(int16_t dev_id)
 	return firsterr;
 }
 
+int
+rte_hcdev_callback_register(int16_t dev_id, enum rte_hcdev_event event,
+		rte_hcdev_callback_t *function, void *user_data)
+{
+	int16_t next_dev, last_dev;
+	struct rte_hcdev_callback_list *callbacks;
+	struct rte_hcdev_callback *callback;
+
+	if (!rte_hcdev_is_valid(dev_id) && dev_id != RTE_HCDEV_ID_ANY) {
+		HCDEV_LOG(ERR, "register callback of invalid ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (function == NULL) {
+		HCDEV_LOG(ERR, "cannot register callback without function");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev_id == RTE_HCDEV_ID_ANY) {
+		next_dev = 0;
+		last_dev = hcdev_max - 1;
+	} else {
+		next_dev = last_dev = dev_id;
+	}
+	do {
+		callbacks = &hcdevs[next_dev].callbacks;
+
+		/* check if not already registered */
+		TAILQ_FOREACH(callback, callbacks, next) {
+			if (callback->event == event &&
+					callback->function == function &&
+					callback->user_data == user_data) {
+				HCDEV_LOG(INFO, "callback already registered");
+				return 0;
+			}
+		}
+
+		callback = malloc(sizeof(*callback));
+		if (callback == NULL) {
+			HCDEV_LOG(ERR, "cannot allocate callback");
+			return -ENOMEM;
+		}
+		callback->function = function;
+		callback->user_data = user_data;
+		callback->event = event;
+		TAILQ_INSERT_TAIL(callbacks, callback, next);
+
+	} while (++next_dev <= last_dev);
+
+	return 0;
+}
+
+int
+rte_hcdev_callback_unregister(int16_t dev_id, enum rte_hcdev_event event,
+		rte_hcdev_callback_t *function, void *user_data)
+{
+	int16_t next_dev, last_dev;
+	struct rte_hcdev_callback_list *callbacks;
+	struct rte_hcdev_callback *callback, *next_callback;
+
+	if (!rte_hcdev_is_valid(dev_id) && dev_id != RTE_HCDEV_ID_ANY) {
+		HCDEV_LOG(ERR, "unregister callback of invalid ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (function == NULL) {
+		HCDEV_LOG(ERR, "cannot unregister callback without function");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev_id == RTE_HCDEV_ID_ANY) {
+		next_dev = 0;
+		last_dev = hcdev_max - 1;
+	} else {
+		next_dev = last_dev = dev_id;
+	}
+
+	do {
+		callbacks = &hcdevs[next_dev].callbacks;
+		TAILQ_FOREACH_SAFE(callback, callbacks, next, next_callback) {
+			if (callback->event != event ||
+					callback->function != function ||
+					(callback->user_data != user_data &&
+					user_data != (void *)-1))
+				continue;
+			TAILQ_REMOVE(callbacks, callback, next);
+			free(callback);
+		}
+	} while (++next_dev <= last_dev);
+
+	return 0;
+}
+
+static void
+hcdev_free_callbacks(struct rte_hcdev *dev)
+{
+	struct rte_hcdev_callback_list *callbacks;
+	struct rte_hcdev_callback *callback, *next_callback;
+
+	callbacks = &dev->callbacks;
+	TAILQ_FOREACH_SAFE(callback, callbacks, next, next_callback) {
+		TAILQ_REMOVE(callbacks, callback, next);
+		free(callback);
+	}
+}
+
+void
+rte_hcdev_notify(struct rte_hcdev *dev, enum rte_hcdev_event event)
+{
+	int16_t dev_id;
+	struct rte_hcdev_callback *callback;
+
+	dev_id = dev->info.dev_id;
+	TAILQ_FOREACH(callback, &dev->callbacks, next) {
+		if (callback->event != event || callback->function == NULL)
+			continue;
+		callback->function(dev_id, event, callback->user_data);
+	}
+}
+
 int
 rte_hcdev_info_get(int16_t dev_id, struct rte_hcdev_info *info)
 {
diff --git a/lib/hcdev/hcdev_driver.h b/lib/hcdev/hcdev_driver.h
index ca23cb9b9f..80d11bd612 100644
--- a/lib/hcdev/hcdev_driver.h
+++ b/lib/hcdev/hcdev_driver.h
@@ -12,6 +12,7 @@
 #define RTE_HCDEV_DRIVER_H
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 #include <rte_dev.h>
 
@@ -43,6 +44,8 @@ struct rte_hcdev {
 	struct rte_hcdev_info info;
 	/* Driver functions. */
 	struct rte_hcdev_ops ops;
+	/* Event callback list. */
+	TAILQ_HEAD(rte_hcdev_callback_list, rte_hcdev_callback) callbacks;
 	/* Current state (used or not) in the running process. */
 	enum rte_hcdev_state state; /* Updated by this library. */
 	/* Driver-specific private data for the running process. */
@@ -64,4 +67,8 @@ void rte_hcdev_complete_new(struct rte_hcdev *dev);
 __rte_internal
 int rte_hcdev_release(struct rte_hcdev *dev);
 
+/* Call registered callbacks. No multi-process event. */
+__rte_internal
+void rte_hcdev_notify(struct rte_hcdev *dev, enum rte_hcdev_event);
+
 #endif /* RTE_HCDEV_DRIVER_H */
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
index 83f58193c1..8131e4045a 100644
--- a/lib/hcdev/rte_hcdev.h
+++ b/lib/hcdev/rte_hcdev.h
@@ -17,6 +17,7 @@
  *
  * The API is not thread-safe.
  * Device management must be done by a single thread.
+ * TODO device rwlock for callback list
  *
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -31,6 +32,11 @@ extern "C" {
 
 /** Empty device ID. */
 #define RTE_HCDEV_ID_NONE -1
+/** Catch-all device ID. */
+#define RTE_HCDEV_ID_ANY INT16_MIN
+
+/** Catch-all callback data. */
+#define RTE_HCDEV_CALLBACK_ANY_DATA ((void *)-1)
 
 /** Store device info. */
 struct rte_hcdev_info {
@@ -46,6 +52,18 @@ struct rte_hcdev_info {
 	int16_t numa_node;
 };
 
+/** Flags passed in notification callback. */
+enum rte_hcdev_event {
+	/** Device is just initialized. */
+	RTE_HCDEV_EVENT_NEW,
+	/** Device is going to be released. */
+	RTE_HCDEV_EVENT_DEL,
+};
+
+/** Prototype of event callback function. */
+typedef void (rte_hcdev_callback_t)(int16_t dev_id,
+		enum rte_hcdev_event event, void *user_data);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -142,6 +160,59 @@ int16_t rte_hcdev_find_next(int16_t dev_id);
 __rte_experimental
 int rte_hcdev_close(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register a function as event callback.
+ * A function may be registered multiple times for different events.
+ *
+ * @param dev_id
+ *   Device ID to get notified about.
+ *   RTE_HCDEV_ID_ANY means all devices.
+ * @param event
+ *   Device event to be registered for.
+ * @param function
+ *   Callback function to be called on event.
+ * @param user_data
+ *   Optional parameter passed in the callback.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL function
+ *   - ENOMEM if out of memory
+ */
+__rte_experimental
+int rte_hcdev_callback_register(int16_t dev_id, enum rte_hcdev_event event,
+		rte_hcdev_callback_t *function, void *user_data);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Unregister for an event.
+ *
+ * @param dev_id
+ *   Device ID to be silenced.
+ *   RTE_HCDEV_ID_ANY means all devices.
+ * @param event
+ *   Registered event.
+ * @param function
+ *   Registered function.
+ * @param user_data
+ *   Optional parameter as registered.
+ *   RTE_HCDEV_CALLBACK_ANY_DATA is a catch-all.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL function
+ */
+__rte_experimental
+int rte_hcdev_callback_unregister(int16_t dev_id, enum rte_hcdev_event event,
+		rte_hcdev_callback_t *function, void *user_data);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
diff --git a/lib/hcdev/version.map b/lib/hcdev/version.map
index bc6dae6de7..24a5a5a7c4 100644
--- a/lib/hcdev/version.map
+++ b/lib/hcdev/version.map
@@ -2,6 +2,8 @@ EXPERIMENTAL {
 	global:
 
 	# added in 21.11
+	rte_hcdev_callback_register;
+	rte_hcdev_callback_unregister;
 	rte_hcdev_close;
 	rte_hcdev_count_avail;
 	rte_hcdev_find_next;
@@ -16,5 +18,6 @@ INTERNAL {
 	rte_hcdev_allocate;
 	rte_hcdev_complete_new;
 	rte_hcdev_get_by_name;
+	rte_hcdev_notify;
 	rte_hcdev_release;
 };
-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [RFC PATCH v2 3/7] hcdev: add child device representing a device context
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 1/7] hcdev: introduce heterogeneous computing device library Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 2/7] hcdev: add event notification Thomas Monjalon
@ 2021-07-30 13:55   ` Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 4/7] hcdev: support multi-process Thomas Monjalon
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-30 13:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, David Marchand, Andrew Rybchenko, Haiyue Wang,
	Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit, Elena Agostini,
	Ray Kinsella

The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque by the caller of
rte_hcdev_add_child().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/hcdev/hcdev.c        | 45 ++++++++++++++++++++++++--
 lib/hcdev/hcdev_driver.h |  2 +-
 lib/hcdev/rte_hcdev.h    | 69 +++++++++++++++++++++++++++++++++++++---
 lib/hcdev/version.map    |  1 +
 4 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index 2a7ce1ccd8..d40010749a 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -79,13 +79,22 @@ rte_hcdev_is_valid(int16_t dev_id)
 	return false;
 }
 
+static bool
+hcdev_match_parent(int16_t dev_id, int16_t parent)
+{
+	if (parent == RTE_HCDEV_ID_ANY)
+		return true;
+	return hcdevs[dev_id].info.parent == parent;
+}
+
 int16_t
-rte_hcdev_find_next(int16_t dev_id)
+rte_hcdev_find_next(int16_t dev_id, int16_t parent)
 {
 	if (dev_id < 0)
 		dev_id = 0;
 	while (dev_id < hcdev_max &&
-			hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED)
+			(hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED ||
+			!hcdev_match_parent(dev_id, parent)))
 		dev_id++;
 
 	if (dev_id >= hcdev_max)
@@ -176,6 +185,7 @@ rte_hcdev_allocate(const char *name)
 	dev->info.name = dev->name;
 	dev->info.dev_id = dev_id;
 	dev->info.numa_node = -1;
+	dev->info.parent = RTE_HCDEV_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
 
 	hcdev_count++;
@@ -184,6 +194,28 @@ rte_hcdev_allocate(const char *name)
 	return dev;
 }
 
+int16_t
+rte_hcdev_add_child(const char *name, int16_t parent, uint64_t child_context)
+{
+	struct rte_hcdev *dev;
+
+	if (!rte_hcdev_is_valid(parent)) {
+		HCDEV_LOG(ERR, "add child to invalid parent ID %d", parent);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	dev = rte_hcdev_allocate(name);
+	if (dev == NULL)
+		return -rte_errno;
+
+	dev->info.parent = parent;
+	dev->info.context = child_context;
+
+	rte_hcdev_complete_new(dev);
+	return dev->info.dev_id;
+}
+
 void
 rte_hcdev_complete_new(struct rte_hcdev *dev)
 {
@@ -197,10 +229,19 @@ rte_hcdev_complete_new(struct rte_hcdev *dev)
 int
 rte_hcdev_release(struct rte_hcdev *dev)
 {
+	int16_t dev_id, child;
+
 	if (dev == NULL) {
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
+	dev_id = dev->info.dev_id;
+	RTE_HCDEV_FOREACH_CHILD(child, dev_id) {
+		HCDEV_LOG(ERR, "cannot release device %d with child %d",
+				dev_id, child);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
 
 	HCDEV_LOG(DEBUG, "free device %s (id %d)",
 			dev->info.name, dev->info.dev_id);
diff --git a/lib/hcdev/hcdev_driver.h b/lib/hcdev/hcdev_driver.h
index 80d11bd612..39f6fc57ab 100644
--- a/lib/hcdev/hcdev_driver.h
+++ b/lib/hcdev/hcdev_driver.h
@@ -31,7 +31,7 @@ typedef int (rte_hcdev_info_get_t)(struct rte_hcdev *dev, struct rte_hcdev_info
 struct rte_hcdev_ops {
 	/* Get device info. If NULL, info is just copied. */
 	rte_hcdev_info_get_t *dev_info_get;
-	/* Close device. */
+	/* Close device or child context. */
 	rte_hcdev_close_t *dev_close;
 };
 
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
index 8131e4045a..518020fd2f 100644
--- a/lib/hcdev/rte_hcdev.h
+++ b/lib/hcdev/rte_hcdev.h
@@ -42,8 +42,12 @@ extern "C" {
 struct rte_hcdev_info {
 	/** Unique identifier name. */
 	const char *name;
+	/** Opaque handler of the device context. */
+	uint64_t context;
 	/** Device ID. */
 	int16_t dev_id;
+	/** ID of the parent device, RTE_HCDEV_ID_NONE if no parent */
+	int16_t parent;
 	/** Total processors available on device. */
 	uint32_t processor_count;
 	/** Total memory available on device. */
@@ -112,6 +116,33 @@ uint16_t rte_hcdev_count_avail(void);
 __rte_experimental
 bool rte_hcdev_is_valid(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a virtual device representing a context in the parent device.
+ *
+ * @param name
+ *   Unique string to identify the device.
+ * @param parent
+ *   Device ID of the parent.
+ * @param child_context
+ *   Opaque context handler.
+ *
+ * @return
+ *   Device ID of the new created child, -rte_errno otherwise:
+ *   - EINVAL if empty name
+ *   - ENAMETOOLONG if long name
+ *   - EEXIST if existing device name
+ *   - ENODEV if invalid parent
+ *   - EPERM if secondary process
+ *   - ENOENT if too many devices
+ *   - ENOMEM if out of space
+ */
+__rte_experimental
+int16_t rte_hcdev_add_child(const char *name,
+		int16_t parent, uint64_t child_context);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -120,13 +151,17 @@ bool rte_hcdev_is_valid(int16_t dev_id);
  *
  * @param dev_id
  *   The initial device ID to start the research.
+ * @param parent
+ *   The device ID of the parent.
+ *   RTE_HCDEV_ID_NONE means no parent.
+ *   RTE_HCDEV_ID_ANY means no or any parent.
  *
  * @return
  *   Next device ID corresponding to a valid and initialized computing device,
  *   RTE_HCDEV_ID_NONE if there is none.
  */
 __rte_experimental
-int16_t rte_hcdev_find_next(int16_t dev_id);
+int16_t rte_hcdev_find_next(int16_t dev_id, int16_t parent);
 
 /**
  * @warning
@@ -138,15 +173,41 @@ int16_t rte_hcdev_find_next(int16_t dev_id);
  *   The ID of the next possible valid device, usually 0 to iterate all.
  */
 #define RTE_HCDEV_FOREACH(dev_id) \
-	for (dev_id = rte_hcdev_find_next(0); \
+	RTE_HCDEV_FOREACH_CHILD(dev_id, RTE_HCDEV_ID_ANY)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid computing devices having no parent.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ */
+#define RTE_HCDEV_FOREACH_PARENT(dev_id) \
+	RTE_HCDEV_FOREACH_CHILD(dev_id, RTE_HCDEV_ID_NONE)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid children of a computing device parent.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ * @param parent
+ *   The device ID of the parent.
+ */
+#define RTE_HCDEV_FOREACH_CHILD(dev_id, parent) \
+	for (dev_id = rte_hcdev_find_next(0, parent); \
 	     dev_id > 0; \
-	     dev_id = rte_hcdev_find_next(dev_id + 1))
+	     dev_id = rte_hcdev_find_next(dev_id + 1, parent))
 
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
  *
- * Close device.
+ * Close device or child context.
  * All resources are released.
  *
  * @param dev_id
diff --git a/lib/hcdev/version.map b/lib/hcdev/version.map
index 24a5a5a7c4..6d1a1ab1c9 100644
--- a/lib/hcdev/version.map
+++ b/lib/hcdev/version.map
@@ -2,6 +2,7 @@ EXPERIMENTAL {
 	global:
 
 	# added in 21.11
+	rte_hcdev_add_child;
 	rte_hcdev_callback_register;
 	rte_hcdev_callback_unregister;
 	rte_hcdev_close;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [RFC PATCH v2 4/7] hcdev: support multi-process
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
                     ` (2 preceding siblings ...)
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 3/7] hcdev: add child device representing a device context Thomas Monjalon
@ 2021-07-30 13:55   ` Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 5/7] hcdev: add memory API Thomas Monjalon
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-30 13:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, David Marchand, Andrew Rybchenko, Haiyue Wang,
	Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit, Elena Agostini,
	Ray Kinsella, Anatoly Burakov

The device data shared between processes are moved in a struct
allocated in a shared memory (a new memzone for all hcdevs).
The main struct rte_hcdev references the shared memory
via the pointer mpshared.

The API function rte_hcdev_attach() is added to attach a device
from the secondary process.
The function rte_hcdev_allocate() can be used only by primary process.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/hcdev/hcdev.c        | 114 ++++++++++++++++++++++++++++++++-------
 lib/hcdev/hcdev_driver.h |  23 ++++++--
 lib/hcdev/rte_hcdev.h    |   3 +-
 lib/hcdev/version.map    |   1 +
 4 files changed, 115 insertions(+), 26 deletions(-)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index d40010749a..a7badd122b 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -5,6 +5,7 @@
 #include <rte_eal.h>
 #include <rte_tailq.h>
 #include <rte_string_fns.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_log.h>
 
@@ -28,6 +29,12 @@ static int16_t hcdev_max;
 /* Number of currently valid devices */
 static int16_t hcdev_count;
 
+/* Shared memory between processes. */
+static const char *HCDEV_MEMZONE = "rte_hcdev_shared";
+static struct {
+	__extension__ struct rte_hcdev_mpshared hcdevs[0];
+} *hcdev_shared_mem;
+
 /* Event callback object */
 struct rte_hcdev_callback {
 	TAILQ_ENTRY(rte_hcdev_callback) next;
@@ -40,6 +47,8 @@ static void hcdev_free_callbacks(struct rte_hcdev *dev);
 int
 rte_hcdev_init(size_t dev_max)
 {
+	const struct rte_memzone *memzone;
+
 	if (dev_max == 0 || dev_max > INT16_MAX) {
 		HCDEV_LOG(ERR, "invalid array size");
 		rte_errno = EINVAL;
@@ -60,6 +69,23 @@ rte_hcdev_init(size_t dev_max)
 		return -rte_errno;
 	}
 
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		memzone = rte_memzone_reserve(HCDEV_MEMZONE,
+				sizeof(*hcdev_shared_mem) +
+				sizeof(*hcdev_shared_mem->hcdevs) * dev_max,
+				SOCKET_ID_ANY, 0);
+	} else {
+		memzone = rte_memzone_lookup(HCDEV_MEMZONE);
+	}
+	if (memzone == NULL) {
+		HCDEV_LOG(ERR, "cannot initialize shared memory");
+		free(hcdevs);
+		hcdevs = NULL;
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+	hcdev_shared_mem = memzone->addr;
+
 	hcdev_max = dev_max;
 	return 0;
 }
@@ -74,7 +100,7 @@ bool
 rte_hcdev_is_valid(int16_t dev_id)
 {
 	if (dev_id >= 0 && dev_id < hcdev_max &&
-		hcdevs[dev_id].state == RTE_HCDEV_STATE_INITIALIZED)
+		hcdevs[dev_id].process_state == RTE_HCDEV_STATE_INITIALIZED)
 		return true;
 	return false;
 }
@@ -84,7 +110,7 @@ hcdev_match_parent(int16_t dev_id, int16_t parent)
 {
 	if (parent == RTE_HCDEV_ID_ANY)
 		return true;
-	return hcdevs[dev_id].info.parent == parent;
+	return hcdevs[dev_id].mpshared->info.parent == parent;
 }
 
 int16_t
@@ -93,7 +119,7 @@ rte_hcdev_find_next(int16_t dev_id, int16_t parent)
 	if (dev_id < 0)
 		dev_id = 0;
 	while (dev_id < hcdev_max &&
-			(hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED ||
+			(hcdevs[dev_id].process_state == RTE_HCDEV_STATE_UNUSED ||
 			!hcdev_match_parent(dev_id, parent)))
 		dev_id++;
 
@@ -108,7 +134,7 @@ hcdev_find_free_id(void)
 	int16_t dev_id;
 
 	for (dev_id = 0; dev_id < hcdev_max; dev_id++) {
-		if (hcdevs[dev_id].state == RTE_HCDEV_STATE_UNUSED)
+		if (hcdevs[dev_id].process_state == RTE_HCDEV_STATE_UNUSED)
 			return dev_id;
 	}
 	return RTE_HCDEV_ID_NONE;
@@ -135,7 +161,7 @@ rte_hcdev_get_by_name(const char *name)
 
 	RTE_HCDEV_FOREACH(dev_id) {
 		dev = &hcdevs[dev_id];
-		if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+		if (strncmp(name, dev->mpshared->name, RTE_DEV_NAME_MAX_LEN) == 0)
 			return dev;
 	}
 	return NULL;
@@ -177,16 +203,20 @@ rte_hcdev_allocate(const char *name)
 	dev = &hcdevs[dev_id];
 	memset(dev, 0, sizeof(*dev));
 
-	if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+	dev->mpshared = &hcdev_shared_mem->hcdevs[dev_id];
+	memset(dev->mpshared, 0, sizeof(*dev->mpshared));
+
+	if (rte_strscpy(dev->mpshared->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
 		HCDEV_LOG(ERR, "device name too long: %s", name);
 		rte_errno = ENAMETOOLONG;
 		return NULL;
 	}
-	dev->info.name = dev->name;
-	dev->info.dev_id = dev_id;
-	dev->info.numa_node = -1;
-	dev->info.parent = RTE_HCDEV_ID_NONE;
+	dev->mpshared->info.name = dev->mpshared->name;
+	dev->mpshared->info.dev_id = dev_id;
+	dev->mpshared->info.numa_node = -1;
+	dev->mpshared->info.parent = RTE_HCDEV_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
+	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
 
 	hcdev_count++;
 	HCDEV_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -194,6 +224,51 @@ rte_hcdev_allocate(const char *name)
 	return dev;
 }
 
+struct rte_hcdev *
+rte_hcdev_attach(const char *name)
+{
+	int16_t dev_id;
+	struct rte_hcdev *dev;
+	struct rte_hcdev_mpshared *shared_dev;
+
+	if (rte_eal_process_type() != RTE_PROC_SECONDARY) {
+		HCDEV_LOG(ERR, "only secondary process can attach device");
+		rte_errno = EPERM;
+		return NULL;
+	}
+	if (name == NULL) {
+		HCDEV_LOG(ERR, "attach device without a name");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* implicit initialization of library before adding first device */
+	if (hcdevs == NULL && rte_hcdev_init(RTE_HCDEV_DEFAULT_MAX) < 0)
+		return NULL;
+
+	for (dev_id = 0; dev_id < hcdev_max; dev_id++) {
+		shared_dev = &hcdev_shared_mem->hcdevs[dev_id];
+		if (strncmp(name, shared_dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+			break;
+	}
+	if (dev_id >= hcdev_max) {
+		HCDEV_LOG(ERR, "device with name %s not found", name);
+		rte_errno = ENOENT;
+		return NULL;
+	}
+	dev = &hcdevs[dev_id];
+	memset(dev, 0, sizeof(*dev));
+
+	TAILQ_INIT(&dev->callbacks);
+	dev->mpshared = shared_dev;
+	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+
+	hcdev_count++;
+	HCDEV_LOG(DEBUG, "attached device %s (id %d) of total %d",
+			name, dev_id, hcdev_count);
+	return dev;
+}
+
 int16_t
 rte_hcdev_add_child(const char *name, int16_t parent, uint64_t child_context)
 {
@@ -209,11 +284,11 @@ rte_hcdev_add_child(const char *name, int16_t parent, uint64_t child_context)
 	if (dev == NULL)
 		return -rte_errno;
 
-	dev->info.parent = parent;
-	dev->info.context = child_context;
+	dev->mpshared->info.parent = parent;
+	dev->mpshared->info.context = child_context;
 
 	rte_hcdev_complete_new(dev);
-	return dev->info.dev_id;
+	return dev->mpshared->info.dev_id;
 }
 
 void
@@ -222,7 +297,7 @@ rte_hcdev_complete_new(struct rte_hcdev *dev)
 	if (dev == NULL)
 		return;
 
-	dev->state = RTE_HCDEV_STATE_INITIALIZED;
+	dev->process_state = RTE_HCDEV_STATE_INITIALIZED;
 	rte_hcdev_notify(dev, RTE_HCDEV_EVENT_NEW);
 }
 
@@ -235,7 +310,7 @@ rte_hcdev_release(struct rte_hcdev *dev)
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
-	dev_id = dev->info.dev_id;
+	dev_id = dev->mpshared->info.dev_id;
 	RTE_HCDEV_FOREACH_CHILD(child, dev_id) {
 		HCDEV_LOG(ERR, "cannot release device %d with child %d",
 				dev_id, child);
@@ -244,11 +319,12 @@ rte_hcdev_release(struct rte_hcdev *dev)
 	}
 
 	HCDEV_LOG(DEBUG, "free device %s (id %d)",
-			dev->info.name, dev->info.dev_id);
+			dev->mpshared->info.name, dev->mpshared->info.dev_id);
 	rte_hcdev_notify(dev, RTE_HCDEV_EVENT_DEL);
 
 	hcdev_free_callbacks(dev);
-	dev->state = RTE_HCDEV_STATE_UNUSED;
+	dev->process_state = RTE_HCDEV_STATE_UNUSED;
+	__atomic_fetch_sub(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
 	hcdev_count--;
 
 	return 0;
@@ -394,7 +470,7 @@ rte_hcdev_notify(struct rte_hcdev *dev, enum rte_hcdev_event event)
 	int16_t dev_id;
 	struct rte_hcdev_callback *callback;
 
-	dev_id = dev->info.dev_id;
+	dev_id = dev->mpshared->info.dev_id;
 	TAILQ_FOREACH(callback, &dev->callbacks, next) {
 		if (callback->event != event || callback->function == NULL)
 			continue;
@@ -420,7 +496,7 @@ rte_hcdev_info_get(int16_t dev_id, struct rte_hcdev_info *info)
 	}
 
 	if (dev->ops.dev_info_get == NULL) {
-		*info = dev->info;
+		*info = dev->mpshared->info;
 		return 0;
 	}
 	return HCDEV_DRV_RET(dev->ops.dev_info_get(dev, info));
diff --git a/lib/hcdev/hcdev_driver.h b/lib/hcdev/hcdev_driver.h
index 39f6fc57ab..f33b56947b 100644
--- a/lib/hcdev/hcdev_driver.h
+++ b/lib/hcdev/hcdev_driver.h
@@ -35,19 +35,28 @@ struct rte_hcdev_ops {
 	rte_hcdev_close_t *dev_close;
 };
 
-struct rte_hcdev {
-	/* Backing device. */
-	struct rte_device *device;
+struct rte_hcdev_mpshared {
 	/* Unique identifier name. */
 	char name[RTE_DEV_NAME_MAX_LEN]; /* Updated by this library. */
+	/* Driver-specific private data shared in multi-process. */
+	void *dev_private;
 	/* Device info structure. */
 	struct rte_hcdev_info info;
+	/* Counter of processes using the device. */
+	uint16_t process_refcnt; /* Updated by this library. */
+};
+
+struct rte_hcdev {
+	/* Backing device. */
+	struct rte_device *device;
+	/* Data shared between processes. */
+	struct rte_hcdev_mpshared *mpshared;
 	/* Driver functions. */
 	struct rte_hcdev_ops ops;
 	/* Event callback list. */
 	TAILQ_HEAD(rte_hcdev_callback_list, rte_hcdev_callback) callbacks;
 	/* Current state (used or not) in the running process. */
-	enum rte_hcdev_state state; /* Updated by this library. */
+	enum rte_hcdev_state process_state; /* Updated by this library. */
 	/* Driver-specific private data for the running process. */
 	void *process_private;
 } __rte_cache_aligned;
@@ -55,10 +64,14 @@ struct rte_hcdev {
 __rte_internal
 struct rte_hcdev *rte_hcdev_get_by_name(const char *name);
 
-/* First step of initialization */
+/* First step of initialization in primary process. */
 __rte_internal
 struct rte_hcdev *rte_hcdev_allocate(const char *name);
 
+/* First step of initialization in secondary process. */
+__rte_internal
+struct rte_hcdev *rte_hcdev_attach(const char *name);
+
 /* Last step of initialization. */
 __rte_internal
 void rte_hcdev_complete_new(struct rte_hcdev *dev);
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
index 518020fd2f..c95f37063d 100644
--- a/lib/hcdev/rte_hcdev.h
+++ b/lib/hcdev/rte_hcdev.h
@@ -15,9 +15,8 @@
  * @file
  * Generic library to interact with heterogeneous computing device.
  *
- * The API is not thread-safe.
- * Device management must be done by a single thread.
  * TODO device rwlock for callback list
+ * TODO mp shared rwlock for device array
  *
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
diff --git a/lib/hcdev/version.map b/lib/hcdev/version.map
index 6d1a1ab1c9..450c256527 100644
--- a/lib/hcdev/version.map
+++ b/lib/hcdev/version.map
@@ -17,6 +17,7 @@ INTERNAL {
 	global:
 
 	rte_hcdev_allocate;
+	rte_hcdev_attach;
 	rte_hcdev_complete_new;
 	rte_hcdev_get_by_name;
 	rte_hcdev_notify;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [RFC PATCH v2 5/7] hcdev: add memory API
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
                     ` (3 preceding siblings ...)
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 4/7] hcdev: support multi-process Thomas Monjalon
@ 2021-07-30 13:55   ` Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 6/7] hcdev: add communication flag Thomas Monjalon
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-30 13:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, David Marchand, Andrew Rybchenko, Haiyue Wang,
	Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit, Elena Agostini,
	Ray Kinsella

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.

As a first step, the features are focused on memory management.
A function allows to allocate memory inside the device,
or in the main (CPU) memory while making it visible for the device.
This memory may be used to save packets or for synchronization data.

The next step should focus on GPU processing task control.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 doc/guides/hcdevs/features/default.ini |  3 +
 doc/guides/rel_notes/release_21_08.rst |  1 +
 lib/hcdev/hcdev.c                      | 88 ++++++++++++++++++++++++++
 lib/hcdev/hcdev_driver.h               |  9 +++
 lib/hcdev/rte_hcdev.h                  | 53 ++++++++++++++++
 lib/hcdev/version.map                  |  2 +
 6 files changed, 156 insertions(+)

diff --git a/doc/guides/hcdevs/features/default.ini b/doc/guides/hcdevs/features/default.ini
index f988ee73d4..ee32753d94 100644
--- a/doc/guides/hcdevs/features/default.ini
+++ b/doc/guides/hcdevs/features/default.ini
@@ -8,3 +8,6 @@
 ;
 [Features]
 Get device info                =
+Share CPU memory with device   =
+Allocate device memory         =
+Free memory                    =
diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index fb350b4706..e955a331a6 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -58,6 +58,7 @@ New Features
 * **Introduced Heterogeneous Computing Device library with first features:**
 
   * Device information
+  * Memory management
 
 * **Added auxiliary bus support.**
 
diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index a7badd122b..621e0b99bd 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -6,6 +6,7 @@
 #include <rte_tailq.h>
 #include <rte_string_fns.h>
 #include <rte_memzone.h>
+#include <rte_malloc.h>
 #include <rte_errno.h>
 #include <rte_log.h>
 
@@ -501,3 +502,90 @@ rte_hcdev_info_get(int16_t dev_id, struct rte_hcdev_info *info)
 	}
 	return HCDEV_DRV_RET(dev->ops.dev_info_get(dev, info));
 }
+
+#define RTE_HCDEV_MALLOC_FLAGS_ALL \
+	RTE_HCDEV_MALLOC_REGISTER_FROM_CPU
+#define RTE_HCDEV_MALLOC_FLAGS_RESERVED ~RTE_HCDEV_MALLOC_FLAGS_ALL
+
+void *
+rte_hcdev_malloc(int16_t dev_id, size_t size, uint32_t flags)
+{
+	struct rte_hcdev *dev;
+	void *ptr;
+	int ret;
+
+	dev = hcdev_get_by_id(dev_id);
+	if (dev == NULL) {
+		HCDEV_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return NULL;
+	}
+	if (flags & RTE_HCDEV_MALLOC_FLAGS_RESERVED) {
+		HCDEV_LOG(ERR, "alloc mem with reserved flag 0x%x",
+				flags & RTE_HCDEV_MALLOC_FLAGS_RESERVED);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	if (flags & RTE_HCDEV_MALLOC_REGISTER_FROM_CPU) {
+		if (dev->ops.mem_register == NULL) {
+			HCDEV_LOG(ERR, "mem registration not supported");
+			rte_errno = ENOTSUP;
+			return NULL;
+		}
+	} else {
+		if (dev->ops.mem_alloc == NULL) {
+			HCDEV_LOG(ERR, "mem allocation not supported");
+			rte_errno = ENOTSUP;
+			return NULL;
+		}
+	}
+
+	if (size == 0) /* dry-run */
+		return NULL;
+
+	if (flags & RTE_HCDEV_MALLOC_REGISTER_FROM_CPU) {
+		ptr = rte_zmalloc(NULL, size, 0);
+		if (ptr == NULL) {
+			HCDEV_LOG(ERR, "cannot allocate CPU memory");
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		ret = dev->ops.mem_register(dev, size, ptr);
+	} else {
+		ret = dev->ops.mem_alloc(dev, size, &ptr);
+	}
+	/* TODO maintain a table of chunks registered/allocated */
+	switch (ret) {
+	case 0:
+		return ptr;
+	case -ENOMEM:
+	case -E2BIG:
+		rte_errno = -ret;
+		return NULL;
+	default:
+		rte_errno = EPERM;
+		return NULL;
+	}
+}
+
+int
+rte_hcdev_free(int16_t dev_id, void *ptr)
+{
+	struct rte_hcdev *dev;
+
+	dev = hcdev_get_by_id(dev_id);
+	if (dev == NULL) {
+		HCDEV_LOG(ERR, "free mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_free == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return HCDEV_DRV_RET(dev->ops.mem_free(dev, ptr));
+	/* TODO unregister callback */
+	/* TODO rte_free CPU memory */
+}
diff --git a/lib/hcdev/hcdev_driver.h b/lib/hcdev/hcdev_driver.h
index f33b56947b..f42f08508f 100644
--- a/lib/hcdev/hcdev_driver.h
+++ b/lib/hcdev/hcdev_driver.h
@@ -27,12 +27,21 @@ enum rte_hcdev_state {
 struct rte_hcdev;
 typedef int (rte_hcdev_close_t)(struct rte_hcdev *dev);
 typedef int (rte_hcdev_info_get_t)(struct rte_hcdev *dev, struct rte_hcdev_info *info);
+typedef int (rte_hcdev_mem_alloc_t)(struct rte_hcdev *dev, size_t size, void **ptr);
+typedef int (rte_hcdev_mem_register_t)(struct rte_hcdev *dev, size_t size, void *ptr);
+typedef int (rte_hcdev_free_t)(struct rte_hcdev *dev, void *ptr);
 
 struct rte_hcdev_ops {
 	/* Get device info. If NULL, info is just copied. */
 	rte_hcdev_info_get_t *dev_info_get;
 	/* Close device or child context. */
 	rte_hcdev_close_t *dev_close;
+	/* Allocate memory in device. */
+	rte_hcdev_mem_alloc_t *mem_alloc;
+	/* Register CPU memory in device. */
+	rte_hcdev_mem_register_t *mem_register;
+	/* Free memory allocated or registered in device. */
+	rte_hcdev_free_t *mem_free;
 };
 
 struct rte_hcdev_mpshared {
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
index c95f37063d..11895d9486 100644
--- a/lib/hcdev/rte_hcdev.h
+++ b/lib/hcdev/rte_hcdev.h
@@ -9,6 +9,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <rte_bitops.h>
 #include <rte_compat.h>
 
 /**
@@ -293,6 +294,58 @@ int rte_hcdev_callback_unregister(int16_t dev_id, enum rte_hcdev_event event,
 __rte_experimental
 int rte_hcdev_info_get(int16_t dev_id, struct rte_hcdev_info *info);
 
+/** Memory allocated on a CPU node and visible by the device. */
+#define RTE_HCDEV_MALLOC_REGISTER_FROM_CPU RTE_BIT32(0)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate a chunk of memory usable by the device.
+ *
+ * @param dev_id
+ *   Device ID requiring allocated memory.
+ * @param size
+ *   Number of bytes to allocate.
+ *   Requesting 0 will do nothing.
+ * @param flags
+ *   If 0, the default is to allocate in the device memory.
+ *   See flags RTE_HCDEV_MALLOC_*
+ *
+ * @return
+ *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if reserved flags
+ *   - ENOTSUP if operation not supported by the driver
+ *   - E2BIG if size is higher than limit
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+void *rte_hcdev_malloc(int16_t dev_id, size_t size, uint32_t flags)
+__rte_alloc_size(2);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a chunk of memory allocated with rte_hcdev_malloc().
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param ptr
+ *   Pointer to the memory area to be deallocated.
+ *   NULL is a no-op accepted value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_hcdev_free(int16_t dev_id, void *ptr);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/hcdev/version.map b/lib/hcdev/version.map
index 450c256527..9195f4f747 100644
--- a/lib/hcdev/version.map
+++ b/lib/hcdev/version.map
@@ -8,9 +8,11 @@ EXPERIMENTAL {
 	rte_hcdev_close;
 	rte_hcdev_count_avail;
 	rte_hcdev_find_next;
+	rte_hcdev_free;
 	rte_hcdev_info_get;
 	rte_hcdev_init;
 	rte_hcdev_is_valid;
+	rte_hcdev_malloc;
 };
 
 INTERNAL {
-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [RFC PATCH v2 6/7] hcdev: add communication flag
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
                     ` (4 preceding siblings ...)
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 5/7] hcdev: add memory API Thomas Monjalon
@ 2021-07-30 13:55   ` Thomas Monjalon
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 7/7] hcdev: add communication list Thomas Monjalon
  2021-07-31  7:06   ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Jerin Jacob
  7 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-30 13:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, David Marchand, Andrew Rybchenko, Haiyue Wang,
	Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit, Elena Agostini,
	Ray Kinsella

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

The purpose of this flag is to allow the CPU and the device to
exchange ACKs. A possible use-case is described below.

CPU:
- Trigger some task on the device
- Prepare some data
- Signal to the device the data is ready updating the communication flag

Device:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 lib/hcdev/hcdev.c     |  71 ++++++++++++++++++++++++++++
 lib/hcdev/rte_hcdev.h | 107 ++++++++++++++++++++++++++++++++++++++++++
 lib/hcdev/version.map |   4 ++
 3 files changed, 182 insertions(+)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index 621e0b99bd..e391988e73 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -589,3 +589,74 @@ rte_hcdev_free(int16_t dev_id, void *ptr)
 	/* TODO unregister callback */
 	/* TODO rte_free CPU memory */
 }
+
+int
+rte_hcdev_comm_create_flag(uint16_t dev_id, struct rte_hcdev_comm_flag *hcflag,
+		enum rte_hcdev_comm_flag_type mtype)
+{
+	size_t flag_size;
+
+	if (hcflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (mtype != RTE_HCDEV_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	flag_size = sizeof(uint32_t);
+
+	hcflag->ptr = rte_hcdev_malloc(dev_id, flag_size,
+			RTE_HCDEV_MALLOC_REGISTER_FROM_CPU);
+	if (hcflag->ptr == NULL)
+		return -rte_errno;
+
+	hcflag->mtype = mtype;
+	return 0;
+}
+
+int
+rte_hcdev_comm_destroy_flag(uint16_t dev_id, struct rte_hcdev_comm_flag *hcflag)
+{
+	if (hcflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	return rte_hcdev_free(dev_id, hcflag->ptr);
+}
+
+int
+rte_hcdev_comm_set_flag(struct rte_hcdev_comm_flag *hcflag, uint32_t val)
+{
+	if (hcflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (hcflag->mtype != RTE_HCDEV_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	RTE_HCDEV_VOLATILE(*hcflag->ptr) = val;
+
+	return 0;
+}
+
+int
+rte_hcdev_comm_get_flag_value(struct rte_hcdev_comm_flag *hcflag, uint32_t *val)
+{
+	if (hcflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (hcflag->mtype != RTE_HCDEV_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	*val = RTE_HCDEV_VOLATILE(*hcflag->ptr);
+
+	return 0;
+}
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
index 11895d9486..7b58041b3c 100644
--- a/lib/hcdev/rte_hcdev.h
+++ b/lib/hcdev/rte_hcdev.h
@@ -38,6 +38,9 @@ extern "C" {
 /** Catch-all callback data. */
 #define RTE_HCDEV_CALLBACK_ANY_DATA ((void *)-1)
 
+/** Access variable as volatile. */
+#define RTE_HCDEV_VOLATILE(x) (*(volatile typeof(x)*)&(x))
+
 /** Store device info. */
 struct rte_hcdev_info {
 	/** Unique identifier name. */
@@ -68,6 +71,18 @@ enum rte_hcdev_event {
 typedef void (rte_hcdev_callback_t)(int16_t dev_id,
 		enum rte_hcdev_event event, void *user_data);
 
+/** Memory where communication flag is allocated. */
+enum rte_hcdev_comm_flag_type {
+	/** Allocate flag on CPU memory visible from device. */
+	RTE_HCDEV_COMM_FLAG_CPU = 0,
+};
+
+/** Communication flag to coordinate CPU with the device. */
+struct rte_hcdev_comm_flag {
+	uint32_t *ptr;
+	enum rte_hcdev_comm_flag_type mtype;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -346,6 +361,98 @@ __rte_alloc_size(2);
 __rte_experimental
 int rte_hcdev_free(int16_t dev_id, void *ptr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication flag that can be shared
+ * between CPU threads and device workload to exchange some status info
+ * (e.g. work is done, processing can start, etc..).
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param hcflag
+ *   Pointer to the memory area of the hcflag structure.
+ * @param mtype
+ *   Type of memory to allocate the communication flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if invalid inputs
+ *   - ENOTSUP if operation not supported by the driver
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_hcdev_comm_create_flag(uint16_t dev_id,
+		struct rte_hcdev_comm_flag *hcflag,
+		enum rte_hcdev_comm_flag_type mtype);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a communication flag.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param hcflag
+ *   Pointer to the memory area of the hcflag structure.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL hcflag
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_hcdev_comm_destroy_flag(uint16_t dev_id,
+		struct rte_hcdev_comm_flag *hcflag);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set the value of a communication flag as the input value.
+ * Flag memory area is treated as volatile.
+ * The flag must have been allocated with RTE_HCDEV_COMM_FLAG_CPU.
+ *
+ * @param hcflag
+ *   Pointer to the memory area of the hcflag structure.
+ * @param val
+ *   Value to set in the flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_hcdev_comm_set_flag(struct rte_hcdev_comm_flag *hcflag,
+		uint32_t val);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the value of the communication flag.
+ * Flag memory area is treated as volatile.
+ * The flag must have been allocated with RTE_HCDEV_COMM_FLAG_CPU.
+ *
+ * @param hcflag
+ *   Pointer to the memory area of the hcflag structure.
+ * @param val
+ *   Flag output value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_hcdev_comm_get_flag_value(struct rte_hcdev_comm_flag *hcflag,
+		uint32_t *val);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/hcdev/version.map b/lib/hcdev/version.map
index 9195f4f747..da969c7f1f 100644
--- a/lib/hcdev/version.map
+++ b/lib/hcdev/version.map
@@ -6,6 +6,10 @@ EXPERIMENTAL {
 	rte_hcdev_callback_register;
 	rte_hcdev_callback_unregister;
 	rte_hcdev_close;
+	rte_hcdev_comm_create_flag;
+	rte_hcdev_comm_destroy_flag;
+	rte_hcdev_comm_get_flag_value;
+	rte_hcdev_comm_set_flag;
 	rte_hcdev_count_avail;
 	rte_hcdev_find_next;
 	rte_hcdev_free;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [RFC PATCH v2 7/7] hcdev: add communication list
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
                     ` (5 preceding siblings ...)
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 6/7] hcdev: add communication flag Thomas Monjalon
@ 2021-07-30 13:55   ` Thomas Monjalon
  2021-07-31  7:06   ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Jerin Jacob
  7 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-30 13:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, David Marchand, Andrew Rybchenko, Haiyue Wang,
	Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit, Elena Agostini,
	Ray Kinsella

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the device is responsible for processing the content of those packets.

The purpose of this list is to provide a buffer in CPU memory visible
from the device that can be treated as a circular buffer
to let the CPU provide fondamental info of received packets to the device.

A possible use-case is described below.

CPU:
- Trigger some task on the device
- in a loop:
    - receive a number of packets
    - provide packets info to the device

Device:
- Do some pre-processing
- Wait to receive a new set of packet to be processed

Layout of a communication list would be:

     -------
    |   0    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   1    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   2    | => pkt_list
    | status |
    | #pkts  |
     -------
    |  ....  | => pkt_list
     -------

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 lib/hcdev/hcdev.c     | 127 ++++++++++++++++++++++++++++++++++++++++
 lib/hcdev/meson.build |   2 +
 lib/hcdev/rte_hcdev.h | 132 ++++++++++++++++++++++++++++++++++++++++++
 lib/hcdev/version.map |   4 ++
 4 files changed, 265 insertions(+)

diff --git a/lib/hcdev/hcdev.c b/lib/hcdev/hcdev.c
index e391988e73..572f1713fc 100644
--- a/lib/hcdev/hcdev.c
+++ b/lib/hcdev/hcdev.c
@@ -660,3 +660,130 @@ rte_hcdev_comm_get_flag_value(struct rte_hcdev_comm_flag *hcflag, uint32_t *val)
 
 	return 0;
 }
+
+struct rte_hcdev_comm_list *
+rte_hcdev_comm_create_list(uint16_t dev_id,
+		uint32_t num_comm_items)
+{
+	struct rte_hcdev_comm_list *comm_list;
+	uint32_t idx_l;
+
+	if (num_comm_items == 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	comm_list = rte_hcdev_malloc(dev_id,
+			sizeof(struct rte_hcdev_comm_list) * num_comm_items,
+			RTE_HCDEV_MALLOC_REGISTER_FROM_CPU);
+	if (comm_list == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	for (idx_l = 0; idx_l < num_comm_items; idx_l++) {
+		comm_list[idx_l].pkt_list =
+			rte_hcdev_malloc(dev_id,
+					sizeof(struct rte_hcdev_comm_pkt) *
+					RTE_HCDEV_COMM_LIST_PKTS_MAX,
+					RTE_HCDEV_MALLOC_REGISTER_FROM_CPU);
+		if (comm_list[idx_l].pkt_list == NULL) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+
+		RTE_HCDEV_VOLATILE(comm_list[idx_l].status) =
+			RTE_HCDEV_COMM_LIST_FREE;
+		comm_list[idx_l].num_pkts = 0;
+	}
+
+	return comm_list;
+}
+
+int
+rte_hcdev_comm_destroy_list(uint16_t dev_id,
+		struct rte_hcdev_comm_list *comm_list,
+		uint32_t num_comm_items)
+{
+	uint32_t idx_l;
+
+	if (comm_list == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx_l = 0; idx_l < num_comm_items; idx_l++)
+		rte_hcdev_free(dev_id, comm_list[idx_l].pkt_list);
+	rte_hcdev_free(dev_id, comm_list);
+
+	return 0;
+}
+
+int
+rte_hcdev_comm_populate_list_pkts(struct rte_hcdev_comm_list *comm_list_item,
+		struct rte_mbuf **mbufs, uint32_t num_mbufs)
+{
+	uint32_t idx;
+
+	if (comm_list_item == NULL || comm_list_item->pkt_list == NULL ||
+			mbufs == NULL || num_mbufs > RTE_HCDEV_COMM_LIST_PKTS_MAX) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx = 0; idx < num_mbufs; idx++) {
+		/* support only unchained mbufs */
+		if (unlikely((mbufs[idx]->nb_segs > 1) ||
+				(mbufs[idx]->next != NULL) ||
+				(mbufs[idx]->data_len != mbufs[idx]->pkt_len))) {
+			rte_errno = ENOTSUP;
+			return -rte_errno;
+		}
+		comm_list_item->pkt_list[idx].addr =
+				rte_pktmbuf_mtod_offset(mbufs[idx], uintptr_t, 0);
+		comm_list_item->pkt_list[idx].size = mbufs[idx]->pkt_len;
+		comm_list_item->pkt_list[idx].opaque = mbufs[idx];
+	}
+
+	RTE_HCDEV_VOLATILE(comm_list_item->num_pkts) = num_mbufs;
+	rte_mb();
+	RTE_HCDEV_VOLATILE(comm_list_item->status) = RTE_HCDEV_COMM_LIST_READY;
+
+	return 0;
+}
+
+int
+rte_hcdev_comm_cleanup_list(struct rte_hcdev_comm_list *comm_list_item)
+{
+	struct rte_mbuf *mbufs[RTE_HCDEV_COMM_LIST_PKTS_MAX];
+	uint32_t idx = 0;
+
+	if (comm_list_item == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (RTE_HCDEV_VOLATILE(comm_list_item->status) ==
+			RTE_HCDEV_COMM_LIST_READY) {
+		HCDEV_LOG(ERR, "packet list is still in progress");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx = 0; idx < RTE_HCDEV_COMM_LIST_PKTS_MAX; idx++) {
+		if (comm_list_item->pkt_list[idx].addr == 0)
+			break;
+
+		comm_list_item->pkt_list[idx].addr = 0;
+		comm_list_item->pkt_list[idx].size = 0;
+		mbufs[idx] = (struct rte_mbuf *) comm_list_item->pkt_list[idx].opaque;
+	}
+
+	rte_pktmbuf_free_bulk(mbufs, idx);
+
+	RTE_HCDEV_VOLATILE(comm_list_item->status) = RTE_HCDEV_COMM_LIST_FREE;
+	RTE_HCDEV_VOLATILE(comm_list_item->num_pkts) = 0;
+	rte_mb();
+
+	return 0;
+}
diff --git a/lib/hcdev/meson.build b/lib/hcdev/meson.build
index 565c3cb623..249849e0cb 100644
--- a/lib/hcdev/meson.build
+++ b/lib/hcdev/meson.build
@@ -8,3 +8,5 @@ headers = files(
 sources = files(
         'hcdev.c',
 )
+
+deps += ['mbuf']
diff --git a/lib/hcdev/rte_hcdev.h b/lib/hcdev/rte_hcdev.h
index 7b58041b3c..78f0e92957 100644
--- a/lib/hcdev/rte_hcdev.h
+++ b/lib/hcdev/rte_hcdev.h
@@ -9,6 +9,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <rte_mbuf.h>
 #include <rte_bitops.h>
 #include <rte_compat.h>
 
@@ -41,6 +42,9 @@ extern "C" {
 /** Access variable as volatile. */
 #define RTE_HCDEV_VOLATILE(x) (*(volatile typeof(x)*)&(x))
 
+/** Max number of packets per communication list. */
+#define RTE_HCDEV_COMM_LIST_PKTS_MAX 1024
+
 /** Store device info. */
 struct rte_hcdev_info {
 	/** Unique identifier name. */
@@ -79,10 +83,47 @@ enum rte_hcdev_comm_flag_type {
 
 /** Communication flag to coordinate CPU with the device. */
 struct rte_hcdev_comm_flag {
+	/** Pointer to flag memory area. */
 	uint32_t *ptr;
+	/** Type of memory used to allocate the flag. */
 	enum rte_hcdev_comm_flag_type mtype;
 };
 
+/** List of packets shared among CPU and device. */
+struct rte_hcdev_comm_pkt {
+	/** Address of the packet in memory (e.g. mbuf->buf_addr). */
+	uintptr_t addr;
+	/** Size in byte of the packet. */
+	size_t size;
+	/** Mbuf reference to release it in the rte_hcdev_comm_cleanup_list(). */
+	void *opaque;
+};
+
+/** Possible status for the list of packets shared among CPU and device. */
+enum rte_hcdev_comm_list_status {
+	/** Packet list can be filled with new mbufs, no one is using it. */
+	RTE_HCDEV_COMM_LIST_FREE = 0,
+	/** Packet list has been filled with new mbufs and it's ready to be used .*/
+	RTE_HCDEV_COMM_LIST_READY,
+	/** Packet list has been processed, it's ready to be freed. */
+	RTE_HCDEV_COMM_LIST_DONE,
+	/** Some error occurred during packet list processing. */
+	RTE_HCDEV_COMM_LIST_ERROR,
+};
+
+/**
+ * Communication list holding a number of lists of packets
+ * each having a status flag.
+ */
+struct rte_hcdev_comm_list {
+	/** List of packets populated by the CPU with a set of mbufs info. */
+	struct rte_hcdev_comm_pkt *pkt_list;
+	/** Number of packets in the list. */
+	uint32_t num_pkts;
+	/** Status of the list. */
+	enum rte_hcdev_comm_list_status status;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -453,6 +494,97 @@ __rte_experimental
 int rte_hcdev_comm_get_flag_value(struct rte_hcdev_comm_flag *hcflag,
 		uint32_t *val);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication list that can be used to share packets
+ * between CPU and device.
+ * Each element of the list contains:
+ *  - a packet list of RTE_HCDEV_COMM_LIST_PKTS_MAX elements
+ *  - number of packets in the list
+ *  - a status flag to communicate if the packet list is FREE,
+ *    READY to be processed, DONE with processing.
+ *
+ * The list is allocated in CPU-visible memory.
+ * At creation time, every list is in FREE state.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param num_comm_items
+ *   Number of items in the communication list.
+ *
+ * @return
+ *   A pointer to the allocated list, otherwise NULL and rte_errno is set:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+struct rte_hcdev_comm_list *rte_hcdev_comm_create_list(uint16_t dev_id,
+		uint32_t num_comm_items);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Destroy a communication list.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param comm_list
+ *   Communication list to be destroyed.
+ * @param num_comm_items
+ *   Number of items in the communication list.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_hcdev_comm_destroy_list(uint16_t dev_id,
+		struct rte_hcdev_comm_list *comm_list,
+		uint32_t num_comm_items);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Populate the packets list of the communication item
+ * with info from a list of mbufs.
+ * Status flag of that packet list is set to READY.
+ *
+ * @param comm_list_item
+ *   Communication list item to fill.
+ * @param mbufs
+ *   List of mbufs.
+ * @param num_mbufs
+ *   Number of mbufs.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ *   - ENOTSUP if mbufs are chained (multiple segments)
+ */
+__rte_experimental
+int rte_hcdev_comm_populate_list_pkts(struct rte_hcdev_comm_list *comm_list_item,
+		struct rte_mbuf **mbufs, uint32_t num_mbufs);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Reset a communication list item to the original state.
+ * The status flag set to FREE and mbufs are returned to the pool.
+ *
+ * @param comm_list_item
+ *   Communication list item to reset.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_hcdev_comm_cleanup_list(struct rte_hcdev_comm_list *comm_list_item);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/hcdev/version.map b/lib/hcdev/version.map
index da969c7f1f..caa00af647 100644
--- a/lib/hcdev/version.map
+++ b/lib/hcdev/version.map
@@ -6,9 +6,13 @@ EXPERIMENTAL {
 	rte_hcdev_callback_register;
 	rte_hcdev_callback_unregister;
 	rte_hcdev_close;
+	rte_hcdev_comm_cleanup_list;
 	rte_hcdev_comm_create_flag;
+	rte_hcdev_comm_create_list;
 	rte_hcdev_comm_destroy_flag;
+	rte_hcdev_comm_destroy_list;
 	rte_hcdev_comm_get_flag_value;
+	rte_hcdev_comm_populate_list_pkts;
 	rte_hcdev_comm_set_flag;
 	rte_hcdev_count_avail;
 	rte_hcdev_find_next;
-- 
2.31.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
                     ` (6 preceding siblings ...)
  2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 7/7] hcdev: add communication list Thomas Monjalon
@ 2021-07-31  7:06   ` Jerin Jacob
  2021-07-31  8:21     ` Thomas Monjalon
  7 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-07-31  7:06 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dpdk-dev, Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Haiyue Wang, Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit,
	techboard

On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> From: Elena Agostini <eagostini@nvidia.com>
>
> In heterogeneous computing system, processing is not only in the CPU.
> Some tasks can be delegated to devices working in parallel.
>
> The goal of this new library is to enhance the collaboration between
> DPDK, that's primarily a CPU framework, and other type of devices like GPUs.
>
> When mixing network activity with task processing on a non-CPU device,
> there may be the need to put in communication the CPU with the device
> in order to manage the memory, synchronize operations, exchange info, etc..
>
> This library provides a number of new features:
> - Interoperability with device specific library with generic handlers
> - Possibility to allocate and free memory on the device
> - Possibility to allocate and free memory on the CPU but visible from the device
> - Communication functions to enhance the dialog between the CPU and the device
>
> The infrastructure is prepared to welcome drivers in drivers/hc/
> as the upcoming NVIDIA one, implementing the hcdev API.
>
> Some parts are not complete:
>   - locks
>   - memory allocation table
>   - memory freeing
>   - guide documentation
>   - integration in devtools/check-doc-vs-code.sh
>   - unit tests
>   - integration in testpmd to enable Rx/Tx to/from GPU memory.

Since the above line is the crux of the following text, I will start
from this point.

+ Techboard

I  can give my honest feedback on this.

I can map similar  stuff  in Marvell HW, where we do machine learning
as compute offload
on a different class of CPU.

In terms of RFC patch features

1) memory API - Use cases are aligned
2) communication flag and communication list
Our structure is completely different and we are using HW ring kind of
interface to post the job to compute interface and
the job completion result happens through the event device.
Kind of similar to the DMA API that has been discussed on the mailing list.


Now the bigger question is why need to Tx and then Rx something to
compute the device
Isn't  ot offload something? If so, why not add the those offload in
respective subsystem
to improve the subsystem(ethdev, cryptiodev etc) features set to adapt
new features or
introduce new subsystem (like ML, Inline Baseband processing) so that
it will be an opportunity to
implement the same in  HW or compute device. For example, if we take
this path, ML offloading will
be application code like testpmd, which deals with "specific" device
commands(aka glorified rawdev)
to deal with specific computing device offload "COMMANDS"
(The commands will be specific to  offload device, the same code wont
run on  other compute device)

Just my _personal_ preference is to have specific subsystems to
improve the DPDK instead of raw device kind of
path. If we decide another path as a community it is _fine_ too(as a
_project manager_ point of view it will be an easy path to dump SDK
stuff to DPDK without introducing the pain of the subsystem nor
improving the DPDK).

>
> Below is a pseudo-code to give an example about how to use functions
> in this library in case of a CUDA application.
>
>
> Elena Agostini (4):
>   hcdev: introduce heterogeneous computing device library
>   hcdev: add memory API
>   hcdev: add communication flag
>   hcdev: add communication list
>
> Thomas Monjalon (3):
>   hcdev: add event notification
>   hcdev: add child device representing a device context
>   hcdev: support multi-process
>
>  .gitignore                             |   1 +
>  MAINTAINERS                            |   6 +
>  doc/api/doxy-api-index.md              |   1 +
>  doc/api/doxy-api.conf.in               |   1 +
>  doc/guides/conf.py                     |   8 +
>  doc/guides/hcdevs/features/default.ini |  13 +
>  doc/guides/hcdevs/index.rst            |  11 +
>  doc/guides/hcdevs/overview.rst         |  11 +
>  doc/guides/index.rst                   |   1 +
>  doc/guides/prog_guide/hcdev.rst        |   5 +
>  doc/guides/prog_guide/index.rst        |   1 +
>  doc/guides/rel_notes/release_21_08.rst |   5 +
>  drivers/hc/meson.build                 |   4 +
>  drivers/meson.build                    |   1 +
>  lib/hcdev/hcdev.c                      | 789 +++++++++++++++++++++++++
>  lib/hcdev/hcdev_driver.h               |  96 +++
>  lib/hcdev/meson.build                  |  12 +
>  lib/hcdev/rte_hcdev.h                  | 592 +++++++++++++++++++
>  lib/hcdev/version.map                  |  35 ++
>  lib/meson.build                        |   1 +
>  20 files changed, 1594 insertions(+)
>  create mode 100644 doc/guides/hcdevs/features/default.ini
>  create mode 100644 doc/guides/hcdevs/index.rst
>  create mode 100644 doc/guides/hcdevs/overview.rst
>  create mode 100644 doc/guides/prog_guide/hcdev.rst
>  create mode 100644 drivers/hc/meson.build
>  create mode 100644 lib/hcdev/hcdev.c
>  create mode 100644 lib/hcdev/hcdev_driver.h
>  create mode 100644 lib/hcdev/meson.build
>  create mode 100644 lib/hcdev/rte_hcdev.h
>  create mode 100644 lib/hcdev/version.map
>
>
>
> ////////////////////////////////////////////////////////////////////////
> ///// HCDEV library + CUDA functions
> ////////////////////////////////////////////////////////////////////////
> #define GPU_PAGE_SHIFT 16
> #define GPU_PAGE_SIZE (1UL << GPU_PAGE_SHIFT)
>
> int main() {
>     struct rte_hcdev_flag quit_flag;
>     struct rte_hcdev_comm_list *comm_list;
>     int nb_rx = 0;
>     int comm_list_entry = 0;
>     struct rte_mbuf * rx_mbufs[max_rx_mbufs];
>     cudaStream_t cstream;
>     struct rte_mempool *mpool_payload, *mpool_header;
>     struct rte_pktmbuf_extmem ext_mem;
>     int16_t dev_id;
>
>     /* Initialize CUDA objects (cstream, context, etc..). */
>     /* Use hcdev library to register a new CUDA context if any */
>     /* Let's assume the application wants to use the default context of the GPU device 0 */
>     dev_id = 0;
>
>     /* Create an external memory mempool using memory allocated on the GPU. */
>     ext_mem.elt_size = mbufs_headroom_size;
>                 ext_mem.buf_len = RTE_ALIGN_CEIL(mbufs_num * ext_mem.elt_size, GPU_PAGE_SIZE);
>     ext_mem.buf_iova = RTE_BAD_IOVA;
>     ext_mem.buf_ptr = rte_hcdev_malloc(dev_id, ext_mem.buf_len, 0);
>     rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
>     rte_dev_dma_map(rte_eth_devices[l2fwd_port_id].device, ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
>     mpool_payload = rte_pktmbuf_pool_create_extbuf("gpu_mempool", mbufs_num,
>                                                     0, 0, ext_mem.elt_size,
>                                                     rte_socket_id(), &ext_mem, 1);
>
>     /*
>      * Create CPU - device communication flag. With this flag, the CPU can tell to the CUDA kernel
>      * to exit from the main loop.
>      */
>     rte_hcdev_comm_create_flag(dev_id, &quit_flag, RTE_HCDEV_COMM_FLAG_CPU);
>     rte_hcdev_comm_set_flag(&quit_flag, 0);
>
>     /*
>      * Create CPU - device communication list. Each entry of this list will be populated by the CPU
>      * with a new set of received mbufs that the CUDA kernel has to process.
>      */
>     comm_list = rte_hcdev_comm_create_list(dev_id, num_entries);
>
>     /* A very simple CUDA kernel with just 1 CUDA block and RTE_HCDEV_COMM_LIST_PKTS_MAX CUDA threads. */
>     cuda_kernel_packet_processing<<<1, RTE_HCDEV_COMM_LIST_PKTS_MAX, 0, cstream>>>(quit_flag->ptr, comm_list, num_entries, ...);
>
>     /*
>      * For simplicity, the CPU here receives only 2 bursts of mbufs.
>      * In a real application, network activity and device processing should overlap.
>      */
>     nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
>     rte_hcdev_comm_populate_list_pkts(comm_list[0], rx_mbufs, nb_rx);
>     nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
>     rte_hcdev_comm_populate_list_pkts(comm_list[1], rx_mbufs, nb_rx);
>
>     /*
>      * CPU waits for the completion of the packets' processing on the CUDA kernel
>      * and then it does a cleanup of the received mbufs.
>      */
>     while (rte_hcdev_comm_cleanup_list(comm_list[0]));
>     while (rte_hcdev_comm_cleanup_list(comm_list[1]));
>
>     /* CPU notifies the CUDA kernel that it has to terminate */
>     rte_hcdev_comm_set_flag(&quit_flag, 1);
>
>     /* hcdev objects cleanup/destruction */
>     /* CUDA cleanup */
>     /* DPDK cleanup */
>
>     return 0;
> }
>
> ////////////////////////////////////////////////////////////////////////
> ///// CUDA kernel
> ////////////////////////////////////////////////////////////////////////
>
> void cuda_kernel(uint32_t * quit_flag_ptr, struct rte_hcdev_comm_list *comm_list, int comm_list_entries) {
>     int comm_list_index = 0;
>     struct rte_hcdev_comm_pkt *pkt_list = NULL;
>
>     /* Do some pre-processing operations. */
>
>     /* GPU kernel keeps checking this flag to know if it has to quit or wait for more packets. */
>     while (*quit_flag_ptr == 0)
>     {
>         if (comm_list[comm_list_index]->status != RTE_HCDEV_COMM_LIST_READY)
>             continue;
>
>         if (threadIdx.x < comm_list[comm_list_index]->num_pkts)
>         {
>             /* Each CUDA thread processes a different packet. */
>             packet_processing(comm_list[comm_list_index]->addr, comm_list[comm_list_index]->size, ..);
>         }
>         __threadfence();
>         __syncthreads();
>
>         /* Wait for new packets on the next communication list entry. */
>         comm_list_index = (comm_list_index+1) % comm_list_entries;
>     }
>
>     /* Do some post-processing operations. */
> }
>
>
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-07-31  7:06   ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Jerin Jacob
@ 2021-07-31  8:21     ` Thomas Monjalon
  2021-07-31 13:42       ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-07-31  8:21 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dpdk-dev, Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Haiyue Wang, Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit,
	techboard

31/07/2021 09:06, Jerin Jacob:
> On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > From: Elena Agostini <eagostini@nvidia.com>
> >
> > In heterogeneous computing system, processing is not only in the CPU.
> > Some tasks can be delegated to devices working in parallel.
> >
> > The goal of this new library is to enhance the collaboration between
> > DPDK, that's primarily a CPU framework, and other type of devices like GPUs.
> >
> > When mixing network activity with task processing on a non-CPU device,
> > there may be the need to put in communication the CPU with the device
> > in order to manage the memory, synchronize operations, exchange info, etc..
> >
> > This library provides a number of new features:
> > - Interoperability with device specific library with generic handlers
> > - Possibility to allocate and free memory on the device
> > - Possibility to allocate and free memory on the CPU but visible from the device
> > - Communication functions to enhance the dialog between the CPU and the device
> >
> > The infrastructure is prepared to welcome drivers in drivers/hc/
> > as the upcoming NVIDIA one, implementing the hcdev API.
> >
> > Some parts are not complete:
> >   - locks
> >   - memory allocation table
> >   - memory freeing
> >   - guide documentation
> >   - integration in devtools/check-doc-vs-code.sh
> >   - unit tests
> >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> 
> Since the above line is the crux of the following text, I will start
> from this point.
> 
> + Techboard
> 
> I  can give my honest feedback on this.
> 
> I can map similar  stuff  in Marvell HW, where we do machine learning
> as compute offload
> on a different class of CPU.
> 
> In terms of RFC patch features
> 
> 1) memory API - Use cases are aligned
> 2) communication flag and communication list
> Our structure is completely different and we are using HW ring kind of
> interface to post the job to compute interface and
> the job completion result happens through the event device.
> Kind of similar to the DMA API that has been discussed on the mailing list.

Interesting.

> Now the bigger question is why need to Tx and then Rx something to
> compute the device
> Isn't  ot offload something? If so, why not add the those offload in
> respective subsystem
> to improve the subsystem(ethdev, cryptiodev etc) features set to adapt
> new features or
> introduce new subsystem (like ML, Inline Baseband processing) so that
> it will be an opportunity to
> implement the same in  HW or compute device. For example, if we take
> this path, ML offloading will
> be application code like testpmd, which deals with "specific" device
> commands(aka glorified rawdev)
> to deal with specific computing device offload "COMMANDS"
> (The commands will be specific to  offload device, the same code wont
> run on  other compute device)

Having specific features API is convenient for compatibility
between devices, yes, for the set of defined features.
Our approach is to start with a flexible API that the application
can use to implement any processing because with GPU programming,
there is no restriction on what can be achieved.
This approach does not contradict what you propose,
it does not prevent extending existing classes.

> Just my _personal_ preference is to have specific subsystems to
> improve the DPDK instead of raw device kind of
> path. If we decide another path as a community it is _fine_ too(as a
> _project manager_ point of view it will be an easy path to dump SDK
> stuff to DPDK without introducing the pain of the subsystem nor
> improving the DPDK).

Adding a new class API is also improving DPDK.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-07-31  8:21     ` Thomas Monjalon
@ 2021-07-31 13:42       ` Jerin Jacob
  2021-08-27  9:44         ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-07-31 13:42 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dpdk-dev, Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Haiyue Wang, Honnappa Nagarahalli, Jerin Jacob, Ferruh Yigit,
	techboard

On Sat, Jul 31, 2021 at 1:51 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 31/07/2021 09:06, Jerin Jacob:
> > On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > From: Elena Agostini <eagostini@nvidia.com>
> > >
> > > In heterogeneous computing system, processing is not only in the CPU.
> > > Some tasks can be delegated to devices working in parallel.
> > >
> > > The goal of this new library is to enhance the collaboration between
> > > DPDK, that's primarily a CPU framework, and other type of devices like GPUs.
> > >
> > > When mixing network activity with task processing on a non-CPU device,
> > > there may be the need to put in communication the CPU with the device
> > > in order to manage the memory, synchronize operations, exchange info, etc..
> > >
> > > This library provides a number of new features:
> > > - Interoperability with device specific library with generic handlers
> > > - Possibility to allocate and free memory on the device
> > > - Possibility to allocate and free memory on the CPU but visible from the device
> > > - Communication functions to enhance the dialog between the CPU and the device
> > >
> > > The infrastructure is prepared to welcome drivers in drivers/hc/
> > > as the upcoming NVIDIA one, implementing the hcdev API.
> > >
> > > Some parts are not complete:
> > >   - locks
> > >   - memory allocation table
> > >   - memory freeing
> > >   - guide documentation
> > >   - integration in devtools/check-doc-vs-code.sh
> > >   - unit tests
> > >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> >
> > Since the above line is the crux of the following text, I will start
> > from this point.
> >
> > + Techboard
> >
> > I  can give my honest feedback on this.
> >
> > I can map similar  stuff  in Marvell HW, where we do machine learning
> > as compute offload
> > on a different class of CPU.
> >
> > In terms of RFC patch features
> >
> > 1) memory API - Use cases are aligned
> > 2) communication flag and communication list
> > Our structure is completely different and we are using HW ring kind of
> > interface to post the job to compute interface and
> > the job completion result happens through the event device.
> > Kind of similar to the DMA API that has been discussed on the mailing list.
>
> Interesting.

It is hard to generalize the communication mechanism.
Is other GPU vendors have a similar communication mechanism? AMD, Intel ??

>
> > Now the bigger question is why need to Tx and then Rx something to
> > compute the device
> > Isn't  ot offload something? If so, why not add the those offload in
> > respective subsystem
> > to improve the subsystem(ethdev, cryptiodev etc) features set to adapt
> > new features or
> > introduce new subsystem (like ML, Inline Baseband processing) so that
> > it will be an opportunity to
> > implement the same in  HW or compute device. For example, if we take
> > this path, ML offloading will
> > be application code like testpmd, which deals with "specific" device
> > commands(aka glorified rawdev)
> > to deal with specific computing device offload "COMMANDS"
> > (The commands will be specific to  offload device, the same code wont
> > run on  other compute device)
>
> Having specific features API is convenient for compatibility
> between devices, yes, for the set of defined features.
> Our approach is to start with a flexible API that the application
> can use to implement any processing because with GPU programming,
> there is no restriction on what can be achieved.
> This approach does not contradict what you propose,
> it does not prevent extending existing classes.

It does prevent extending the existing classes as no one is going to
extent it there is the path of not doing do.

If an application can run only on a specific device, it is similar to
a raw device,
where the device definition is not defined. (i.e JOB metadata is not defined and
it is specific to the device).

>
> > Just my _personal_ preference is to have specific subsystems to
> > improve the DPDK instead of raw device kind of
> > path. If we decide another path as a community it is _fine_ too(as a
> > _project manager_ point of view it will be an easy path to dump SDK
> > stuff to DPDK without introducing the pain of the subsystem nor
> > improving the DPDK).
>
> Adding a new class API is also improving DPDK.

But the class is similar as raw dev class. The reason I say,
Job submission and response is can be abstracted as queue/dequeue APIs.
Taks/Job metadata is specific to compute devices (and it can not be
generalized).
If we generalize it makes sense to have a new class that does
"specific function".


>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-07-31 13:42       ` Jerin Jacob
@ 2021-08-27  9:44         ` Thomas Monjalon
  2021-08-27 12:19           ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-08-27  9:44 UTC (permalink / raw)
  To: Jerin Jacob, Jerin Jacob
  Cc: dev, Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Haiyue Wang, Honnappa Nagarahalli, Ferruh Yigit, techboard,
	Elena Agostini

31/07/2021 15:42, Jerin Jacob:
> On Sat, Jul 31, 2021 at 1:51 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 31/07/2021 09:06, Jerin Jacob:
> > > On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > From: Elena Agostini <eagostini@nvidia.com>
> > > >
> > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > Some tasks can be delegated to devices working in parallel.
> > > >
> > > > The goal of this new library is to enhance the collaboration between
> > > > DPDK, that's primarily a CPU framework, and other type of devices like GPUs.
> > > >
> > > > When mixing network activity with task processing on a non-CPU device,
> > > > there may be the need to put in communication the CPU with the device
> > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > >
> > > > This library provides a number of new features:
> > > > - Interoperability with device specific library with generic handlers
> > > > - Possibility to allocate and free memory on the device
> > > > - Possibility to allocate and free memory on the CPU but visible from the device
> > > > - Communication functions to enhance the dialog between the CPU and the device
> > > >
> > > > The infrastructure is prepared to welcome drivers in drivers/hc/
> > > > as the upcoming NVIDIA one, implementing the hcdev API.
> > > >
> > > > Some parts are not complete:
> > > >   - locks
> > > >   - memory allocation table
> > > >   - memory freeing
> > > >   - guide documentation
> > > >   - integration in devtools/check-doc-vs-code.sh
> > > >   - unit tests
> > > >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > >
> > > Since the above line is the crux of the following text, I will start
> > > from this point.
> > >
> > > + Techboard
> > >
> > > I  can give my honest feedback on this.
> > >
> > > I can map similar  stuff  in Marvell HW, where we do machine learning
> > > as compute offload
> > > on a different class of CPU.
> > >
> > > In terms of RFC patch features
> > >
> > > 1) memory API - Use cases are aligned
> > > 2) communication flag and communication list
> > > Our structure is completely different and we are using HW ring kind of
> > > interface to post the job to compute interface and
> > > the job completion result happens through the event device.
> > > Kind of similar to the DMA API that has been discussed on the mailing list.
> >
> > Interesting.
> 
> It is hard to generalize the communication mechanism.
> Is other GPU vendors have a similar communication mechanism? AMD, Intel ??

I don't know who to ask in AMD & Intel. Any ideas?

> > > Now the bigger question is why need to Tx and then Rx something to
> > > compute the device
> > > Isn't  ot offload something? If so, why not add the those offload in
> > > respective subsystem
> > > to improve the subsystem(ethdev, cryptiodev etc) features set to adapt
> > > new features or
> > > introduce new subsystem (like ML, Inline Baseband processing) so that
> > > it will be an opportunity to
> > > implement the same in  HW or compute device. For example, if we take
> > > this path, ML offloading will
> > > be application code like testpmd, which deals with "specific" device
> > > commands(aka glorified rawdev)
> > > to deal with specific computing device offload "COMMANDS"
> > > (The commands will be specific to  offload device, the same code wont
> > > run on  other compute device)
> >
> > Having specific features API is convenient for compatibility
> > between devices, yes, for the set of defined features.
> > Our approach is to start with a flexible API that the application
> > can use to implement any processing because with GPU programming,
> > there is no restriction on what can be achieved.
> > This approach does not contradict what you propose,
> > it does not prevent extending existing classes.
> 
> It does prevent extending the existing classes as no one is going to
> extent it there is the path of not doing do.

I disagree. Specific API is more convenient for some tasks,
so there is an incentive to define or extend specific device class APIs.
But it should not forbid doing custom processing.

> If an application can run only on a specific device, it is similar to
> a raw device,
> where the device definition is not defined. (i.e JOB metadata is not defined and
> it is specific to the device).
> 
> > > Just my _personal_ preference is to have specific subsystems to
> > > improve the DPDK instead of raw device kind of
> > > path. If we decide another path as a community it is _fine_ too(as a
> > > _project manager_ point of view it will be an easy path to dump SDK
> > > stuff to DPDK without introducing the pain of the subsystem nor
> > > improving the DPDK).
> >
> > Adding a new class API is also improving DPDK.
> 
> But the class is similar as raw dev class. The reason I say,
> Job submission and response is can be abstracted as queue/dequeue APIs.
> Taks/Job metadata is specific to compute devices (and it can not be
> generalized).
> If we generalize it makes sense to have a new class that does
> "specific function".

Computing device programming is already generalized with languages like OpenCL.
We should not try to reinvent the same.
We are just trying to properly integrate the concept in DPDK
and allow building on top of it.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-08-27  9:44         ` Thomas Monjalon
@ 2021-08-27 12:19           ` Jerin Jacob
  2021-08-29  5:32             ` Wang, Haiyue
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-08-27 12:19 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Jerin Jacob, dpdk-dev, Stephen Hemminger, David Marchand,
	Andrew Rybchenko, Haiyue Wang, Honnappa Nagarahalli,
	Ferruh Yigit, techboard, Elena Agostini

On Fri, Aug 27, 2021 at 3:14 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 31/07/2021 15:42, Jerin Jacob:
> > On Sat, Jul 31, 2021 at 1:51 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 31/07/2021 09:06, Jerin Jacob:
> > > > On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > From: Elena Agostini <eagostini@nvidia.com>
> > > > >
> > > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > > Some tasks can be delegated to devices working in parallel.
> > > > >
> > > > > The goal of this new library is to enhance the collaboration between
> > > > > DPDK, that's primarily a CPU framework, and other type of devices like GPUs.
> > > > >
> > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > there may be the need to put in communication the CPU with the device
> > > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > > >
> > > > > This library provides a number of new features:
> > > > > - Interoperability with device specific library with generic handlers
> > > > > - Possibility to allocate and free memory on the device
> > > > > - Possibility to allocate and free memory on the CPU but visible from the device
> > > > > - Communication functions to enhance the dialog between the CPU and the device
> > > > >
> > > > > The infrastructure is prepared to welcome drivers in drivers/hc/
> > > > > as the upcoming NVIDIA one, implementing the hcdev API.
> > > > >
> > > > > Some parts are not complete:
> > > > >   - locks
> > > > >   - memory allocation table
> > > > >   - memory freeing
> > > > >   - guide documentation
> > > > >   - integration in devtools/check-doc-vs-code.sh
> > > > >   - unit tests
> > > > >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > > >
> > > > Since the above line is the crux of the following text, I will start
> > > > from this point.
> > > >
> > > > + Techboard
> > > >
> > > > I  can give my honest feedback on this.
> > > >
> > > > I can map similar  stuff  in Marvell HW, where we do machine learning
> > > > as compute offload
> > > > on a different class of CPU.
> > > >
> > > > In terms of RFC patch features
> > > >
> > > > 1) memory API - Use cases are aligned
> > > > 2) communication flag and communication list
> > > > Our structure is completely different and we are using HW ring kind of
> > > > interface to post the job to compute interface and
> > > > the job completion result happens through the event device.
> > > > Kind of similar to the DMA API that has been discussed on the mailing list.
> > >
> > > Interesting.
> >
> > It is hard to generalize the communication mechanism.
> > Is other GPU vendors have a similar communication mechanism? AMD, Intel ??
>
> I don't know who to ask in AMD & Intel. Any ideas?

Good question.

At least in Marvell HW, the communication flag and communication list is
our structure is completely different and we are using HW ring kind of
interface to post the job to compute interface and
the job completion result happens through the event device.
kind of similar to the DMA API that has been discussed on the mailing list.

>
> > > > Now the bigger question is why need to Tx and then Rx something to
> > > > compute the device
> > > > Isn't  ot offload something? If so, why not add the those offload in
> > > > respective subsystem
> > > > to improve the subsystem(ethdev, cryptiodev etc) features set to adapt
> > > > new features or
> > > > introduce new subsystem (like ML, Inline Baseband processing) so that
> > > > it will be an opportunity to
> > > > implement the same in  HW or compute device. For example, if we take
> > > > this path, ML offloading will
> > > > be application code like testpmd, which deals with "specific" device
> > > > commands(aka glorified rawdev)
> > > > to deal with specific computing device offload "COMMANDS"
> > > > (The commands will be specific to  offload device, the same code wont
> > > > run on  other compute device)
> > >
> > > Having specific features API is convenient for compatibility
> > > between devices, yes, for the set of defined features.
> > > Our approach is to start with a flexible API that the application
> > > can use to implement any processing because with GPU programming,
> > > there is no restriction on what can be achieved.
> > > This approach does not contradict what you propose,
> > > it does not prevent extending existing classes.
> >
> > It does prevent extending the existing classes as no one is going to
> > extent it there is the path of not doing do.
>
> I disagree. Specific API is more convenient for some tasks,
> so there is an incentive to define or extend specific device class APIs.
> But it should not forbid doing custom processing.

This is the same as the raw device is in DPDK where the device
personality is not defined.

Even if define another API and if the personality is not defined,
it comes similar to the raw device as similar
to rawdev enqueue and dequeue.

To summarize,

1)  My _personal_ preference is to have specific subsystems
to improve the DPDK instead of the raw device kind of path.
2) If the device personality is not defined, use rawdev
3) All computing devices do not use  "communication flag" and
"communication list"
kind of structure. If are targeting a generic computing device then
that is not a portable scheme.
For GPU abstraction if "communication flag" and "communication list"
is the right kind of mechanism
then we can have a separate library for GPU communication specific to GPU <->
DPDK communication needs and explicit for GPU.

I think generic DPDK applications like testpmd should not
pollute with device-specific functions. Like, call device-specific
messages from the application
which makes the application runs only one device. I don't have a
strong opinion(expect
standardizing  "communication flag" and "communication list" as
generic computing device
communication mechanism) of others think it is OK to do that way in DPDK.

>
> > If an application can run only on a specific device, it is similar to
> > a raw device,
> > where the device definition is not defined. (i.e JOB metadata is not defined and
> > it is specific to the device).
> >
> > > > Just my _personal_ preference is to have specific subsystems to
> > > > improve the DPDK instead of raw device kind of
> > > > path. If we decide another path as a community it is _fine_ too(as a
> > > > _project manager_ point of view it will be an easy path to dump SDK
> > > > stuff to DPDK without introducing the pain of the subsystem nor
> > > > improving the DPDK).
> > >
> > > Adding a new class API is also improving DPDK.
> >
> > But the class is similar as raw dev class. The reason I say,
> > Job submission and response is can be abstracted as queue/dequeue APIs.
> > Taks/Job metadata is specific to compute devices (and it can not be
> > generalized).
> > If we generalize it makes sense to have a new class that does
> > "specific function".
>
> Computing device programming is already generalized with languages like OpenCL.
> We should not try to reinvent the same.
> We are just trying to properly integrate the concept in DPDK
> and allow building on top of it.

See above.

>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-08-27 12:19           ` Jerin Jacob
@ 2021-08-29  5:32             ` Wang, Haiyue
  2021-09-01 15:35               ` Elena Agostini
  0 siblings, 1 reply; 128+ messages in thread
From: Wang, Haiyue @ 2021-08-29  5:32 UTC (permalink / raw)
  To: Jerin Jacob, Thomas Monjalon
  Cc: Jerin Jacob, dpdk-dev, Stephen Hemminger, David Marchand,
	Andrew Rybchenko, Honnappa Nagarahalli, Yigit, Ferruh, techboard,
	Elena Agostini

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, August 27, 2021 20:19
> To: Thomas Monjalon <thomas@monjalon.net>
> Cc: Jerin Jacob <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger
> <stephen@networkplumber.org>; David Marchand <david.marchand@redhat.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Wang, Haiyue <haiyue.wang@intel.com>; Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; techboard@dpdk.org; Elena
> Agostini <eagostini@nvidia.com>
> Subject: Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> 
> On Fri, Aug 27, 2021 at 3:14 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 31/07/2021 15:42, Jerin Jacob:
> > > On Sat, Jul 31, 2021 at 1:51 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 31/07/2021 09:06, Jerin Jacob:
> > > > > On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > From: Elena Agostini <eagostini@nvidia.com>
> > > > > >
> > > > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > >
> > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > DPDK, that's primarily a CPU framework, and other type of devices like GPUs.
> > > > > >
> > > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > > there may be the need to put in communication the CPU with the device
> > > > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > > > >
> > > > > > This library provides a number of new features:
> > > > > > - Interoperability with device specific library with generic handlers
> > > > > > - Possibility to allocate and free memory on the device
> > > > > > - Possibility to allocate and free memory on the CPU but visible from the device
> > > > > > - Communication functions to enhance the dialog between the CPU and the device
> > > > > >
> > > > > > The infrastructure is prepared to welcome drivers in drivers/hc/
> > > > > > as the upcoming NVIDIA one, implementing the hcdev API.
> > > > > >
> > > > > > Some parts are not complete:
> > > > > >   - locks
> > > > > >   - memory allocation table
> > > > > >   - memory freeing
> > > > > >   - guide documentation
> > > > > >   - integration in devtools/check-doc-vs-code.sh
> > > > > >   - unit tests
> > > > > >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > > > >
> > > > > Since the above line is the crux of the following text, I will start
> > > > > from this point.
> > > > >
> > > > > + Techboard
> > > > >
> > > > > I  can give my honest feedback on this.
> > > > >
> > > > > I can map similar  stuff  in Marvell HW, where we do machine learning
> > > > > as compute offload
> > > > > on a different class of CPU.
> > > > >
> > > > > In terms of RFC patch features
> > > > >
> > > > > 1) memory API - Use cases are aligned
> > > > > 2) communication flag and communication list
> > > > > Our structure is completely different and we are using HW ring kind of
> > > > > interface to post the job to compute interface and
> > > > > the job completion result happens through the event device.
> > > > > Kind of similar to the DMA API that has been discussed on the mailing list.
> > > >
> > > > Interesting.
> > >
> > > It is hard to generalize the communication mechanism.
> > > Is other GPU vendors have a similar communication mechanism? AMD, Intel ??
> >
> > I don't know who to ask in AMD & Intel. Any ideas?
> 
> Good question.
> 
> At least in Marvell HW, the communication flag and communication list is
> our structure is completely different and we are using HW ring kind of
> interface to post the job to compute interface and
> the job completion result happens through the event device.
> kind of similar to the DMA API that has been discussed on the mailing list.
> 
> >
> > > > > Now the bigger question is why need to Tx and then Rx something to
> > > > > compute the device
> > > > > Isn't  ot offload something? If so, why not add the those offload in
> > > > > respective subsystem
> > > > > to improve the subsystem(ethdev, cryptiodev etc) features set to adapt
> > > > > new features or
> > > > > introduce new subsystem (like ML, Inline Baseband processing) so that
> > > > > it will be an opportunity to
> > > > > implement the same in  HW or compute device. For example, if we take
> > > > > this path, ML offloading will
> > > > > be application code like testpmd, which deals with "specific" device
> > > > > commands(aka glorified rawdev)
> > > > > to deal with specific computing device offload "COMMANDS"
> > > > > (The commands will be specific to  offload device, the same code wont
> > > > > run on  other compute device)
> > > >
> > > > Having specific features API is convenient for compatibility
> > > > between devices, yes, for the set of defined features.
> > > > Our approach is to start with a flexible API that the application
> > > > can use to implement any processing because with GPU programming,
> > > > there is no restriction on what can be achieved.
> > > > This approach does not contradict what you propose,
> > > > it does not prevent extending existing classes.
> > >
> > > It does prevent extending the existing classes as no one is going to
> > > extent it there is the path of not doing do.
> >
> > I disagree. Specific API is more convenient for some tasks,
> > so there is an incentive to define or extend specific device class APIs.
> > But it should not forbid doing custom processing.
> 
> This is the same as the raw device is in DPDK where the device
> personality is not defined.
> 
> Even if define another API and if the personality is not defined,
> it comes similar to the raw device as similar
> to rawdev enqueue and dequeue.
> 
> To summarize,
> 
> 1)  My _personal_ preference is to have specific subsystems
> to improve the DPDK instead of the raw device kind of path.

Something like rte_memdev to focus on device (GPU) memory management ?

The new DPDK auxiliary bus maybe make life easier to solve the complex
heterogeneous computing library. ;-)

> 2) If the device personality is not defined, use rawdev
> 3) All computing devices do not use  "communication flag" and
> "communication list"
> kind of structure. If are targeting a generic computing device then
> that is not a portable scheme.
> For GPU abstraction if "communication flag" and "communication list"
> is the right kind of mechanism
> then we can have a separate library for GPU communication specific to GPU <->
> DPDK communication needs and explicit for GPU.
> 
> I think generic DPDK applications like testpmd should not
> pollute with device-specific functions. Like, call device-specific
> messages from the application
> which makes the application runs only one device. I don't have a
> strong opinion(expect
> standardizing  "communication flag" and "communication list" as
> generic computing device
> communication mechanism) of others think it is OK to do that way in DPDK.
> 
> >
> > > If an application can run only on a specific device, it is similar to
> > > a raw device,
> > > where the device definition is not defined. (i.e JOB metadata is not defined and
> > > it is specific to the device).
> > >
> > > > > Just my _personal_ preference is to have specific subsystems to
> > > > > improve the DPDK instead of raw device kind of
> > > > > path. If we decide another path as a community it is _fine_ too(as a
> > > > > _project manager_ point of view it will be an easy path to dump SDK
> > > > > stuff to DPDK without introducing the pain of the subsystem nor
> > > > > improving the DPDK).
> > > >
> > > > Adding a new class API is also improving DPDK.
> > >
> > > But the class is similar as raw dev class. The reason I say,
> > > Job submission and response is can be abstracted as queue/dequeue APIs.
> > > Taks/Job metadata is specific to compute devices (and it can not be
> > > generalized).
> > > If we generalize it makes sense to have a new class that does
> > > "specific function".
> >
> > Computing device programming is already generalized with languages like OpenCL.
> > We should not try to reinvent the same.
> > We are just trying to properly integrate the concept in DPDK
> > and allow building on top of it.
> 
> See above.
> 
> >
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-08-29  5:32             ` Wang, Haiyue
@ 2021-09-01 15:35               ` Elena Agostini
  2021-09-02 13:12                 ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Elena Agostini @ 2021-09-01 15:35 UTC (permalink / raw)
  To: Wang, Haiyue, Jerin Jacob, NBU-Contact-Thomas Monjalon
  Cc: Jerin Jacob, dpdk-dev, Stephen Hemminger, David Marchand,
	Andrew Rybchenko, Honnappa Nagarahalli, Yigit, Ferruh, techboard


> -----Original Message-----
> From: Wang, Haiyue <haiyue.wang@intel.com>
> Sent: Sunday, August 29, 2021 7:33 AM
> To: Jerin Jacob <jerinjacobk@gmail.com>; NBU-Contact-Thomas Monjalon
> <thomas@monjalon.net>
> Cc: Jerin Jacob <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen
> Hemminger <stephen@networkplumber.org>; David Marchand
> <david.marchand@redhat.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> techboard@dpdk.org; Elena Agostini <eagostini@nvidia.com>
> Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> 
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, August 27, 2021 20:19
> > To: Thomas Monjalon <thomas@monjalon.net>
> > Cc: Jerin Jacob <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen
> Hemminger
> > <stephen@networkplumber.org>; David Marchand
> <david.marchand@redhat.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; Wang, Haiyue <haiyue.wang@intel.com>;
> Honnappa Nagarahalli
> > <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> techboard@dpdk.org; Elena
> > Agostini <eagostini@nvidia.com>
> > Subject: Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> >
> > On Fri, Aug 27, 2021 at 3:14 PM Thomas Monjalon <thomas@monjalon.net>
> wrote:
> > >
> > > 31/07/2021 15:42, Jerin Jacob:
> > > > On Sat, Jul 31, 2021 at 1:51 PM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > > > 31/07/2021 09:06, Jerin Jacob:
> > > > > > On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon
> <thomas@monjalon.net> wrote:
> > > > > > > From: Elena Agostini <eagostini@nvidia.com>
> > > > > > >
> > > > > > > In heterogeneous computing system, processing is not only in the
> CPU.
> > > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > > >
> > > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > > DPDK, that's primarily a CPU framework, and other type of devices
> like GPUs.
> > > > > > >
> > > > > > > When mixing network activity with task processing on a non-CPU
> device,
> > > > > > > there may be the need to put in communication the CPU with the
> device
> > > > > > > in order to manage the memory, synchronize operations, exchange
> info, etc..
> > > > > > >
> > > > > > > This library provides a number of new features:
> > > > > > > - Interoperability with device specific library with generic handlers
> > > > > > > - Possibility to allocate and free memory on the device
> > > > > > > - Possibility to allocate and free memory on the CPU but visible from
> the device
> > > > > > > - Communication functions to enhance the dialog between the CPU
> and the device
> > > > > > >
> > > > > > > The infrastructure is prepared to welcome drivers in drivers/hc/
> > > > > > > as the upcoming NVIDIA one, implementing the hcdev API.
> > > > > > >
> > > > > > > Some parts are not complete:
> > > > > > >   - locks
> > > > > > >   - memory allocation table
> > > > > > >   - memory freeing
> > > > > > >   - guide documentation
> > > > > > >   - integration in devtools/check-doc-vs-code.sh
> > > > > > >   - unit tests
> > > > > > >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > > > > >
> > > > > > Since the above line is the crux of the following text, I will start
> > > > > > from this point.
> > > > > >
> > > > > > + Techboard
> > > > > >
> > > > > > I  can give my honest feedback on this.
> > > > > >
> > > > > > I can map similar  stuff  in Marvell HW, where we do machine learning
> > > > > > as compute offload
> > > > > > on a different class of CPU.
> > > > > >
> > > > > > In terms of RFC patch features
> > > > > >
> > > > > > 1) memory API - Use cases are aligned
> > > > > > 2) communication flag and communication list
> > > > > > Our structure is completely different and we are using HW ring kind of
> > > > > > interface to post the job to compute interface and
> > > > > > the job completion result happens through the event device.
> > > > > > Kind of similar to the DMA API that has been discussed on the mailing
> list.
> > > > >
> > > > > Interesting.
> > > >
> > > > It is hard to generalize the communication mechanism.
> > > > Is other GPU vendors have a similar communication mechanism? AMD,
> Intel ??
> > >
> > > I don't know who to ask in AMD & Intel. Any ideas?
> >
> > Good question.
> >
> > At least in Marvell HW, the communication flag and communication list is
> > our structure is completely different and we are using HW ring kind of
> > interface to post the job to compute interface and
> > the job completion result happens through the event device.
> > kind of similar to the DMA API that has been discussed on the mailing list.

Please correct me if I'm wrong but what you are describing is a specific way 
to submit work on the device. Communication flag/list here is a direct data 
communication between the CPU and some kind of workload (e.g. GPU kernel)
that's already running on the device.

The rationale here is that:
- some work has been already submitted on the device and it's running
- CPU needs a real-time direct interaction through memory with the device
- the workload on the device needs some info from the CPU it can't get at submission time

This is good enough for NVIDIA and AMD GPU.
Need to double check for Intel GPU.

> >
> > >
> > > > > > Now the bigger question is why need to Tx and then Rx something to
> > > > > > compute the device
> > > > > > Isn't  ot offload something? If so, why not add the those offload in
> > > > > > respective subsystem
> > > > > > to improve the subsystem(ethdev, cryptiodev etc) features set to adapt
> > > > > > new features or
> > > > > > introduce new subsystem (like ML, Inline Baseband processing) so that
> > > > > > it will be an opportunity to
> > > > > > implement the same in  HW or compute device. For example, if we take
> > > > > > this path, ML offloading will
> > > > > > be application code like testpmd, which deals with "specific" device
> > > > > > commands(aka glorified rawdev)
> > > > > > to deal with specific computing device offload "COMMANDS"
> > > > > > (The commands will be specific to  offload device, the same code wont
> > > > > > run on  other compute device)
> > > > >
> > > > > Having specific features API is convenient for compatibility
> > > > > between devices, yes, for the set of defined features.
> > > > > Our approach is to start with a flexible API that the application
> > > > > can use to implement any processing because with GPU programming,
> > > > > there is no restriction on what can be achieved.
> > > > > This approach does not contradict what you propose,
> > > > > it does not prevent extending existing classes.
> > > >
> > > > It does prevent extending the existing classes as no one is going to
> > > > extent it there is the path of not doing do.
> > >
> > > I disagree. Specific API is more convenient for some tasks,
> > > so there is an incentive to define or extend specific device class APIs.
> > > But it should not forbid doing custom processing.
> >
> > This is the same as the raw device is in DPDK where the device
> > personality is not defined.
> >
> > Even if define another API and if the personality is not defined,
> > it comes similar to the raw device as similar
> > to rawdev enqueue and dequeue.
> >
> > To summarize,
> >
> > 1)  My _personal_ preference is to have specific subsystems
> > to improve the DPDK instead of the raw device kind of path.
> 
> Something like rte_memdev to focus on device (GPU) memory management ?
> 
> The new DPDK auxiliary bus maybe make life easier to solve the complex
> heterogeneous computing library. ;-)

To get a concrete idea about what's the best and most comprehensive
approach we should start with something that's flexible and simple enough.

A dedicated library it's a good starting point: easy to implement and embed in DPDK applications,
isolated from other components and users can play with it learning from the code.
As a second step we can think to embed the functionality in some other way 
within DPDK (e.g. split memory management and communication features).

> 
> > 2) If the device personality is not defined, use rawdev
> > 3) All computing devices do not use  "communication flag" and
> > "communication list"
> > kind of structure. If are targeting a generic computing device then
> > that is not a portable scheme.
> > For GPU abstraction if "communication flag" and "communication list"
> > is the right kind of mechanism
> > then we can have a separate library for GPU communication specific to GPU <-
> >
> > DPDK communication needs and explicit for GPU.
> >
> > I think generic DPDK applications like testpmd should not
> > pollute with device-specific functions. Like, call device-specific
> > messages from the application
> > which makes the application runs only one device. I don't have a
> > strong opinion(expect
> > standardizing  "communication flag" and "communication list" as
> > generic computing device
> > communication mechanism) of others think it is OK to do that way in DPDK.

I'd like to introduce (with a dedicated option) the memory API in testpmd to 
provide an example of how to TX/RX packets using device memory.

I agree to not embed communication flag/list features.

> >
> > >
> > > > If an application can run only on a specific device, it is similar to
> > > > a raw device,
> > > > where the device definition is not defined. (i.e JOB metadata is not defined
> and
> > > > it is specific to the device).
> > > >
> > > > > > Just my _personal_ preference is to have specific subsystems to
> > > > > > improve the DPDK instead of raw device kind of
> > > > > > path. If we decide another path as a community it is _fine_ too(as a
> > > > > > _project manager_ point of view it will be an easy path to dump SDK
> > > > > > stuff to DPDK without introducing the pain of the subsystem nor
> > > > > > improving the DPDK).
> > > > >
> > > > > Adding a new class API is also improving DPDK.
> > > >
> > > > But the class is similar as raw dev class. The reason I say,
> > > > Job submission and response is can be abstracted as queue/dequeue APIs.
> > > > Taks/Job metadata is specific to compute devices (and it can not be
> > > > generalized).
> > > > If we generalize it makes sense to have a new class that does
> > > > "specific function".
> > >
> > > Computing device programming is already generalized with languages like
> OpenCL.
> > > We should not try to reinvent the same.
> > > We are just trying to properly integrate the concept in DPDK
> > > and allow building on top of it.

Agree.

> >
> > See above.
> >
> > >
> > >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-09-01 15:35               ` Elena Agostini
@ 2021-09-02 13:12                 ` Jerin Jacob
  2021-09-06 16:11                   ` Elena Agostini
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-09-02 13:12 UTC (permalink / raw)
  To: Elena Agostini
  Cc: Wang, Haiyue, NBU-Contact-Thomas Monjalon, Jerin Jacob, dpdk-dev,
	Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Honnappa Nagarahalli, Yigit, Ferruh, techboard

On Wed, Sep 1, 2021 at 9:05 PM Elena Agostini <eagostini@nvidia.com> wrote:
>
>
> > -----Original Message-----
> > From: Wang, Haiyue <haiyue.wang@intel.com>
> > Sent: Sunday, August 29, 2021 7:33 AM
> > To: Jerin Jacob <jerinjacobk@gmail.com>; NBU-Contact-Thomas Monjalon
> > <thomas@monjalon.net>
> > Cc: Jerin Jacob <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen
> > Hemminger <stephen@networkplumber.org>; David Marchand
> > <david.marchand@redhat.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> > <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > techboard@dpdk.org; Elena Agostini <eagostini@nvidia.com>
> > Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Friday, August 27, 2021 20:19
> > > To: Thomas Monjalon <thomas@monjalon.net>
> > > Cc: Jerin Jacob <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen
> > Hemminger
> > > <stephen@networkplumber.org>; David Marchand
> > <david.marchand@redhat.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; Wang, Haiyue <haiyue.wang@intel.com>;
> > Honnappa Nagarahalli
> > > <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > techboard@dpdk.org; Elena
> > > Agostini <eagostini@nvidia.com>
> > > Subject: Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> > >
> > > On Fri, Aug 27, 2021 at 3:14 PM Thomas Monjalon <thomas@monjalon.net>
> > wrote:
> > > >
> > > > 31/07/2021 15:42, Jerin Jacob:
> > > > > On Sat, Jul 31, 2021 at 1:51 PM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > > > > > 31/07/2021 09:06, Jerin Jacob:
> > > > > > > On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon
> > <thomas@monjalon.net> wrote:
> > > > > > > > From: Elena Agostini <eagostini@nvidia.com>
> > > > > > > >
> > > > > > > > In heterogeneous computing system, processing is not only in the
> > CPU.
> > > > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > > > >
> > > > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > > > DPDK, that's primarily a CPU framework, and other type of devices
> > like GPUs.
> > > > > > > >
> > > > > > > > When mixing network activity with task processing on a non-CPU
> > device,
> > > > > > > > there may be the need to put in communication the CPU with the
> > device
> > > > > > > > in order to manage the memory, synchronize operations, exchange
> > info, etc..
> > > > > > > >
> > > > > > > > This library provides a number of new features:
> > > > > > > > - Interoperability with device specific library with generic handlers
> > > > > > > > - Possibility to allocate and free memory on the device
> > > > > > > > - Possibility to allocate and free memory on the CPU but visible from
> > the device
> > > > > > > > - Communication functions to enhance the dialog between the CPU
> > and the device
> > > > > > > >
> > > > > > > > The infrastructure is prepared to welcome drivers in drivers/hc/
> > > > > > > > as the upcoming NVIDIA one, implementing the hcdev API.
> > > > > > > >
> > > > > > > > Some parts are not complete:
> > > > > > > >   - locks
> > > > > > > >   - memory allocation table
> > > > > > > >   - memory freeing
> > > > > > > >   - guide documentation
> > > > > > > >   - integration in devtools/check-doc-vs-code.sh
> > > > > > > >   - unit tests
> > > > > > > >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > > > > > >
> > > > > > > Since the above line is the crux of the following text, I will start
> > > > > > > from this point.
> > > > > > >
> > > > > > > + Techboard
> > > > > > >
> > > > > > > I  can give my honest feedback on this.
> > > > > > >
> > > > > > > I can map similar  stuff  in Marvell HW, where we do machine learning
> > > > > > > as compute offload
> > > > > > > on a different class of CPU.
> > > > > > >
> > > > > > > In terms of RFC patch features
> > > > > > >
> > > > > > > 1) memory API - Use cases are aligned
> > > > > > > 2) communication flag and communication list
> > > > > > > Our structure is completely different and we are using HW ring kind of
> > > > > > > interface to post the job to compute interface and
> > > > > > > the job completion result happens through the event device.
> > > > > > > Kind of similar to the DMA API that has been discussed on the mailing
> > list.
> > > > > >
> > > > > > Interesting.
> > > > >
> > > > > It is hard to generalize the communication mechanism.
> > > > > Is other GPU vendors have a similar communication mechanism? AMD,
> > Intel ??
> > > >
> > > > I don't know who to ask in AMD & Intel. Any ideas?
> > >
> > > Good question.
> > >
> > > At least in Marvell HW, the communication flag and communication list is
> > > our structure is completely different and we are using HW ring kind of
> > > interface to post the job to compute interface and
> > > the job completion result happens through the event device.
> > > kind of similar to the DMA API that has been discussed on the mailing list.
>
> Please correct me if I'm wrong but what you are describing is a specific way
> to submit work on the device. Communication flag/list here is a direct data
> communication between the CPU and some kind of workload (e.g. GPU kernel)
> that's already running on the device.

Exactly. What I meant is Communication flag/list is not generic enough
to express
and generic compute device. If all GPU works in this way, we could
make the library
name as GPU specific and add GPU specific communication mechanism.


>
> The rationale here is that:
> - some work has been already submitted on the device and it's running
> - CPU needs a real-time direct interaction through memory with the device
> - the workload on the device needs some info from the CPU it can't get at submission time
>
> This is good enough for NVIDIA and AMD GPU.
> Need to double check for Intel GPU.
>
> > >
> > > >
> > > > > > > Now the bigger question is why need to Tx and then Rx something to
> > > > > > > compute the device
> > > > > > > Isn't  ot offload something? If so, why not add the those offload in
> > > > > > > respective subsystem
> > > > > > > to improve the subsystem(ethdev, cryptiodev etc) features set to adapt
> > > > > > > new features or
> > > > > > > introduce new subsystem (like ML, Inline Baseband processing) so that
> > > > > > > it will be an opportunity to
> > > > > > > implement the same in  HW or compute device. For example, if we take
> > > > > > > this path, ML offloading will
> > > > > > > be application code like testpmd, which deals with "specific" device
> > > > > > > commands(aka glorified rawdev)
> > > > > > > to deal with specific computing device offload "COMMANDS"
> > > > > > > (The commands will be specific to  offload device, the same code wont
> > > > > > > run on  other compute device)
> > > > > >
> > > > > > Having specific features API is convenient for compatibility
> > > > > > between devices, yes, for the set of defined features.
> > > > > > Our approach is to start with a flexible API that the application
> > > > > > can use to implement any processing because with GPU programming,
> > > > > > there is no restriction on what can be achieved.
> > > > > > This approach does not contradict what you propose,
> > > > > > it does not prevent extending existing classes.
> > > > >
> > > > > It does prevent extending the existing classes as no one is going to
> > > > > extent it there is the path of not doing do.
> > > >
> > > > I disagree. Specific API is more convenient for some tasks,
> > > > so there is an incentive to define or extend specific device class APIs.
> > > > But it should not forbid doing custom processing.
> > >
> > > This is the same as the raw device is in DPDK where the device
> > > personality is not defined.
> > >
> > > Even if define another API and if the personality is not defined,
> > > it comes similar to the raw device as similar
> > > to rawdev enqueue and dequeue.
> > >
> > > To summarize,
> > >
> > > 1)  My _personal_ preference is to have specific subsystems
> > > to improve the DPDK instead of the raw device kind of path.
> >
> > Something like rte_memdev to focus on device (GPU) memory management ?
> >
> > The new DPDK auxiliary bus maybe make life easier to solve the complex
> > heterogeneous computing library. ;-)
>
> To get a concrete idea about what's the best and most comprehensive
> approach we should start with something that's flexible and simple enough.
>
> A dedicated library it's a good starting point: easy to implement and embed in DPDK applications,
> isolated from other components and users can play with it learning from the code.
> As a second step we can think to embed the functionality in some other way
> within DPDK (e.g. split memory management and communication features).
>
> >
> > > 2) If the device personality is not defined, use rawdev
> > > 3) All computing devices do not use  "communication flag" and
> > > "communication list"
> > > kind of structure. If are targeting a generic computing device then
> > > that is not a portable scheme.
> > > For GPU abstraction if "communication flag" and "communication list"
> > > is the right kind of mechanism
> > > then we can have a separate library for GPU communication specific to GPU <-
> > >
> > > DPDK communication needs and explicit for GPU.
> > >
> > > I think generic DPDK applications like testpmd should not
> > > pollute with device-specific functions. Like, call device-specific
> > > messages from the application
> > > which makes the application runs only one device. I don't have a
> > > strong opinion(expect
> > > standardizing  "communication flag" and "communication list" as
> > > generic computing device
> > > communication mechanism) of others think it is OK to do that way in DPDK.
>
> I'd like to introduce (with a dedicated option) the memory API in testpmd to
> provide an example of how to TX/RX packets using device memory.

Not sure without embedding sideband communication mechanism how it can notify to
GPU and back to CPU. If you could share the example API sequence that helps to
us understand the level of coupling with testpmd.


>
> I agree to not embed communication flag/list features.
>
> > >
> > > >
> > > > > If an application can run only on a specific device, it is similar to
> > > > > a raw device,
> > > > > where the device definition is not defined. (i.e JOB metadata is not defined
> > and
> > > > > it is specific to the device).
> > > > >
> > > > > > > Just my _personal_ preference is to have specific subsystems to
> > > > > > > improve the DPDK instead of raw device kind of
> > > > > > > path. If we decide another path as a community it is _fine_ too(as a
> > > > > > > _project manager_ point of view it will be an easy path to dump SDK
> > > > > > > stuff to DPDK without introducing the pain of the subsystem nor
> > > > > > > improving the DPDK).
> > > > > >
> > > > > > Adding a new class API is also improving DPDK.
> > > > >
> > > > > But the class is similar as raw dev class. The reason I say,
> > > > > Job submission and response is can be abstracted as queue/dequeue APIs.
> > > > > Taks/Job metadata is specific to compute devices (and it can not be
> > > > > generalized).
> > > > > If we generalize it makes sense to have a new class that does
> > > > > "specific function".
> > > >
> > > > Computing device programming is already generalized with languages like
> > OpenCL.
> > > > We should not try to reinvent the same.
> > > > We are just trying to properly integrate the concept in DPDK
> > > > and allow building on top of it.
>
> Agree.
>
> > >
> > > See above.
> > >
> > > >
> > > >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-09-02 13:12                 ` Jerin Jacob
@ 2021-09-06 16:11                   ` Elena Agostini
  2021-09-06 17:15                     ` Wang, Haiyue
  0 siblings, 1 reply; 128+ messages in thread
From: Elena Agostini @ 2021-09-06 16:11 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Wang, Haiyue, NBU-Contact-Thomas Monjalon, Jerin Jacob, dpdk-dev,
	Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Honnappa Nagarahalli, Yigit, Ferruh, techboard



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, September 2, 2021 3:12 PM
> To: Elena Agostini <eagostini@nvidia.com>
> Cc: Wang, Haiyue <haiyue.wang@intel.com>; NBU-Contact-Thomas
> Monjalon <thomas@monjalon.net>; Jerin Jacob <jerinj@marvell.com>;
> dpdk-dev <dev@dpdk.org>; Stephen Hemminger
> <stephen@networkplumber.org>; David Marchand
> <david.marchand@redhat.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> techboard@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing
> library
> 
> 
> On Wed, Sep 1, 2021 at 9:05 PM Elena Agostini <eagostini@nvidia.com>
> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Wang, Haiyue <haiyue.wang@intel.com>
> > > Sent: Sunday, August 29, 2021 7:33 AM
> > > To: Jerin Jacob <jerinjacobk@gmail.com>; NBU-Contact-Thomas
> Monjalon
> > > <thomas@monjalon.net>
> > > Cc: Jerin Jacob <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>;
> > > Stephen Hemminger <stephen@networkplumber.org>; David Marchand
> > > <david.marchand@redhat.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> > > <honnappa.nagarahalli@arm.com>; Yigit, Ferruh
> > > <ferruh.yigit@intel.com>; techboard@dpdk.org; Elena Agostini
> > > <eagostini@nvidia.com>
> > > Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing
> > > library
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Friday, August 27, 2021 20:19
> > > > To: Thomas Monjalon <thomas@monjalon.net>
> > > > Cc: Jerin Jacob <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>;
> > > > Stephen
> > > Hemminger
> > > > <stephen@networkplumber.org>; David Marchand
> > > <david.marchand@redhat.com>; Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>; Wang, Haiyue
> > > > <haiyue.wang@intel.com>;
> > > Honnappa Nagarahalli
> > > > <honnappa.nagarahalli@arm.com>; Yigit, Ferruh
> > > > <ferruh.yigit@intel.com>;
> > > techboard@dpdk.org; Elena
> > > > Agostini <eagostini@nvidia.com>
> > > > Subject: Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing
> > > > library
> > > >
> > > > On Fri, Aug 27, 2021 at 3:14 PM Thomas Monjalon
> > > > <thomas@monjalon.net>
> > > wrote:
> > > > >
> > > > > 31/07/2021 15:42, Jerin Jacob:
> > > > > > On Sat, Jul 31, 2021 at 1:51 PM Thomas Monjalon
> > > <thomas@monjalon.net> wrote:
> > > > > > > 31/07/2021 09:06, Jerin Jacob:
> > > > > > > > On Fri, Jul 30, 2021 at 7:25 PM Thomas Monjalon
> > > <thomas@monjalon.net> wrote:
> > > > > > > > > From: Elena Agostini <eagostini@nvidia.com>
> > > > > > > > >
> > > > > > > > > In heterogeneous computing system, processing is not
> > > > > > > > > only in the
> > > CPU.
> > > > > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > > > > >
> > > > > > > > > The goal of this new library is to enhance the
> > > > > > > > > collaboration between DPDK, that's primarily a CPU
> > > > > > > > > framework, and other type of devices
> > > like GPUs.
> > > > > > > > >
> > > > > > > > > When mixing network activity with task processing on a
> > > > > > > > > non-CPU
> > > device,
> > > > > > > > > there may be the need to put in communication the CPU
> > > > > > > > > with the
> > > device
> > > > > > > > > in order to manage the memory, synchronize operations,
> > > > > > > > > exchange
> > > info, etc..
> > > > > > > > >
> > > > > > > > > This library provides a number of new features:
> > > > > > > > > - Interoperability with device specific library with
> > > > > > > > > generic handlers
> > > > > > > > > - Possibility to allocate and free memory on the device
> > > > > > > > > - Possibility to allocate and free memory on the CPU but
> > > > > > > > > visible from
> > > the device
> > > > > > > > > - Communication functions to enhance the dialog between
> > > > > > > > > the CPU
> > > and the device
> > > > > > > > >
> > > > > > > > > The infrastructure is prepared to welcome drivers in
> > > > > > > > > drivers/hc/ as the upcoming NVIDIA one, implementing the
> hcdev API.
> > > > > > > > >
> > > > > > > > > Some parts are not complete:
> > > > > > > > >   - locks
> > > > > > > > >   - memory allocation table
> > > > > > > > >   - memory freeing
> > > > > > > > >   - guide documentation
> > > > > > > > >   - integration in devtools/check-doc-vs-code.sh
> > > > > > > > >   - unit tests
> > > > > > > > >   - integration in testpmd to enable Rx/Tx to/from GPU
> memory.
> > > > > > > >
> > > > > > > > Since the above line is the crux of the following text, I
> > > > > > > > will start from this point.
> > > > > > > >
> > > > > > > > + Techboard
> > > > > > > >
> > > > > > > > I  can give my honest feedback on this.
> > > > > > > >
> > > > > > > > I can map similar  stuff  in Marvell HW, where we do
> > > > > > > > machine learning as compute offload on a different class
> > > > > > > > of CPU.
> > > > > > > >
> > > > > > > > In terms of RFC patch features
> > > > > > > >
> > > > > > > > 1) memory API - Use cases are aligned
> > > > > > > > 2) communication flag and communication list Our structure
> > > > > > > > is completely different and we are using HW ring kind of
> > > > > > > > interface to post the job to compute interface and the job
> > > > > > > > completion result happens through the event device.
> > > > > > > > Kind of similar to the DMA API that has been discussed on
> > > > > > > > the mailing
> > > list.
> > > > > > >
> > > > > > > Interesting.
> > > > > >
> > > > > > It is hard to generalize the communication mechanism.
> > > > > > Is other GPU vendors have a similar communication mechanism?
> > > > > > AMD,
> > > Intel ??
> > > > >
> > > > > I don't know who to ask in AMD & Intel. Any ideas?
> > > >
> > > > Good question.
> > > >
> > > > At least in Marvell HW, the communication flag and communication
> > > > list is our structure is completely different and we are using HW
> > > > ring kind of interface to post the job to compute interface and
> > > > the job completion result happens through the event device.
> > > > kind of similar to the DMA API that has been discussed on the mailing
> list.
> >
> > Please correct me if I'm wrong but what you are describing is a
> > specific way to submit work on the device. Communication flag/list
> > here is a direct data communication between the CPU and some kind of
> > workload (e.g. GPU kernel) that's already running on the device.
> 
> Exactly. What I meant is Communication flag/list is not generic enough to
> express and generic compute device. If all GPU works in this way, we could
> make the library name as GPU specific and add GPU specific communication
> mechanism.

I'm in favor of reverting the name of the library with a more specific gpudev name
instead of hcdev. This library (both memory allocations and fancy features like
communication lists) can be tested on various GPUs but I'm not sure about
other type of devices. 

Again, as initial step, I would not complicate things
Let's have a GPU oriented library for now.

> 
> 
> >
> > The rationale here is that:
> > - some work has been already submitted on the device and it's running
> > - CPU needs a real-time direct interaction through memory with the
> > device
> > - the workload on the device needs some info from the CPU it can't get
> > at submission time
> >
> > This is good enough for NVIDIA and AMD GPU.
> > Need to double check for Intel GPU.
> >
> > > >
> > > > >
> > > > > > > > Now the bigger question is why need to Tx and then Rx
> > > > > > > > something to compute the device Isn't  ot offload
> > > > > > > > something? If so, why not add the those offload in
> > > > > > > > respective subsystem to improve the subsystem(ethdev,
> > > > > > > > cryptiodev etc) features set to adapt new features or
> > > > > > > > introduce new subsystem (like ML, Inline Baseband
> > > > > > > > processing) so that it will be an opportunity to implement
> > > > > > > > the same in  HW or compute device. For example, if we take
> > > > > > > > this path, ML offloading will be application code like
> > > > > > > > testpmd, which deals with "specific" device commands(aka
> > > > > > > > glorified rawdev) to deal with specific computing device
> > > > > > > > offload "COMMANDS"
> > > > > > > > (The commands will be specific to  offload device, the
> > > > > > > > same code wont run on  other compute device)
> > > > > > >
> > > > > > > Having specific features API is convenient for compatibility
> > > > > > > between devices, yes, for the set of defined features.
> > > > > > > Our approach is to start with a flexible API that the
> > > > > > > application can use to implement any processing because with
> > > > > > > GPU programming, there is no restriction on what can be
> achieved.
> > > > > > > This approach does not contradict what you propose, it does
> > > > > > > not prevent extending existing classes.
> > > > > >
> > > > > > It does prevent extending the existing classes as no one is
> > > > > > going to extent it there is the path of not doing do.
> > > > >
> > > > > I disagree. Specific API is more convenient for some tasks, so
> > > > > there is an incentive to define or extend specific device class APIs.
> > > > > But it should not forbid doing custom processing.
> > > >
> > > > This is the same as the raw device is in DPDK where the device
> > > > personality is not defined.
> > > >
> > > > Even if define another API and if the personality is not defined,
> > > > it comes similar to the raw device as similar to rawdev enqueue
> > > > and dequeue.
> > > >
> > > > To summarize,
> > > >
> > > > 1)  My _personal_ preference is to have specific subsystems to
> > > > improve the DPDK instead of the raw device kind of path.
> > >
> > > Something like rte_memdev to focus on device (GPU) memory
> management ?
> > >
> > > The new DPDK auxiliary bus maybe make life easier to solve the
> > > complex heterogeneous computing library. ;-)
> >
> > To get a concrete idea about what's the best and most comprehensive
> > approach we should start with something that's flexible and simple
> enough.
> >
> > A dedicated library it's a good starting point: easy to implement and
> > embed in DPDK applications, isolated from other components and users
> can play with it learning from the code.
> > As a second step we can think to embed the functionality in some other
> > way within DPDK (e.g. split memory management and communication
> features).
> >
> > >
> > > > 2) If the device personality is not defined, use rawdev
> > > > 3) All computing devices do not use  "communication flag" and
> > > > "communication list"
> > > > kind of structure. If are targeting a generic computing device
> > > > then that is not a portable scheme.
> > > > For GPU abstraction if "communication flag" and "communication
> list"
> > > > is the right kind of mechanism
> > > > then we can have a separate library for GPU communication specific
> > > > to GPU <-
> > > >
> > > > DPDK communication needs and explicit for GPU.
> > > >
> > > > I think generic DPDK applications like testpmd should not pollute
> > > > with device-specific functions. Like, call device-specific
> > > > messages from the application which makes the application runs
> > > > only one device. I don't have a strong opinion(expect
> > > > standardizing  "communication flag" and "communication list" as
> > > > generic computing device communication mechanism) of others think
> > > > it is OK to do that way in DPDK.
> >
> > I'd like to introduce (with a dedicated option) the memory API in
> > testpmd to provide an example of how to TX/RX packets using device
> memory.
> 
> Not sure without embedding sideband communication mechanism how it
> can notify to GPU and back to CPU. If you could share the example API
> sequence that helps to us understand the level of coupling with testpmd.
> 

There is no need of communication mechanism here.
Assuming there is not workload to process network packets (to not complicate
things), the steps are:
1) Create a DPDK mempool with device external memory using the hcdev (or gpudev) library
2) Use that mempool to tx/rx/fwd packets

As an example, you look at my l2fwd-nv application here: https://github.com/NVIDIA/l2fwd-nv

> 
> >
> > I agree to not embed communication flag/list features.
> >
> > > >
> > > > >
> > > > > > If an application can run only on a specific device, it is
> > > > > > similar to a raw device, where the device definition is not
> > > > > > defined. (i.e JOB metadata is not defined
> > > and
> > > > > > it is specific to the device).
> > > > > >
> > > > > > > > Just my _personal_ preference is to have specific
> > > > > > > > subsystems to improve the DPDK instead of raw device kind
> > > > > > > > of path. If we decide another path as a community it is
> > > > > > > > _fine_ too(as a _project manager_ point of view it will be
> > > > > > > > an easy path to dump SDK stuff to DPDK without introducing
> > > > > > > > the pain of the subsystem nor improving the DPDK).
> > > > > > >
> > > > > > > Adding a new class API is also improving DPDK.
> > > > > >
> > > > > > But the class is similar as raw dev class. The reason I say,
> > > > > > Job submission and response is can be abstracted as
> queue/dequeue APIs.
> > > > > > Taks/Job metadata is specific to compute devices (and it can
> > > > > > not be generalized).
> > > > > > If we generalize it makes sense to have a new class that does
> > > > > > "specific function".
> > > > >
> > > > > Computing device programming is already generalized with
> > > > > languages like
> > > OpenCL.
> > > > > We should not try to reinvent the same.
> > > > > We are just trying to properly integrate the concept in DPDK and
> > > > > allow building on top of it.
> >
> > Agree.
> >
> > > >
> > > > See above.
> > > >
> > > > >
> > > > >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-09-06 16:11                   ` Elena Agostini
@ 2021-09-06 17:15                     ` Wang, Haiyue
  2021-09-06 17:22                       ` Elena Agostini
  0 siblings, 1 reply; 128+ messages in thread
From: Wang, Haiyue @ 2021-09-06 17:15 UTC (permalink / raw)
  To: Elena Agostini, Jerin Jacob
  Cc: NBU-Contact-Thomas Monjalon, Jerin Jacob, dpdk-dev,
	Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Honnappa Nagarahalli, Yigit, Ferruh, techboard

> -----Original Message-----
> From: Elena Agostini <eagostini@nvidia.com>
> Sent: Tuesday, September 7, 2021 00:11
> To: Jerin Jacob <jerinjacobk@gmail.com>
> Cc: Wang, Haiyue <haiyue.wang@intel.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Jerin
> Jacob <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>;
> David Marchand <david.marchand@redhat.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Honnappa
> Nagarahalli <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; techboard@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> 
> 
> 


> > >
> > > I'd like to introduce (with a dedicated option) the memory API in
> > > testpmd to provide an example of how to TX/RX packets using device
> > memory.
> >
> > Not sure without embedding sideband communication mechanism how it
> > can notify to GPU and back to CPU. If you could share the example API
> > sequence that helps to us understand the level of coupling with testpmd.
> >
> 
> There is no need of communication mechanism here.
> Assuming there is not workload to process network packets (to not complicate
> things), the steps are:
> 1) Create a DPDK mempool with device external memory using the hcdev (or gpudev) library
> 2) Use that mempool to tx/rx/fwd packets
> 
> As an example, you look at my l2fwd-nv application here: https://github.com/NVIDIA/l2fwd-nv
> 

To enhance the 'rte_extmem_register' / 'rte_pktmbuf_pool_create_extbuf' ?

	if (l2fwd_mem_type == MEM_HOST_PINNED) {
		ext_mem.buf_ptr = rte_malloc("extmem", ext_mem.buf_len, 0);
		CUDA_CHECK(cudaHostRegister(ext_mem.buf_ptr, ext_mem.buf_len, cudaHostRegisterMapped));
		void *pDevice;
		CUDA_CHECK(cudaHostGetDevicePointer(&pDevice, ext_mem.buf_ptr, 0));
		if (pDevice != ext_mem.buf_ptr)
			rte_exit(EXIT_FAILURE, "GPU pointer does not match CPU pointer\n");
	} else {
		ext_mem.buf_iova = RTE_BAD_IOVA;
		CUDA_CHECK(cudaMalloc(&ext_mem.buf_ptr, ext_mem.buf_len));
		if (ext_mem.buf_ptr == NULL)
			rte_exit(EXIT_FAILURE, "Could not allocate GPU memory\n");

		unsigned int flag = 1;
		CUresult status = cuPointerSetAttribute(&flag, CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, (CUdeviceptr)ext_mem.buf_ptr);
		if (CUDA_SUCCESS != status) {
			rte_exit(EXIT_FAILURE, "Could not set SYNC MEMOP attribute for GPU memory at %llx\n", (CUdeviceptr)ext_mem.buf_ptr);
		}
		ret = rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
		if (ret)
			rte_exit(EXIT_FAILURE, "Could not register GPU memory\n");
	}
	ret = rte_dev_dma_map(rte_eth_devices[l2fwd_port_id].device, ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
	if (ret)
		rte_exit(EXIT_FAILURE, "Could not DMA map EXT memory\n");
	mpool_payload = rte_pktmbuf_pool_create_extbuf("payload_mpool", l2fwd_nb_mbufs,
											0, 0, ext_mem.elt_size, 
											rte_socket_id(), &ext_mem, 1);
	if (mpool_payload == NULL)
		rte_exit(EXIT_FAILURE, "Could not create EXT memory mempool\n");




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-09-06 17:15                     ` Wang, Haiyue
@ 2021-09-06 17:22                       ` Elena Agostini
  2021-09-07  0:55                         ` Wang, Haiyue
  0 siblings, 1 reply; 128+ messages in thread
From: Elena Agostini @ 2021-09-06 17:22 UTC (permalink / raw)
  To: Wang, Haiyue, Jerin Jacob
  Cc: NBU-Contact-Thomas Monjalon, Jerin Jacob, dpdk-dev,
	Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Honnappa Nagarahalli, Yigit, Ferruh, techboard



> -----Original Message-----
> From: Wang, Haiyue <haiyue.wang@intel.com>
> Sent: Monday, September 6, 2021 7:15 PM
> To: Elena Agostini <eagostini@nvidia.com>; Jerin Jacob
> <jerinjacobk@gmail.com>
> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Jerin Jacob
> <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger
> <stephen@networkplumber.org>; David Marchand
> <david.marchand@redhat.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> techboard@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing
> library
> 
> 
> > -----Original Message-----
> > From: Elena Agostini <eagostini@nvidia.com>
> > Sent: Tuesday, September 7, 2021 00:11
> > To: Jerin Jacob <jerinjacobk@gmail.com>
> > Cc: Wang, Haiyue <haiyue.wang@intel.com>; NBU-Contact-Thomas
> Monjalon
> > <thomas@monjalon.net>; Jerin Jacob <jerinj@marvell.com>; dpdk-dev
> > <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>;
> David
> > Marchand <david.marchand@redhat.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> > <honnappa.nagarahalli@arm.com>; Yigit, Ferruh
> > <ferruh.yigit@intel.com>; techboard@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing
> > library
> >
> >
> >
> 
> 
> > > >
> > > > I'd like to introduce (with a dedicated option) the memory API in
> > > > testpmd to provide an example of how to TX/RX packets using device
> > > memory.
> > >
> > > Not sure without embedding sideband communication mechanism how
> it
> > > can notify to GPU and back to CPU. If you could share the example
> > > API sequence that helps to us understand the level of coupling with
> testpmd.
> > >
> >
> > There is no need of communication mechanism here.
> > Assuming there is not workload to process network packets (to not
> > complicate things), the steps are:
> > 1) Create a DPDK mempool with device external memory using the hcdev
> > (or gpudev) library
> > 2) Use that mempool to tx/rx/fwd packets
> >
> > As an example, you look at my l2fwd-nv application here:
> > https://github.com/NVIDIA/l2fwd-nv
> >
> 
> To enhance the 'rte_extmem_register' / 'rte_pktmbuf_pool_create_extbuf'
> ?
> 

The purpose of these two functions is different.
Here DPDK allows the user to use any kind of memory to rx/tx packets.
It's not about allocating memory.

Maybe I'm missing the point here: what's the main objection in having a GPU library?

>         if (l2fwd_mem_type == MEM_HOST_PINNED) {
>                 ext_mem.buf_ptr = rte_malloc("extmem", ext_mem.buf_len, 0);
>                 CUDA_CHECK(cudaHostRegister(ext_mem.buf_ptr,
> ext_mem.buf_len, cudaHostRegisterMapped));
>                 void *pDevice;
>                 CUDA_CHECK(cudaHostGetDevicePointer(&pDevice,
> ext_mem.buf_ptr, 0));
>                 if (pDevice != ext_mem.buf_ptr)
>                         rte_exit(EXIT_FAILURE, "GPU pointer does not match CPU
> pointer\n");
>         } else {
>                 ext_mem.buf_iova = RTE_BAD_IOVA;
>                 CUDA_CHECK(cudaMalloc(&ext_mem.buf_ptr,
> ext_mem.buf_len));
>                 if (ext_mem.buf_ptr == NULL)
>                         rte_exit(EXIT_FAILURE, "Could not allocate GPU memory\n");
> 
>                 unsigned int flag = 1;
>                 CUresult status = cuPointerSetAttribute(&flag,
> CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, (CUdeviceptr)ext_mem.buf_ptr);
>                 if (CUDA_SUCCESS != status) {
>                         rte_exit(EXIT_FAILURE, "Could not set SYNC MEMOP attribute
> for GPU memory at %llx\n", (CUdeviceptr)ext_mem.buf_ptr);
>                 }
>                 ret = rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len,
> NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
>                 if (ret)
>                         rte_exit(EXIT_FAILURE, "Could not register GPU memory\n");
>         }
>         ret = rte_dev_dma_map(rte_eth_devices[l2fwd_port_id].device,
> ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
>         if (ret)
>                 rte_exit(EXIT_FAILURE, "Could not DMA map EXT memory\n");
>         mpool_payload = rte_pktmbuf_pool_create_extbuf("payload_mpool",
> l2fwd_nb_mbufs,
>                                                                                         0, 0, ext_mem.elt_size,
>                                                                                         rte_socket_id(),
> &ext_mem, 1);
>         if (mpool_payload == NULL)
>                 rte_exit(EXIT_FAILURE, "Could not create EXT memory
> mempool\n");
> 
> 


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
  2021-09-06 17:22                       ` Elena Agostini
@ 2021-09-07  0:55                         ` Wang, Haiyue
  0 siblings, 0 replies; 128+ messages in thread
From: Wang, Haiyue @ 2021-09-07  0:55 UTC (permalink / raw)
  To: Elena Agostini, Jerin Jacob
  Cc: NBU-Contact-Thomas Monjalon, Jerin Jacob, dpdk-dev,
	Stephen Hemminger, David Marchand, Andrew Rybchenko,
	Honnappa Nagarahalli, Yigit, Ferruh, techboard

> -----Original Message-----
> From: Elena Agostini <eagostini@nvidia.com>
> Sent: Tuesday, September 7, 2021 01:23
> To: Wang, Haiyue <haiyue.wang@intel.com>; Jerin Jacob <jerinjacobk@gmail.com>
> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Jerin Jacob <jerinj@marvell.com>; dpdk-dev
> <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>; David Marchand
> <david.marchand@redhat.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>; techboard@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library
> 
> 
> 
> > -----Original Message-----
> > From: Wang, Haiyue <haiyue.wang@intel.com>
> > Sent: Monday, September 6, 2021 7:15 PM
> > To: Elena Agostini <eagostini@nvidia.com>; Jerin Jacob
> > <jerinjacobk@gmail.com>
> > Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Jerin Jacob
> > <jerinj@marvell.com>; dpdk-dev <dev@dpdk.org>; Stephen Hemminger
> > <stephen@networkplumber.org>; David Marchand
> > <david.marchand@redhat.com>; Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> > <honnappa.nagarahalli@arm.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > techboard@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing
> > library
> >
> >
> > > -----Original Message-----
> > > From: Elena Agostini <eagostini@nvidia.com>
> > > Sent: Tuesday, September 7, 2021 00:11
> > > To: Jerin Jacob <jerinjacobk@gmail.com>
> > > Cc: Wang, Haiyue <haiyue.wang@intel.com>; NBU-Contact-Thomas
> > Monjalon
> > > <thomas@monjalon.net>; Jerin Jacob <jerinj@marvell.com>; dpdk-dev
> > > <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>;
> > David
> > > Marchand <david.marchand@redhat.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; Honnappa Nagarahalli
> > > <honnappa.nagarahalli@arm.com>; Yigit, Ferruh
> > > <ferruh.yigit@intel.com>; techboard@dpdk.org
> > > Subject: RE: [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing
> > > library
> > >
> > >
> > >
> >
> >
> > > > >
> > > > > I'd like to introduce (with a dedicated option) the memory API in
> > > > > testpmd to provide an example of how to TX/RX packets using device
> > > > memory.
> > > >
> > > > Not sure without embedding sideband communication mechanism how
> > it
> > > > can notify to GPU and back to CPU. If you could share the example
> > > > API sequence that helps to us understand the level of coupling with
> > testpmd.
> > > >
> > >
> > > There is no need of communication mechanism here.
> > > Assuming there is not workload to process network packets (to not
> > > complicate things), the steps are:
> > > 1) Create a DPDK mempool with device external memory using the hcdev
> > > (or gpudev) library
> > > 2) Use that mempool to tx/rx/fwd packets
> > >
> > > As an example, you look at my l2fwd-nv application here:
> > > https://github.com/NVIDIA/l2fwd-nv
> > >
> >
> > To enhance the 'rte_extmem_register' / 'rte_pktmbuf_pool_create_extbuf'
> > ?
> >
> 
> The purpose of these two functions is different.
> Here DPDK allows the user to use any kind of memory to rx/tx packets.
> It's not about allocating memory.

> 
> Maybe I'm missing the point here: what's the main objection in having a GPU library?

Exactly. ;-)

Maybe a real device code is worth for people to get the whole picture.

> 
> >         if (l2fwd_mem_type == MEM_HOST_PINNED) {
> >                 ext_mem.buf_ptr = rte_malloc("extmem", ext_mem.buf_len, 0);
> >                 CUDA_CHECK(cudaHostRegister(ext_mem.buf_ptr,
> > ext_mem.buf_len, cudaHostRegisterMapped));
> >                 void *pDevice;
> >                 CUDA_CHECK(cudaHostGetDevicePointer(&pDevice,
> > ext_mem.buf_ptr, 0));
> >                 if (pDevice != ext_mem.buf_ptr)
> >                         rte_exit(EXIT_FAILURE, "GPU pointer does not match CPU
> > pointer\n");
> >         } else {
> >                 ext_mem.buf_iova = RTE_BAD_IOVA;
> >                 CUDA_CHECK(cudaMalloc(&ext_mem.buf_ptr,
> > ext_mem.buf_len));
> >                 if (ext_mem.buf_ptr == NULL)
> >                         rte_exit(EXIT_FAILURE, "Could not allocate GPU memory\n");
> >
> >                 unsigned int flag = 1;
> >                 CUresult status = cuPointerSetAttribute(&flag,
> > CU_POINTER_ATTRIBUTE_SYNC_MEMOPS, (CUdeviceptr)ext_mem.buf_ptr);
> >                 if (CUDA_SUCCESS != status) {
> >                         rte_exit(EXIT_FAILURE, "Could not set SYNC MEMOP attribute
> > for GPU memory at %llx\n", (CUdeviceptr)ext_mem.buf_ptr);
> >                 }
> >                 ret = rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len,
> > NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
> >                 if (ret)
> >                         rte_exit(EXIT_FAILURE, "Could not register GPU memory\n");
> >         }
> >         ret = rte_dev_dma_map(rte_eth_devices[l2fwd_port_id].device,
> > ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
> >         if (ret)
> >                 rte_exit(EXIT_FAILURE, "Could not DMA map EXT memory\n");
> >         mpool_payload = rte_pktmbuf_pool_create_extbuf("payload_mpool",
> > l2fwd_nb_mbufs,
> >                                                                                         0, 0,
> ext_mem.elt_size,
> >
> rte_socket_id(),
> > &ext_mem, 1);
> >         if (mpool_payload == NULL)
> >                 rte_exit(EXIT_FAILURE, "Could not create EXT memory
> > mempool\n");
> >
> >


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 6/9] gpudev: add memory barrier
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 6/9] gpudev: add memory barrier eagostini
@ 2021-10-08 20:16     ` Thomas Monjalon
  0 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-10-08 20:16 UTC (permalink / raw)
  To: Elena Agostini; +Cc: dev

09/10/2021 03:53, eagostini@nvidia.com:
> From: Elena Agostini <eagostini@nvidia.com>
> 
> Add a function for the application to ensure the coherency
> of the writes executed by another device into the GPU memory.
> 
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> ---
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enforce a GPU memory write barrier.
> + *
> + * @param dev_id
> + *   Reference device ID.
> + *
> + * @return
> + *   0 on success, -rte_errno otherwise:
> + *   - ENODEV if invalid dev_id
> + *   - ENOTSUP if operation not supported by the driver
> + *   - EPERM if driver error
> + */
> +__rte_experimental
> +int rte_gpu_mbw(int16_t dev_id);

I would replace mbw with wmb.

Also it may be worth adding few more words about the goal:
ensure that previous writes in GPU memory are complete?
Does it work for writes done from CPU? from GPU?



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API eagostini
@ 2021-10-08 20:18     ` Thomas Monjalon
  2021-10-29 19:38     ` Mattias Rönnblom
  1 sibling, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-10-08 20:18 UTC (permalink / raw)
  To: Elena Agostini; +Cc: dev

09/10/2021 03:53, eagostini@nvidia.com:
> --- a/lib/gpudev/version.map
> +++ b/lib/gpudev/version.map
> +	rte_gpu_free;
>  	rte_gpu_info_get;
>  	rte_gpu_init;
>  	rte_gpu_is_valid;
> +	rte_gpu_malloc;
> +	rte_gpu_register;
> +	rte_gpu_unregister;
>  };

Should we insert _mem_ in each function name?
Proposal:
	rte_gpu_mem_alloc
	rte_gpu_mem_free
	rte_gpu_mem_register
	rte_gpu_mem_unregister




^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
                   ` (7 preceding siblings ...)
  2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
@ 2021-10-09  1:53 ` eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 1/9] gpudev: introduce GPU device class library eagostini
                     ` (9 more replies)
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
  10 siblings, 10 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: eagostini

From: eagostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The goal of this new library is to enhance the collaboration between
DPDK, that's primarily a CPU framework, and GPU devices.

When mixing network activity with task processing on a non-CPU device,
there may be the need to put in communication the CPU with the device
in order to manage the memory, synchronize operations, exchange info, etc..

This library provides a number of new features:
- Interoperability with GPU-specific library with generic handlers
- Possibility to allocate and free memory on the GPU
- Possibility to allocate and free memory on the CPU but visible from the GPU
- Communication functions to enhance the dialog between the CPU and the GPU

The infrastructure is prepared to welcome drivers in drivers/gpu/
as the CUDA one, sent as draft:
https://patches.dpdk.org/project/dpdk/patch/20211005224905.13505-1-eagostini@nvidia.com/


Elena Agostini (6):
  gpudev: introduce GPU device class library
  gpudev: add memory API
  gpudev: add memory barrier
  gpudev: add communication flag
  gpudev: add communication list
  doc: add CUDA example in GPU guide

Thomas Monjalon (3):
  gpudev: add event notification
  gpudev: add child device representing a device context
  gpudev: support multi-process

 .gitignore                             |   1 +
 MAINTAINERS                            |   6 +
 app/meson.build                        |   1 +
 app/test-gpudev/main.c                 | 394 +++++++++++
 app/test-gpudev/meson.build            |   5 +
 doc/api/doxy-api-index.md              |   1 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/conf.py                     |   8 +
 doc/guides/gpus/features/default.ini   |  13 +
 doc/guides/gpus/index.rst              |  11 +
 doc/guides/gpus/overview.rst           |  10 +
 doc/guides/index.rst                   |   1 +
 doc/guides/prog_guide/gpudev.rst       | 226 +++++++
 doc/guides/prog_guide/index.rst        |   1 +
 doc/guides/rel_notes/release_21_11.rst |   6 +
 drivers/gpu/meson.build                |   4 +
 drivers/meson.build                    |   1 +
 lib/gpudev/gpudev.c                    | 904 +++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             | 102 +++
 lib/gpudev/meson.build                 |  12 +
 lib/gpudev/rte_gpudev.h                | 649 ++++++++++++++++++
 lib/gpudev/version.map                 |  38 ++
 lib/meson.build                        |   1 +
 23 files changed, 2396 insertions(+)
 create mode 100644 app/test-gpudev/main.c
 create mode 100644 app/test-gpudev/meson.build
 create mode 100644 doc/guides/gpus/features/default.ini
 create mode 100644 doc/guides/gpus/index.rst
 create mode 100644 doc/guides/gpus/overview.rst
 create mode 100644 doc/guides/prog_guide/gpudev.rst
 create mode 100644 drivers/gpu/meson.build
 create mode 100644 lib/gpudev/gpudev.c
 create mode 100644 lib/gpudev/gpudev_driver.h
 create mode 100644 lib/gpudev/meson.build
 create mode 100644 lib/gpudev/rte_gpudev.h
 create mode 100644 lib/gpudev/version.map

-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 1/9] gpudev: introduce GPU device class library
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 2/9] gpudev: add event notification eagostini
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini, Thomas Monjalon

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The new library gpudev is for dealing with GPGPU computing devices
from a DPDK application running on the CPU.

The infrastructure is prepared to welcome drivers in drivers/gpu/.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 .gitignore                             |   1 +
 MAINTAINERS                            |   6 +
 app/meson.build                        |   1 +
 app/test-gpudev/main.c                 | 107 +++++++++++
 app/test-gpudev/meson.build            |   5 +
 doc/api/doxy-api-index.md              |   1 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/conf.py                     |   8 +
 doc/guides/gpus/features/default.ini   |  10 +
 doc/guides/gpus/index.rst              |  11 ++
 doc/guides/gpus/overview.rst           |  10 +
 doc/guides/index.rst                   |   1 +
 doc/guides/prog_guide/gpudev.rst       |  36 ++++
 doc/guides/prog_guide/index.rst        |   1 +
 doc/guides/rel_notes/release_21_11.rst |   4 +
 drivers/gpu/meson.build                |   4 +
 drivers/meson.build                    |   1 +
 lib/gpudev/gpudev.c                    | 249 +++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             |  67 +++++++
 lib/gpudev/meson.build                 |  10 +
 lib/gpudev/rte_gpudev.h                | 168 +++++++++++++++++
 lib/gpudev/version.map                 |  20 ++
 lib/meson.build                        |   1 +
 23 files changed, 723 insertions(+)
 create mode 100644 app/test-gpudev/main.c
 create mode 100644 app/test-gpudev/meson.build
 create mode 100644 doc/guides/gpus/features/default.ini
 create mode 100644 doc/guides/gpus/index.rst
 create mode 100644 doc/guides/gpus/overview.rst
 create mode 100644 doc/guides/prog_guide/gpudev.rst
 create mode 100644 drivers/gpu/meson.build
 create mode 100644 lib/gpudev/gpudev.c
 create mode 100644 lib/gpudev/gpudev_driver.h
 create mode 100644 lib/gpudev/meson.build
 create mode 100644 lib/gpudev/rte_gpudev.h
 create mode 100644 lib/gpudev/version.map

diff --git a/.gitignore b/.gitignore
index b19c0717e6..49494e0c6c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
 doc/guides/regexdevs/overview_feature_table.txt
 doc/guides/vdpadevs/overview_feature_table.txt
 doc/guides/bbdevs/overview_feature_table.txt
+doc/guides/gpus/overview_feature_table.txt
 
 # ignore generated ctags/cscope files
 cscope.out.po
diff --git a/MAINTAINERS b/MAINTAINERS
index 278e5b3226..b61ad61ee2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -454,6 +454,12 @@ F: app/test-regex/
 F: doc/guides/prog_guide/regexdev.rst
 F: doc/guides/regexdevs/features/default.ini
 
+General-Purpose Graphics Processing Unit (GPU) API - EXPERIMENTAL
+M: Elena Agostini <eagostini@nvidia.com>
+F: lib/gpudev/
+F: doc/guides/prog_guide/gpudev.rst
+F: doc/guides/gpus/features/default.ini
+
 Eventdev API
 M: Jerin Jacob <jerinj@marvell.com>
 T: git://dpdk.org/next/dpdk-next-eventdev
diff --git a/app/meson.build b/app/meson.build
index 4c6049807c..42bca044e0 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -12,6 +12,7 @@ apps = [
         'test-eventdev',
         'test-fib',
         'test-flow-perf',
+        'test-gpudev',
         'test-pipeline',
         'test-pmd',
         'test-regex',
diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
new file mode 100644
index 0000000000..6a73a54e84
--- /dev/null
+++ b/app/test-gpudev/main.c
@@ -0,0 +1,107 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <errno.h>
+#include <getopt.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_mempool.h>
+#include <rte_mbuf.h>
+
+#include <rte_gpudev.h>
+
+enum app_args {
+	ARG_HELP,
+	ARG_MEMPOOL
+};
+
+static void
+usage(const char *prog_name)
+{
+	printf("%s [EAL options] --\n",
+		prog_name);
+}
+
+static void
+args_parse(int argc, char **argv)
+{
+	char **argvopt;
+	int opt;
+	int opt_idx;
+
+	static struct option lgopts[] = {
+		{ "help", 0, 0, ARG_HELP},
+		/* End of options */
+		{ 0, 0, 0, 0 }
+	};
+
+	argvopt = argv;
+	while ((opt = getopt_long(argc, argvopt, "",
+				lgopts, &opt_idx)) != EOF) {
+		switch (opt) {
+		case ARG_HELP:
+			usage(argv[0]);
+			break;
+		default:
+			usage(argv[0]);
+			rte_exit(EXIT_FAILURE, "Invalid option: %s\n", argv[optind]);
+			break;
+		}
+	}
+}
+
+int
+main(int argc, char **argv)
+{
+	int ret;
+	int nb_gpus = 0;
+	int16_t gpu_id = 0;
+	struct rte_gpu_info ginfo;
+
+	/* Init EAL. */
+	ret = rte_eal_init(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed\n");
+	argc -= ret;
+	argv += ret;
+	if (argc > 1)
+		args_parse(argc, argv);
+	argc -= ret;
+	argv += ret;
+
+	nb_gpus = rte_gpu_count_avail();
+	printf("\n\nDPDK found %d GPUs:\n", nb_gpus);
+	RTE_GPU_FOREACH(gpu_id)
+	{
+		if(rte_gpu_info_get(gpu_id, &ginfo))
+			rte_exit(EXIT_FAILURE, "rte_gpu_info_get error - bye\n");
+
+		printf("\tGPU ID %d\n\t\tparent ID %d GPU Bus ID %s NUMA node %d Tot memory %.02f MB, Tot processors %d\n",
+				ginfo.dev_id,
+				ginfo.parent,
+				ginfo.name,
+				ginfo.numa_node,
+				(((float)ginfo.total_memory)/(float)1024)/(float)1024,
+				ginfo.processor_count
+			);
+	}
+	printf("\n\n");
+
+	/* clean up the EAL */
+	rte_eal_cleanup();
+	printf("Bye...\n");
+
+	return EXIT_SUCCESS;
+}
diff --git a/app/test-gpudev/meson.build b/app/test-gpudev/meson.build
new file mode 100644
index 0000000000..17bdef3646
--- /dev/null
+++ b/app/test-gpudev/meson.build
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+sources = files('main.c')
+deps = ['gpudev', 'ethdev']
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a03..bd10342ca2 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -21,6 +21,7 @@ The public API headers are grouped by topics:
   [compressdev]        (@ref rte_compressdev.h),
   [compress]           (@ref rte_comp.h),
   [regexdev]           (@ref rte_regexdev.h),
+  [gpudev]             (@ref rte_gpudev.h),
   [eventdev]           (@ref rte_eventdev.h),
   [event_eth_rx_adapter]   (@ref rte_event_eth_rx_adapter.h),
   [event_eth_tx_adapter]   (@ref rte_event_eth_tx_adapter.h),
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6..831b9a6b33 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -40,6 +40,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/eventdev \
                           @TOPDIR@/lib/fib \
                           @TOPDIR@/lib/flow_classify \
+                          @TOPDIR@/lib/gpudev \
                           @TOPDIR@/lib/graph \
                           @TOPDIR@/lib/gro \
                           @TOPDIR@/lib/gso \
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 67d2dd62c7..7930da9ceb 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -152,6 +152,9 @@ def generate_overview_table(output_filename, table_id, section, table_name, titl
         name = ini_filename[:-4]
         name = name.replace('_vf', 'vf')
         pmd_names.append(name)
+    if not pmd_names:
+        # Add an empty column if table is empty (required by RST syntax)
+        pmd_names.append(' ')
 
     # Pad the table header names.
     max_header_len = len(max(pmd_names, key=len))
@@ -388,6 +391,11 @@ def setup(app):
                             'Features',
                             'Features availability in bbdev drivers',
                             'Feature')
+    table_file = dirname(__file__) + '/gpus/overview_feature_table.txt'
+    generate_overview_table(table_file, 1,
+                            'Features',
+                            'Features availability in GPU drivers',
+                            'Feature')
 
     if LooseVersion(sphinx_version) < LooseVersion('1.3.1'):
         print('Upgrade sphinx to version >= 1.3.1 for '
diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
new file mode 100644
index 0000000000..ec7a545eb7
--- /dev/null
+++ b/doc/guides/gpus/features/default.ini
@@ -0,0 +1,10 @@
+;
+; Features of GPU drivers.
+;
+; This file defines the features that are valid for inclusion in
+; the other driver files and also the order that they appear in
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
+;
+[Features]
+Get device info                =
diff --git a/doc/guides/gpus/index.rst b/doc/guides/gpus/index.rst
new file mode 100644
index 0000000000..1878423239
--- /dev/null
+++ b/doc/guides/gpus/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+General-Purpose Graphics Processing Unit Drivers
+================================================
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   overview
diff --git a/doc/guides/gpus/overview.rst b/doc/guides/gpus/overview.rst
new file mode 100644
index 0000000000..4830348818
--- /dev/null
+++ b/doc/guides/gpus/overview.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+Overview of GPU Drivers
+=======================
+
+General-Purpose computing on Graphics Processing Unit (GPGPU)
+is the use of GPU to perform parallel computation.
+
+.. include:: overview_feature_table.txt
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 857f0363d3..ee4d79a4eb 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -21,6 +21,7 @@ DPDK documentation
    compressdevs/index
    vdpadevs/index
    regexdevs/index
+   gpus/index
    eventdevs/index
    rawdevs/index
    mempool/index
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
new file mode 100644
index 0000000000..6ea7239159
--- /dev/null
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+General-Purpose Graphics Processing Unit Library
+================================================
+
+When mixing networking activity with task processing on a GPU device,
+there may be the need to put in communication the CPU with the device
+in order to manage the memory, synchronize operations, exchange info, etc..
+
+By means of the generic GPU interface provided by this library,
+it is possible to allocate a chunk of GPU memory and use it
+to create a DPDK mempool with external mbufs having the payload
+on the GPU memory, enabling any network interface card
+(which support this feature like Mellanox NIC)
+to directly transmit and receive packets using GPU memory.
+
+Additionally, this library provides a number of functions
+to enhance the dialog between CPU and GPU.
+
+Out of scope of this library is to provide a wrapper for GPU specific libraries
+(e.g. CUDA Toolkit or OpenCL), thus it is not possible to launch workload
+on the device or create GPU specific objects
+(e.g. CUDA Driver context or CUDA Streams in case of NVIDIA GPUs).
+
+
+Features
+--------
+
+This library provides a number of features:
+
+- Interoperability with device-specific library through generic handlers.
+
+
+API Overview
+------------
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46..e49a09a07a 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -27,6 +27,7 @@ Programmer's Guide
     cryptodev_lib
     compressdev
     regexdev
+    gpudev
     rte_security
     rawdev
     link_bonding_poll_mode_drv_lib
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index c0a7f75518..4986a35b50 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,10 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Introduced GPU device class with first features:**
+
+  * Device information
+
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
   Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
diff --git a/drivers/gpu/meson.build b/drivers/gpu/meson.build
new file mode 100644
index 0000000000..e51ad3381b
--- /dev/null
+++ b/drivers/gpu/meson.build
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+drivers = []
diff --git a/drivers/meson.build b/drivers/meson.build
index 3d08540581..be2d78ffd5 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -18,6 +18,7 @@ subdirs = [
         'vdpa',           # depends on common, bus and mempool.
         'event',          # depends on common, bus, mempool and net.
         'baseband',       # depends on common and bus.
+        'gpu',            # depends on common and bus.
 ]
 
 if meson.is_cross_build()
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
new file mode 100644
index 0000000000..c839c530c8
--- /dev/null
+++ b/lib/gpudev/gpudev.c
@@ -0,0 +1,249 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_eal.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+
+#include "rte_gpudev.h"
+#include "gpudev_driver.h"
+
+/* Logging */
+RTE_LOG_REGISTER_DEFAULT(gpu_logtype, NOTICE);
+#define GPU_LOG(level, ...) \
+	rte_log(RTE_LOG_ ## level, gpu_logtype, RTE_FMT("gpu: " \
+		RTE_FMT_HEAD(__VA_ARGS__,) "\n", RTE_FMT_TAIL(__VA_ARGS__,)))
+
+/* Set any driver error as EPERM */
+#define GPU_DRV_RET(function) \
+	((function != 0) ? -(rte_errno = EPERM) : (rte_errno = 0))
+
+/* Array of devices */
+static struct rte_gpu *gpus;
+/* Number of currently valid devices */
+static int16_t gpu_max;
+/* Number of currently valid devices */
+static int16_t gpu_count;
+
+int
+rte_gpu_init(size_t dev_max)
+{
+	if (dev_max == 0 || dev_max > INT16_MAX) {
+		GPU_LOG(ERR, "invalid array size");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	/* No lock, it must be called before or during first probing. */
+	if (gpus != NULL) {
+		GPU_LOG(ERR, "already initialized");
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+
+	gpus = calloc(dev_max, sizeof(struct rte_gpu));
+	if (gpus == NULL) {
+		GPU_LOG(ERR, "cannot initialize library");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	gpu_max = dev_max;
+	return 0;
+}
+
+uint16_t
+rte_gpu_count_avail(void)
+{
+	return gpu_count;
+}
+
+bool
+rte_gpu_is_valid(int16_t dev_id)
+{
+	if (dev_id >= 0 && dev_id < gpu_max &&
+		gpus[dev_id].state == RTE_GPU_STATE_INITIALIZED)
+		return true;
+	return false;
+}
+
+int16_t
+rte_gpu_find_next(int16_t dev_id)
+{
+	if (dev_id < 0)
+		dev_id = 0;
+	while (dev_id < gpu_max &&
+			gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+		dev_id++;
+
+	if (dev_id >= gpu_max)
+		return RTE_GPU_ID_NONE;
+	return dev_id;
+}
+
+static int16_t
+gpu_find_free_id(void)
+{
+	int16_t dev_id;
+
+	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
+		if (gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+			return dev_id;
+	}
+	return RTE_GPU_ID_NONE;
+}
+
+static struct rte_gpu *
+gpu_get_by_id(int16_t dev_id)
+{
+	if (!rte_gpu_is_valid(dev_id))
+		return NULL;
+	return &gpus[dev_id];
+}
+
+struct rte_gpu *
+rte_gpu_get_by_name(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+
+	if (name == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	RTE_GPU_FOREACH(dev_id) {
+		dev = &gpus[dev_id];
+		if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+			return dev;
+	}
+	return NULL;
+}
+
+struct rte_gpu *
+rte_gpu_allocate(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		GPU_LOG(ERR, "only primary process can allocate device");
+		rte_errno = EPERM;
+		return NULL;
+	}
+	if (name == NULL) {
+		GPU_LOG(ERR, "allocate device without a name");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* implicit initialization of library before adding first device */
+	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
+		return NULL;
+
+	if (rte_gpu_get_by_name(name) != NULL) {
+		GPU_LOG(ERR, "device with name %s already exists", name);
+		rte_errno = EEXIST;
+		return NULL;
+	}
+	dev_id = gpu_find_free_id();
+	if (dev_id == RTE_GPU_ID_NONE) {
+		GPU_LOG(ERR, "reached maximum number of devices");
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	dev = &gpus[dev_id];
+	memset(dev, 0, sizeof(*dev));
+
+	if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+		GPU_LOG(ERR, "device name too long: %s", name);
+		rte_errno = ENAMETOOLONG;
+		return NULL;
+	}
+	dev->info.name = dev->name;
+	dev->info.dev_id = dev_id;
+	dev->info.numa_node = -1;
+
+	gpu_count++;
+	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
+			name, dev_id, gpu_count);
+	return dev;
+}
+
+void
+rte_gpu_complete_new(struct rte_gpu *dev)
+{
+	if (dev == NULL)
+		return;
+
+	dev->state = RTE_GPU_STATE_INITIALIZED;
+}
+
+int
+rte_gpu_release(struct rte_gpu *dev)
+{
+	if (dev == NULL) {
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	GPU_LOG(DEBUG, "free device %s (id %d)",
+			dev->info.name, dev->info.dev_id);
+	dev->state = RTE_GPU_STATE_UNUSED;
+	gpu_count--;
+
+	return 0;
+}
+
+int
+rte_gpu_close(int16_t dev_id)
+{
+	int firsterr, binerr;
+	int *lasterr = &firsterr;
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "close invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.dev_close != NULL) {
+		*lasterr = GPU_DRV_RET(dev->ops.dev_close(dev));
+		if (*lasterr != 0)
+			lasterr = &binerr;
+	}
+
+	*lasterr = rte_gpu_release(dev);
+
+	rte_errno = -firsterr;
+	return firsterr;
+}
+
+int
+rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "query invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (info == NULL) {
+		GPU_LOG(ERR, "query without storage");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev->ops.dev_info_get == NULL) {
+		*info = dev->info;
+		return 0;
+	}
+	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
new file mode 100644
index 0000000000..9e096e3b64
--- /dev/null
+++ b/lib/gpudev/gpudev_driver.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+/*
+ * This header file must be included only by drivers.
+ * It is considered internal, i.e. hidden for the application.
+ * The prefix rte_ is used to avoid namespace clash in drivers.
+ */
+
+#ifndef RTE_GPUDEV_DRIVER_H
+#define RTE_GPUDEV_DRIVER_H
+
+#include <stdint.h>
+
+#include <rte_dev.h>
+
+#include "rte_gpudev.h"
+
+/* Flags indicate current state of device. */
+enum rte_gpu_state {
+	RTE_GPU_STATE_UNUSED,        /* not initialized */
+	RTE_GPU_STATE_INITIALIZED,   /* initialized */
+};
+
+struct rte_gpu;
+typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
+typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
+
+struct rte_gpu_ops {
+	/* Get device info. If NULL, info is just copied. */
+	rte_gpu_info_get_t *dev_info_get;
+	/* Close device. */
+	rte_gpu_close_t *dev_close;
+};
+
+struct rte_gpu {
+	/* Backing device. */
+	struct rte_device *device;
+	/* Unique identifier name. */
+	char name[RTE_DEV_NAME_MAX_LEN]; /* Updated by this library. */
+	/* Device info structure. */
+	struct rte_gpu_info info;
+	/* Driver functions. */
+	struct rte_gpu_ops ops;
+	/* Current state (used or not) in the running process. */
+	enum rte_gpu_state state; /* Updated by this library. */
+	/* Driver-specific private data for the running process. */
+	void *process_private;
+} __rte_cache_aligned;
+
+__rte_internal
+struct rte_gpu *rte_gpu_get_by_name(const char *name);
+
+/* First step of initialization */
+__rte_internal
+struct rte_gpu *rte_gpu_allocate(const char *name);
+
+/* Last step of initialization. */
+__rte_internal
+void rte_gpu_complete_new(struct rte_gpu *dev);
+
+/* Last step of removal. */
+__rte_internal
+int rte_gpu_release(struct rte_gpu *dev);
+
+#endif /* RTE_GPUDEV_DRIVER_H */
diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build
new file mode 100644
index 0000000000..608154817b
--- /dev/null
+++ b/lib/gpudev/meson.build
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+headers = files(
+        'rte_gpudev.h',
+)
+
+sources = files(
+        'gpudev.c',
+)
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
new file mode 100644
index 0000000000..eb7cfa8c59
--- /dev/null
+++ b/lib/gpudev/rte_gpudev.h
@@ -0,0 +1,168 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_GPUDEV_H
+#define RTE_GPUDEV_H
+
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_compat.h>
+
+/**
+ * @file
+ * Generic library to interact with GPU computing device.
+ *
+ * The API is not thread-safe.
+ * Device management must be done by a single thread.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Maximum number of devices if rte_gpu_init() is not called. */
+#define RTE_GPU_DEFAULT_MAX 32
+
+/** Empty device ID. */
+#define RTE_GPU_ID_NONE -1
+
+/** Store device info. */
+struct rte_gpu_info {
+	/** Unique identifier name. */
+	const char *name;
+	/** Device ID. */
+	int16_t dev_id;
+	/** Total processors available on device. */
+	uint32_t processor_count;
+	/** Total memory available on device. */
+	size_t total_memory;
+	/* Local NUMA memory ID. -1 if unknown. */
+	int16_t numa_node;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the device array before probing devices.
+ * If not called, the maximum of probed devices is RTE_GPU_DEFAULT_MAX.
+ *
+ * @param dev_max
+ *   Maximum number of devices.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENOMEM if out of memory
+ *   - EINVAL if 0 size
+ *   - EBUSY if already initialized
+ */
+__rte_experimental
+int rte_gpu_init(size_t dev_max);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return the number of GPU detected and associated to DPDK.
+ *
+ * @return
+ *   The number of available computing devices.
+ */
+__rte_experimental
+uint16_t rte_gpu_count_avail(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if the device is valid and initialized in DPDK.
+ *
+ * @param dev_id
+ *   The input device ID.
+ *
+ * @return
+ *   - True if dev_id is a valid and initialized computing device.
+ *   - False otherwise.
+ */
+__rte_experimental
+bool rte_gpu_is_valid(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the ID of the next valid GPU initialized in DPDK.
+ *
+ * @param dev_id
+ *   The initial device ID to start the research.
+ *
+ * @return
+ *   Next device ID corresponding to a valid and initialized computing device,
+ *   RTE_GPU_ID_NONE if there is none.
+ */
+__rte_experimental
+int16_t rte_gpu_find_next(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid GPU devices.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ */
+#define RTE_GPU_FOREACH(dev_id) \
+	for (dev_id = rte_gpu_find_next(0); \
+	     dev_id > 0; \
+	     dev_id = rte_gpu_find_next(dev_id + 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Close device.
+ * All resources are released.
+ *
+ * @param dev_id
+ *   Device ID to close.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_close(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return device specific info.
+ *
+ * @param dev_id
+ *   Device ID to get info.
+ * @param info
+ *   Memory structure to fill with the info.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL info
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_GPUDEV_H */
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
new file mode 100644
index 0000000000..6ac6b327e2
--- /dev/null
+++ b/lib/gpudev/version.map
@@ -0,0 +1,20 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 21.11
+	rte_gpu_close;
+	rte_gpu_count_avail;
+	rte_gpu_find_next;
+	rte_gpu_info_get;
+	rte_gpu_init;
+	rte_gpu_is_valid;
+};
+
+INTERNAL {
+	global:
+
+	rte_gpu_allocate;
+	rte_gpu_complete_new;
+	rte_gpu_get_by_name;
+	rte_gpu_release;
+};
diff --git a/lib/meson.build b/lib/meson.build
index b2ba7258d8..029298842a 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -33,6 +33,7 @@ libraries = [
         'distributor',
         'efd',
         'eventdev',
+        'gpudev',
         'gro',
         'gso',
         'ip_frag',
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 2/9] gpudev: add event notification
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 1/9] gpudev: introduce GPU device class library eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 3/9] gpudev: add child device representing a device context eagostini
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

Callback functions may be registered for a device event.
Callback management is per-process and not thread-safe.

The events RTE_GPU_EVENT_NEW and RTE_GPU_EVENT_DEL
are notified respectively after creation and before removal
of a device, as part of the library functions.
Some future events may be emitted from drivers.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/gpudev/gpudev.c        | 148 +++++++++++++++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h |   7 ++
 lib/gpudev/rte_gpudev.h    |  70 ++++++++++++++++++
 lib/gpudev/version.map     |   3 +
 4 files changed, 228 insertions(+)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index c839c530c8..d57e23df7c 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -3,6 +3,7 @@
  */
 
 #include <rte_eal.h>
+#include <rte_tailq.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_log.h>
@@ -27,6 +28,16 @@ static int16_t gpu_max;
 /* Number of currently valid devices */
 static int16_t gpu_count;
 
+/* Event callback object */
+struct rte_gpu_callback {
+	TAILQ_ENTRY(rte_gpu_callback) next;
+	rte_gpu_callback_t *function;
+	void *user_data;
+	enum rte_gpu_event event;
+};
+static rte_rwlock_t gpu_callback_lock = RTE_RWLOCK_INITIALIZER;
+static void gpu_free_callbacks(struct rte_gpu *dev);
+
 int
 rte_gpu_init(size_t dev_max)
 {
@@ -166,6 +177,7 @@ rte_gpu_allocate(const char *name)
 	dev->info.name = dev->name;
 	dev->info.dev_id = dev_id;
 	dev->info.numa_node = -1;
+	TAILQ_INIT(&dev->callbacks);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -180,6 +192,8 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 		return;
 
 	dev->state = RTE_GPU_STATE_INITIALIZED;
+	dev->state = RTE_GPU_STATE_INITIALIZED;
+	rte_gpu_notify(dev, RTE_GPU_EVENT_NEW);
 }
 
 int
@@ -192,6 +206,9 @@ rte_gpu_release(struct rte_gpu *dev)
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
 			dev->info.name, dev->info.dev_id);
+	rte_gpu_notify(dev, RTE_GPU_EVENT_DEL);
+
+	gpu_free_callbacks(dev);
 	dev->state = RTE_GPU_STATE_UNUSED;
 	gpu_count--;
 
@@ -224,6 +241,137 @@ rte_gpu_close(int16_t dev_id)
 	return firsterr;
 }
 
+int
+rte_gpu_callback_register(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data)
+{
+	int16_t next_dev, last_dev;
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback;
+
+	if (!rte_gpu_is_valid(dev_id) && dev_id != RTE_GPU_ID_ANY) {
+		GPU_LOG(ERR, "register callback of invalid ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (function == NULL) {
+		GPU_LOG(ERR, "cannot register callback without function");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev_id == RTE_GPU_ID_ANY) {
+		next_dev = 0;
+		last_dev = gpu_max - 1;
+	} else {
+		next_dev = last_dev = dev_id;
+	}
+
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	do {
+		callbacks = &gpus[next_dev].callbacks;
+
+		/* check if not already registered */
+		TAILQ_FOREACH(callback, callbacks, next) {
+			if (callback->event == event &&
+					callback->function == function &&
+					callback->user_data == user_data) {
+				GPU_LOG(INFO, "callback already registered");
+				return 0;
+			}
+		}
+
+		callback = malloc(sizeof(*callback));
+		if (callback == NULL) {
+			GPU_LOG(ERR, "cannot allocate callback");
+			return -ENOMEM;
+		}
+		callback->function = function;
+		callback->user_data = user_data;
+		callback->event = event;
+		TAILQ_INSERT_TAIL(callbacks, callback, next);
+
+	} while (++next_dev <= last_dev);
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+
+	return 0;
+}
+
+int
+rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data)
+{
+	int16_t next_dev, last_dev;
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback, *nextcb;
+
+	if (!rte_gpu_is_valid(dev_id) && dev_id != RTE_GPU_ID_ANY) {
+		GPU_LOG(ERR, "unregister callback of invalid ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (function == NULL) {
+		GPU_LOG(ERR, "cannot unregister callback without function");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev_id == RTE_GPU_ID_ANY) {
+		next_dev = 0;
+		last_dev = gpu_max - 1;
+	} else {
+		next_dev = last_dev = dev_id;
+	}
+
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	do {
+		callbacks = &gpus[next_dev].callbacks;
+		RTE_TAILQ_FOREACH_SAFE(callback, callbacks, next, nextcb) {
+			if (callback->event != event ||
+					callback->function != function ||
+					(callback->user_data != user_data &&
+					user_data != (void *)-1))
+				continue;
+			TAILQ_REMOVE(callbacks, callback, next);
+			free(callback);
+		}
+	} while (++next_dev <= last_dev);
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+
+	return 0;
+}
+
+static void
+gpu_free_callbacks(struct rte_gpu *dev)
+{
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback, *nextcb;
+
+	callbacks = &dev->callbacks;
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	RTE_TAILQ_FOREACH_SAFE(callback, callbacks, next, nextcb) {
+		TAILQ_REMOVE(callbacks, callback, next);
+		free(callback);
+	}
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+}
+
+void
+rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event event)
+{
+	int16_t dev_id;
+	struct rte_gpu_callback *callback;
+
+	dev_id = dev->info.dev_id;
+	rte_rwlock_read_lock(&gpu_callback_lock);
+	TAILQ_FOREACH(callback, &dev->callbacks, next) {
+		if (callback->event != event || callback->function == NULL)
+			continue;
+		callback->function(dev_id, event, callback->user_data);
+	}
+	rte_rwlock_read_unlock(&gpu_callback_lock);
+}
+
 int
 rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 {
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 9e096e3b64..2a7089aa52 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -12,6 +12,7 @@
 #define RTE_GPUDEV_DRIVER_H
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 #include <rte_dev.h>
 
@@ -43,6 +44,8 @@ struct rte_gpu {
 	struct rte_gpu_info info;
 	/* Driver functions. */
 	struct rte_gpu_ops ops;
+	/* Event callback list. */
+	TAILQ_HEAD(rte_gpu_callback_list, rte_gpu_callback) callbacks;
 	/* Current state (used or not) in the running process. */
 	enum rte_gpu_state state; /* Updated by this library. */
 	/* Driver-specific private data for the running process. */
@@ -64,4 +67,8 @@ void rte_gpu_complete_new(struct rte_gpu *dev);
 __rte_internal
 int rte_gpu_release(struct rte_gpu *dev);
 
+/* Call registered callbacks. No multi-process event. */
+__rte_internal
+void rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event);
+
 #endif /* RTE_GPUDEV_DRIVER_H */
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index eb7cfa8c59..e1702fbfe4 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -31,6 +31,11 @@ extern "C" {
 
 /** Empty device ID. */
 #define RTE_GPU_ID_NONE -1
+/** Catch-all device ID. */
+#define RTE_GPU_ID_ANY INT16_MIN
+
+/** Catch-all callback data. */
+#define RTE_GPU_CALLBACK_ANY_DATA ((void *)-1)
 
 /** Store device info. */
 struct rte_gpu_info {
@@ -46,6 +51,18 @@ struct rte_gpu_info {
 	int16_t numa_node;
 };
 
+/** Flags passed in notification callback. */
+enum rte_gpu_event {
+	/** Device is just initialized. */
+	RTE_GPU_EVENT_NEW,
+	/** Device is going to be released. */
+	RTE_GPU_EVENT_DEL,
+};
+
+/** Prototype of event callback function. */
+typedef void (rte_gpu_callback_t)(int16_t dev_id,
+		enum rte_gpu_event event, void *user_data);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -141,6 +158,59 @@ int16_t rte_gpu_find_next(int16_t dev_id);
 __rte_experimental
 int rte_gpu_close(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register a function as event callback.
+ * A function may be registered multiple times for different events.
+ *
+ * @param dev_id
+ *   Device ID to get notified about.
+ *   RTE_GPU_ID_ANY means all devices.
+ * @param event
+ *   Device event to be registered for.
+ * @param function
+ *   Callback function to be called on event.
+ * @param user_data
+ *   Optional parameter passed in the callback.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL function
+ *   - ENOMEM if out of memory
+ */
+__rte_experimental
+int rte_gpu_callback_register(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Unregister for an event.
+ *
+ * @param dev_id
+ *   Device ID to be silenced.
+ *   RTE_GPU_ID_ANY means all devices.
+ * @param event
+ *   Registered event.
+ * @param function
+ *   Registered function.
+ * @param user_data
+ *   Optional parameter as registered.
+ *   RTE_GPU_CALLBACK_ANY_DATA is a catch-all.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL function
+ */
+__rte_experimental
+int rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 6ac6b327e2..b3b6b76c1c 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -2,6 +2,8 @@ EXPERIMENTAL {
 	global:
 
 	# added in 21.11
+	rte_gpu_callback_register;
+	rte_gpu_callback_unregister;
 	rte_gpu_close;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
@@ -16,5 +18,6 @@ INTERNAL {
 	rte_gpu_allocate;
 	rte_gpu_complete_new;
 	rte_gpu_get_by_name;
+	rte_gpu_notify;
 	rte_gpu_release;
 };
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 3/9] gpudev: add child device representing a device context
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 1/9] gpudev: introduce GPU device class library eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 2/9] gpudev: add event notification eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 4/9] gpudev: support multi-process eagostini
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque by the caller of
rte_gpu_add_child().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 doc/guides/prog_guide/gpudev.rst | 12 ++++++
 lib/gpudev/gpudev.c              | 45 +++++++++++++++++++-
 lib/gpudev/gpudev_driver.h       |  2 +-
 lib/gpudev/rte_gpudev.h          | 71 +++++++++++++++++++++++++++++---
 lib/gpudev/version.map           |  1 +
 5 files changed, 123 insertions(+), 8 deletions(-)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 6ea7239159..7694639489 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -34,3 +34,15 @@ This library provides a number of features:
 
 API Overview
 ------------
+
+Child Device
+~~~~~~~~~~~~
+
+By default, DPDK PCIe module detects and registers physical GPU devices
+in the system.
+With the gpudev library is also possible to add additional non-physical devices
+through an ``uint64_t`` generic handler (e.g. CUDA Driver context)
+that will be registered internally by the driver as an additional device (child)
+connected to a physical device (parent).
+Each device (parent or child) is represented through a ID
+required to indicate which device a given operation should be executed on.
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index d57e23df7c..74cdd7f20b 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -80,13 +80,22 @@ rte_gpu_is_valid(int16_t dev_id)
 	return false;
 }
 
+static bool
+gpu_match_parent(int16_t dev_id, int16_t parent)
+{
+	if (parent == RTE_GPU_ID_ANY)
+		return true;
+	return gpus[dev_id].info.parent == parent;
+}
+
 int16_t
-rte_gpu_find_next(int16_t dev_id)
+rte_gpu_find_next(int16_t dev_id, int16_t parent)
 {
 	if (dev_id < 0)
 		dev_id = 0;
 	while (dev_id < gpu_max &&
-			gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+			(gpus[dev_id].state == RTE_GPU_STATE_UNUSED ||
+			!gpu_match_parent(dev_id, parent)))
 		dev_id++;
 
 	if (dev_id >= gpu_max)
@@ -177,6 +186,7 @@ rte_gpu_allocate(const char *name)
 	dev->info.name = dev->name;
 	dev->info.dev_id = dev_id;
 	dev->info.numa_node = -1;
+	dev->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
 
 	gpu_count++;
@@ -185,6 +195,28 @@ rte_gpu_allocate(const char *name)
 	return dev;
 }
 
+int16_t
+rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
+{
+	struct rte_gpu *dev;
+
+	if (!rte_gpu_is_valid(parent)) {
+		GPU_LOG(ERR, "add child to invalid parent ID %d", parent);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	dev = rte_gpu_allocate(name);
+	if (dev == NULL)
+		return -rte_errno;
+
+	dev->info.parent = parent;
+	dev->info.context = child_context;
+
+	rte_gpu_complete_new(dev);
+	return dev->info.dev_id;
+}
+
 void
 rte_gpu_complete_new(struct rte_gpu *dev)
 {
@@ -199,10 +231,19 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 int
 rte_gpu_release(struct rte_gpu *dev)
 {
+	int16_t dev_id, child;
+
 	if (dev == NULL) {
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
+	dev_id = dev->info.dev_id;
+	RTE_GPU_FOREACH_CHILD(child, dev_id) {
+		GPU_LOG(ERR, "cannot release device %d with child %d",
+				dev_id, child);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
 			dev->info.name, dev->info.dev_id);
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 2a7089aa52..4d0077161c 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -31,7 +31,7 @@ typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info)
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
 	rte_gpu_info_get_t *dev_info_get;
-	/* Close device. */
+	/* Close device or child context. */
 	rte_gpu_close_t *dev_close;
 };
 
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index e1702fbfe4..df75dbdbab 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -41,8 +41,12 @@ extern "C" {
 struct rte_gpu_info {
 	/** Unique identifier name. */
 	const char *name;
+	/** Opaque handler of the device context. */
+	uint64_t context;
 	/** Device ID. */
 	int16_t dev_id;
+	/** ID of the parent device, RTE_GPU_ID_NONE if no parent */
+	int16_t parent;
 	/** Total processors available on device. */
 	uint32_t processor_count;
 	/** Total memory available on device. */
@@ -110,6 +114,33 @@ uint16_t rte_gpu_count_avail(void);
 __rte_experimental
 bool rte_gpu_is_valid(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a virtual device representing a context in the parent device.
+ *
+ * @param name
+ *   Unique string to identify the device.
+ * @param parent
+ *   Device ID of the parent.
+ * @param child_context
+ *   Opaque context handler.
+ *
+ * @return
+ *   Device ID of the new created child, -rte_errno otherwise:
+ *   - EINVAL if empty name
+ *   - ENAMETOOLONG if long name
+ *   - EEXIST if existing device name
+ *   - ENODEV if invalid parent
+ *   - EPERM if secondary process
+ *   - ENOENT if too many devices
+ *   - ENOMEM if out of space
+ */
+__rte_experimental
+int16_t rte_gpu_add_child(const char *name,
+		int16_t parent, uint64_t child_context);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -118,13 +149,17 @@ bool rte_gpu_is_valid(int16_t dev_id);
  *
  * @param dev_id
  *   The initial device ID to start the research.
+ * @param parent
+ *   The device ID of the parent.
+ *   RTE_GPU_ID_NONE means no parent.
+ *   RTE_GPU_ID_ANY means no or any parent.
  *
  * @return
  *   Next device ID corresponding to a valid and initialized computing device,
  *   RTE_GPU_ID_NONE if there is none.
  */
 __rte_experimental
-int16_t rte_gpu_find_next(int16_t dev_id);
+int16_t rte_gpu_find_next(int16_t dev_id, int16_t parent);
 
 /**
  * @warning
@@ -136,15 +171,41 @@ int16_t rte_gpu_find_next(int16_t dev_id);
  *   The ID of the next possible valid device, usually 0 to iterate all.
  */
 #define RTE_GPU_FOREACH(dev_id) \
-	for (dev_id = rte_gpu_find_next(0); \
-	     dev_id > 0; \
-	     dev_id = rte_gpu_find_next(dev_id + 1))
+	RTE_GPU_FOREACH_CHILD(dev_id, RTE_GPU_ID_ANY)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid computing devices having no parent.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ */
+#define RTE_GPU_FOREACH_PARENT(dev_id) \
+	RTE_GPU_FOREACH_CHILD(dev_id, RTE_GPU_ID_NONE)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid children of a computing device parent.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ * @param parent
+ *   The device ID of the parent.
+ */
+#define RTE_GPU_FOREACH_CHILD(dev_id, parent) \
+	for (dev_id = rte_gpu_find_next(0, parent); \
+	     dev_id >= 0; \
+	     dev_id = rte_gpu_find_next(dev_id + 1, parent))
 
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
  *
- * Close device.
+ * Close device or child context.
  * All resources are released.
  *
  * @param dev_id
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index b3b6b76c1c..4a934ed933 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -2,6 +2,7 @@ EXPERIMENTAL {
 	global:
 
 	# added in 21.11
+	rte_gpu_add_child;
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 4/9] gpudev: support multi-process
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
                     ` (2 preceding siblings ...)
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 3/9] gpudev: add child device representing a device context eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API eagostini
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

The device data shared between processes are moved in a struct
allocated in a shared memory (a new memzone for all GPUs).
The main struct rte_gpu references the shared memory
via the pointer mpshared.

The API function rte_gpu_attach() is added to attach a device
from the secondary process.
The function rte_gpu_allocate() can be used only by primary process.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/gpudev/gpudev.c        | 127 +++++++++++++++++++++++++++++++------
 lib/gpudev/gpudev_driver.h |  25 ++++++--
 lib/gpudev/version.map     |   1 +
 3 files changed, 127 insertions(+), 26 deletions(-)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 74cdd7f20b..f0690cf730 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -5,6 +5,7 @@
 #include <rte_eal.h>
 #include <rte_tailq.h>
 #include <rte_string_fns.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_log.h>
 
@@ -28,6 +29,12 @@ static int16_t gpu_max;
 /* Number of currently valid devices */
 static int16_t gpu_count;
 
+/* Shared memory between processes. */
+static const char *GPU_MEMZONE = "rte_gpu_shared";
+static struct {
+	__extension__ struct rte_gpu_mpshared gpus[0];
+} *gpu_shared_mem;
+
 /* Event callback object */
 struct rte_gpu_callback {
 	TAILQ_ENTRY(rte_gpu_callback) next;
@@ -75,7 +82,7 @@ bool
 rte_gpu_is_valid(int16_t dev_id)
 {
 	if (dev_id >= 0 && dev_id < gpu_max &&
-		gpus[dev_id].state == RTE_GPU_STATE_INITIALIZED)
+		gpus[dev_id].process_state == RTE_GPU_STATE_INITIALIZED)
 		return true;
 	return false;
 }
@@ -85,7 +92,7 @@ gpu_match_parent(int16_t dev_id, int16_t parent)
 {
 	if (parent == RTE_GPU_ID_ANY)
 		return true;
-	return gpus[dev_id].info.parent == parent;
+	return gpus[dev_id].mpshared->info.parent == parent;
 }
 
 int16_t
@@ -94,7 +101,7 @@ rte_gpu_find_next(int16_t dev_id, int16_t parent)
 	if (dev_id < 0)
 		dev_id = 0;
 	while (dev_id < gpu_max &&
-			(gpus[dev_id].state == RTE_GPU_STATE_UNUSED ||
+			(gpus[dev_id].process_state == RTE_GPU_STATE_UNUSED ||
 			!gpu_match_parent(dev_id, parent)))
 		dev_id++;
 
@@ -109,7 +116,7 @@ gpu_find_free_id(void)
 	int16_t dev_id;
 
 	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
-		if (gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+		if (gpus[dev_id].process_state == RTE_GPU_STATE_UNUSED)
 			return dev_id;
 	}
 	return RTE_GPU_ID_NONE;
@@ -136,12 +143,35 @@ rte_gpu_get_by_name(const char *name)
 
 	RTE_GPU_FOREACH(dev_id) {
 		dev = &gpus[dev_id];
-		if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+		if (strncmp(name, dev->mpshared->name, RTE_DEV_NAME_MAX_LEN) == 0)
 			return dev;
 	}
 	return NULL;
 }
 
+static int
+gpu_shared_mem_init(void)
+{
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		memzone = rte_memzone_reserve(GPU_MEMZONE,
+				sizeof(*gpu_shared_mem) +
+				sizeof(*gpu_shared_mem->gpus) * gpu_max,
+				SOCKET_ID_ANY, 0);
+	} else {
+		memzone = rte_memzone_lookup(GPU_MEMZONE);
+	}
+	if (memzone == NULL) {
+		GPU_LOG(ERR, "cannot initialize shared memory");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	gpu_shared_mem = memzone->addr;
+	return 0;
+}
+
 struct rte_gpu *
 rte_gpu_allocate(const char *name)
 {
@@ -163,6 +193,10 @@ rte_gpu_allocate(const char *name)
 	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
 		return NULL;
 
+	/* initialize shared memory before adding first device */
+	if (gpu_shared_mem == NULL && gpu_shared_mem_init() < 0)
+		return NULL;
+
 	if (rte_gpu_get_by_name(name) != NULL) {
 		GPU_LOG(ERR, "device with name %s already exists", name);
 		rte_errno = EEXIST;
@@ -178,16 +212,20 @@ rte_gpu_allocate(const char *name)
 	dev = &gpus[dev_id];
 	memset(dev, 0, sizeof(*dev));
 
-	if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+	dev->mpshared = &gpu_shared_mem->gpus[dev_id];
+	memset(dev->mpshared, 0, sizeof(*dev->mpshared));
+
+	if (rte_strscpy(dev->mpshared->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
 		GPU_LOG(ERR, "device name too long: %s", name);
 		rte_errno = ENAMETOOLONG;
 		return NULL;
 	}
-	dev->info.name = dev->name;
-	dev->info.dev_id = dev_id;
-	dev->info.numa_node = -1;
-	dev->info.parent = RTE_GPU_ID_NONE;
+	dev->mpshared->info.name = dev->mpshared->name;
+	dev->mpshared->info.dev_id = dev_id;
+	dev->mpshared->info.numa_node = -1;
+	dev->mpshared->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
+	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -195,6 +233,55 @@ rte_gpu_allocate(const char *name)
 	return dev;
 }
 
+struct rte_gpu *
+rte_gpu_attach(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+	struct rte_gpu_mpshared *shared_dev;
+
+	if (rte_eal_process_type() != RTE_PROC_SECONDARY) {
+		GPU_LOG(ERR, "only secondary process can attach device");
+		rte_errno = EPERM;
+		return NULL;
+	}
+	if (name == NULL) {
+		GPU_LOG(ERR, "attach device without a name");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* implicit initialization of library before adding first device */
+	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
+		return NULL;
+
+	/* initialize shared memory before adding first device */
+	if (gpu_shared_mem == NULL && gpu_shared_mem_init() < 0)
+		return NULL;
+
+	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
+		shared_dev = &gpu_shared_mem->gpus[dev_id];
+		if (strncmp(name, shared_dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+			break;
+	}
+	if (dev_id >= gpu_max) {
+		GPU_LOG(ERR, "device with name %s not found", name);
+		rte_errno = ENOENT;
+		return NULL;
+	}
+	dev = &gpus[dev_id];
+	memset(dev, 0, sizeof(*dev));
+
+	TAILQ_INIT(&dev->callbacks);
+	dev->mpshared = shared_dev;
+	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+
+	gpu_count++;
+	GPU_LOG(DEBUG, "attached device %s (id %d) of total %d",
+			name, dev_id, gpu_count);
+	return dev;
+}
+
 int16_t
 rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
 {
@@ -210,11 +297,11 @@ rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
 	if (dev == NULL)
 		return -rte_errno;
 
-	dev->info.parent = parent;
-	dev->info.context = child_context;
+	dev->mpshared->info.parent = parent;
+	dev->mpshared->info.context = child_context;
 
 	rte_gpu_complete_new(dev);
-	return dev->info.dev_id;
+	return dev->mpshared->info.dev_id;
 }
 
 void
@@ -223,8 +310,7 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 	if (dev == NULL)
 		return;
 
-	dev->state = RTE_GPU_STATE_INITIALIZED;
-	dev->state = RTE_GPU_STATE_INITIALIZED;
+	dev->process_state = RTE_GPU_STATE_INITIALIZED;
 	rte_gpu_notify(dev, RTE_GPU_EVENT_NEW);
 }
 
@@ -237,7 +323,7 @@ rte_gpu_release(struct rte_gpu *dev)
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
-	dev_id = dev->info.dev_id;
+	dev_id = dev->mpshared->info.dev_id;
 	RTE_GPU_FOREACH_CHILD(child, dev_id) {
 		GPU_LOG(ERR, "cannot release device %d with child %d",
 				dev_id, child);
@@ -246,11 +332,12 @@ rte_gpu_release(struct rte_gpu *dev)
 	}
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
-			dev->info.name, dev->info.dev_id);
+			dev->mpshared->info.name, dev->mpshared->info.dev_id);
 	rte_gpu_notify(dev, RTE_GPU_EVENT_DEL);
 
 	gpu_free_callbacks(dev);
-	dev->state = RTE_GPU_STATE_UNUSED;
+	dev->process_state = RTE_GPU_STATE_UNUSED;
+	__atomic_fetch_sub(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
 	gpu_count--;
 
 	return 0;
@@ -403,7 +490,7 @@ rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event event)
 	int16_t dev_id;
 	struct rte_gpu_callback *callback;
 
-	dev_id = dev->info.dev_id;
+	dev_id = dev->mpshared->info.dev_id;
 	rte_rwlock_read_lock(&gpu_callback_lock);
 	TAILQ_FOREACH(callback, &dev->callbacks, next) {
 		if (callback->event != event || callback->function == NULL)
@@ -431,7 +518,7 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 	}
 
 	if (dev->ops.dev_info_get == NULL) {
-		*info = dev->info;
+		*info = dev->mpshared->info;
 		return 0;
 	}
 	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 4d0077161c..9459c7e30f 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -35,19 +35,28 @@ struct rte_gpu_ops {
 	rte_gpu_close_t *dev_close;
 };
 
-struct rte_gpu {
-	/* Backing device. */
-	struct rte_device *device;
+struct rte_gpu_mpshared {
 	/* Unique identifier name. */
 	char name[RTE_DEV_NAME_MAX_LEN]; /* Updated by this library. */
+	/* Driver-specific private data shared in multi-process. */
+	void *dev_private;
 	/* Device info structure. */
 	struct rte_gpu_info info;
+	/* Counter of processes using the device. */
+	uint16_t process_refcnt; /* Updated by this library. */
+};
+
+struct rte_gpu {
+	/* Backing device. */
+	struct rte_device *device;
+	/* Data shared between processes. */
+	struct rte_gpu_mpshared *mpshared;
 	/* Driver functions. */
 	struct rte_gpu_ops ops;
 	/* Event callback list. */
 	TAILQ_HEAD(rte_gpu_callback_list, rte_gpu_callback) callbacks;
 	/* Current state (used or not) in the running process. */
-	enum rte_gpu_state state; /* Updated by this library. */
+	enum rte_gpu_state process_state; /* Updated by this library. */
 	/* Driver-specific private data for the running process. */
 	void *process_private;
 } __rte_cache_aligned;
@@ -55,15 +64,19 @@ struct rte_gpu {
 __rte_internal
 struct rte_gpu *rte_gpu_get_by_name(const char *name);
 
-/* First step of initialization */
+/* First step of initialization in primary process. */
 __rte_internal
 struct rte_gpu *rte_gpu_allocate(const char *name);
 
+/* First step of initialization in secondary process. */
+__rte_internal
+struct rte_gpu *rte_gpu_attach(const char *name);
+
 /* Last step of initialization. */
 __rte_internal
 void rte_gpu_complete_new(struct rte_gpu *dev);
 
-/* Last step of removal. */
+/* Last step of removal (primary or secondary process). */
 __rte_internal
 int rte_gpu_release(struct rte_gpu *dev);
 
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 4a934ed933..58dc632393 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -17,6 +17,7 @@ INTERNAL {
 	global:
 
 	rte_gpu_allocate;
+	rte_gpu_attach;
 	rte_gpu_complete_new;
 	rte_gpu_get_by_name;
 	rte_gpu_notify;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
                     ` (3 preceding siblings ...)
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 4/9] gpudev: support multi-process eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-08 20:18     ` Thomas Monjalon
  2021-10-29 19:38     ` Mattias Rönnblom
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 6/9] gpudev: add memory barrier eagostini
                     ` (4 subsequent siblings)
  9 siblings, 2 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini, Thomas Monjalon

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.

As a first step, the features are focused on memory management.
A function allows to allocate memory inside the device,
or in the main (CPU) memory while making it visible for the device.
This memory may be used to save packets or for synchronization data.

The next step should focus on GPU processing task control.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 app/test-gpudev/main.c                 | 118 +++++++++++++++++++++++++
 doc/guides/gpus/features/default.ini   |   3 +
 doc/guides/prog_guide/gpudev.rst       |  19 ++++
 doc/guides/rel_notes/release_21_11.rst |   1 +
 lib/gpudev/gpudev.c                    | 101 +++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             |  12 +++
 lib/gpudev/rte_gpudev.h                |  95 ++++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 8 files changed, 353 insertions(+)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 6a73a54e84..98c02a3ee0 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -62,6 +62,110 @@ args_parse(int argc, char **argv)
 	}
 }
 
+static int
+alloc_gpu_memory(uint16_t gpu_id)
+{
+	void * ptr_1 = NULL;
+	void * ptr_2 = NULL;
+	size_t buf_bytes = 1024;
+	int ret = 0;
+
+	printf("\n=======> TEST: Allocate GPU memory\n");
+
+	/* Alloc memory on GPU 0 */
+	ptr_1 = rte_gpu_malloc(gpu_id, buf_bytes);
+	if(ptr_1 == NULL)
+	{
+		fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
+		return -1;
+	}
+	printf("GPU memory allocated at 0x%p %zdB\n", ptr_1, buf_bytes);
+
+	ptr_2 = rte_gpu_malloc(gpu_id, buf_bytes);
+	if(ptr_2 == NULL)
+	{
+		fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
+		return -1;
+	}
+	printf("GPU memory allocated at 0x%p %zdB\n", ptr_2, buf_bytes);
+
+	ret = rte_gpu_free(gpu_id, (uint8_t*)(ptr_1)+0x700);
+	if(ret < 0)
+	{
+		printf("GPU memory 0x%p + 0x700 NOT freed because of memory address not recognized by driver\n", ptr_1);
+	}
+	else
+	{
+		fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr_1);
+		return -1;
+	}
+
+	ret = rte_gpu_free(gpu_id, ptr_2);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
+		return -1;
+	}
+	printf("GPU memory 0x%p freed\n", ptr_2);
+
+	ret = rte_gpu_free(gpu_id, ptr_1);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
+		return -1;
+	}
+	printf("GPU memory 0x%p freed\n", ptr_1);
+
+	return 0;
+}
+
+static int
+register_cpu_memory(uint16_t gpu_id)
+{
+	void * ptr = NULL;
+	size_t buf_bytes = 1024;
+	int ret = 0;
+
+	printf("\n=======> TEST: Register CPU memory\n");
+
+	/* Alloc memory on CPU visible from GPU 0 */
+	ptr = rte_zmalloc(NULL, buf_bytes, 0);
+	if (ptr == NULL) {
+		fprintf(stderr, "Failed to allocate CPU memory.\n");
+		return -1;
+	}
+
+	ret = rte_gpu_register(gpu_id, buf_bytes, ptr);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_register CPU memory returned error %d\n", ret);
+		return -1;
+	}
+	printf("CPU memory registered at 0x%p %zdB\n", ptr, buf_bytes);
+
+	ret = rte_gpu_unregister(gpu_id, (uint8_t*)(ptr)+0x700);
+	if(ret < 0)
+	{
+		printf("CPU memory 0x%p + 0x700 NOT unregistered because of memory address not recognized by driver\n", ptr);
+	}
+	else
+	{
+		fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr);
+		return -1;
+	}
+	printf("CPU memory 0x%p unregistered\n", ptr);
+
+	ret = rte_gpu_unregister(gpu_id, ptr);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_unregister returned error %d\n", ret);
+		return -1;
+	}
+	printf("CPU memory 0x%p unregistered\n", ptr);
+
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -99,6 +203,20 @@ main(int argc, char **argv)
 	}
 	printf("\n\n");
 
+	if(nb_gpus == 0)
+	{
+		fprintf(stderr, "Need at least one GPU on the system to run the example\n");
+		return EXIT_FAILURE;
+	}
+
+	gpu_id = 0;
+
+	/**
+	 * Memory tests
+	 */
+	alloc_gpu_memory(gpu_id);
+	register_cpu_memory(gpu_id);
+
 	/* clean up the EAL */
 	rte_eal_cleanup();
 	printf("Bye...\n");
diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
index ec7a545eb7..87e9966424 100644
--- a/doc/guides/gpus/features/default.ini
+++ b/doc/guides/gpus/features/default.ini
@@ -8,3 +8,6 @@
 ;
 [Features]
 Get device info                =
+Share CPU memory with device   =
+Allocate device memory         =
+Free memory                    =
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 7694639489..9aca69038c 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -30,6 +30,8 @@ Features
 This library provides a number of features:
 
 - Interoperability with device-specific library through generic handlers.
+- Allocate and free memory on the device.
+- Register CPU memory to make it visible from the device.
 
 
 API Overview
@@ -46,3 +48,20 @@ that will be registered internally by the driver as an additional device (child)
 connected to a physical device (parent).
 Each device (parent or child) is represented through a ID
 required to indicate which device a given operation should be executed on.
+
+Memory Allocation
+~~~~~~~~~~~~~~~~~
+
+gpudev can allocate on an input given GPU device a memory area
+returning the pointer to that memory.
+Later, it's also possible to free that memory with gpudev.
+GPU memory allocated outside of the gpudev library
+(e.g. with GPU-specific library) cannot be freed by the gpudev library.
+
+Memory Registration
+~~~~~~~~~~~~~~~~~~~
+
+gpudev can register a CPU memory area to make it visible from a GPU device.
+Later, it's also possible to unregister that memory with gpudev.
+CPU memory registered outside of the gpudev library
+(e.g. with GPU specific library) cannot be unregistered by the gpudev library.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 4986a35b50..c4ac5e3053 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -65,6 +65,7 @@ New Features
 * **Introduced GPU device class with first features:**
 
   * Device information
+  * Memory management
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index f0690cf730..1d8318f769 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -6,6 +6,7 @@
 #include <rte_tailq.h>
 #include <rte_string_fns.h>
 #include <rte_memzone.h>
+#include <rte_malloc.h>
 #include <rte_errno.h>
 #include <rte_log.h>
 
@@ -523,3 +524,103 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 	}
 	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
 }
+
+void *
+rte_gpu_malloc(int16_t dev_id, size_t size)
+{
+	struct rte_gpu *dev;
+	void *ptr;
+	int ret;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return NULL;
+	}
+
+	if (dev->ops.mem_alloc == NULL) {
+		GPU_LOG(ERR, "mem allocation not supported");
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+
+	if (size == 0) /* dry-run */
+		return NULL;
+
+	ret = dev->ops.mem_alloc(dev, size, &ptr);
+
+	switch (ret) {
+		case 0:
+			return ptr;
+		case -ENOMEM:
+		case -E2BIG:
+			rte_errno = -ret;
+			return NULL;
+		default:
+			rte_errno = -EPERM;
+			return NULL;
+	}
+}
+
+int
+rte_gpu_register(int16_t dev_id, size_t size, void * ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_register == NULL) {
+		GPU_LOG(ERR, "mem registration not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+
+	if (size == 0 || ptr == NULL) /* dry-run */
+		return -EINVAL;
+
+	return GPU_DRV_RET(dev->ops.mem_register(dev, size, ptr));
+}
+
+int
+rte_gpu_unregister(int16_t dev_id, void * ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "unregister mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_unregister == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mem_unregister(dev, ptr));
+}
+
+int
+rte_gpu_free(int16_t dev_id, void *ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "free mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_free == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mem_free(dev, ptr));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 9459c7e30f..11015944a6 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -27,12 +27,24 @@ enum rte_gpu_state {
 struct rte_gpu;
 typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
 typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
+typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
+typedef int (rte_gpu_free_t)(struct rte_gpu *dev, void *ptr);
+typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
+typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
 
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
 	rte_gpu_info_get_t *dev_info_get;
 	/* Close device or child context. */
 	rte_gpu_close_t *dev_close;
+	/* Allocate memory in device. */
+	rte_gpu_mem_alloc_t *mem_alloc;
+	/* Register CPU memory in device. */
+	rte_gpu_mem_register_t *mem_register;
+	/* Free memory allocated or registered in device. */
+	rte_gpu_free_t *mem_free;
+	/* Unregister CPU memory in device. */
+	rte_gpu_mem_unregister_t *mem_unregister;
 };
 
 struct rte_gpu_mpshared {
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index df75dbdbab..3c276581c0 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -9,6 +9,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <rte_bitops.h>
 #include <rte_compat.h>
 
 /**
@@ -292,6 +293,100 @@ int rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
 __rte_experimental
 int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate a chunk of memory usable by the device.
+ *
+ * @param dev_id
+ *   Device ID requiring allocated memory.
+ * @param size
+ *   Number of bytes to allocate.
+ *   Requesting 0 will do nothing.
+ *
+ * @return
+ *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if reserved flags
+ *   - ENOTSUP if operation not supported by the driver
+ *   - E2BIG if size is higher than limit
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+void *rte_gpu_malloc(int16_t dev_id, size_t size)
+__rte_alloc_size(2);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a chunk of memory allocated with rte_gpu_malloc().
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param ptr
+ *   Pointer to the memory area to be deallocated.
+ *   NULL is a no-op accepted value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_free(int16_t dev_id, void *ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register a chunk of memory on the CPU usable by the device.
+ *
+ * @param dev_id
+ *   Device ID requiring allocated memory.
+ * @param size
+ *   Number of bytes to allocate.
+ *   Requesting 0 will do nothing.
+ * @param ptr
+ *   Pointer to the memory area to be registered.
+ *   NULL is a no-op accepted value.
+
+ * @return
+ *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if reserved flags
+ *   - ENOTSUP if operation not supported by the driver
+ *   - E2BIG if size is higher than limit
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_register(int16_t dev_id, size_t size, void * ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deregister a chunk of memory previusly registered with rte_gpu_mem_register()
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param ptr
+ *   Pointer to the memory area to be unregistered.
+ *   NULL is a no-op accepted value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_unregister(int16_t dev_id, void *ptr);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 58dc632393..d4a65ebd52 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -8,9 +8,13 @@ EXPERIMENTAL {
 	rte_gpu_close;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
+	rte_gpu_free;
 	rte_gpu_info_get;
 	rte_gpu_init;
 	rte_gpu_is_valid;
+	rte_gpu_malloc;
+	rte_gpu_register;
+	rte_gpu_unregister;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 6/9] gpudev: add memory barrier
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
                     ` (4 preceding siblings ...)
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-08 20:16     ` Thomas Monjalon
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 7/9] gpudev: add communication flag eagostini
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Add a function for the application to ensure the coherency
of the writes executed by another device into the GPU memory.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 doc/guides/prog_guide/gpudev.rst |  8 ++++++++
 lib/gpudev/gpudev.c              | 19 +++++++++++++++++++
 lib/gpudev/gpudev_driver.h       |  3 +++
 lib/gpudev/rte_gpudev.h          | 18 ++++++++++++++++++
 lib/gpudev/version.map           |  1 +
 5 files changed, 49 insertions(+)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 9aca69038c..eb5f0af817 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -65,3 +65,11 @@ gpudev can register a CPU memory area to make it visible from a GPU device.
 Later, it's also possible to unregister that memory with gpudev.
 CPU memory registered outside of the gpudev library
 (e.g. with GPU specific library) cannot be unregistered by the gpudev library.
+
+Memory Barrier
+~~~~~~~~~~~~~~
+
+Some GPU drivers may need, under certain conditions,
+to enforce the coherency of external devices writes (e.g. NIC receiving packets)
+into the GPU memory.
+gpudev abstracts and exposes this capability.
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 1d8318f769..cefefd737a 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -624,3 +624,22 @@ rte_gpu_free(int16_t dev_id, void *ptr)
 	}
 	return GPU_DRV_RET(dev->ops.mem_free(dev, ptr));
 }
+
+int
+rte_gpu_mbw(int16_t dev_id)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "memory barrier for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mbw == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mbw(dev));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 11015944a6..ab24de9e28 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -31,6 +31,7 @@ typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
 typedef int (rte_gpu_free_t)(struct rte_gpu *dev, void *ptr);
 typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
 typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
+typedef int (rte_gpu_mbw_t)(struct rte_gpu *dev);
 
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
@@ -45,6 +46,8 @@ struct rte_gpu_ops {
 	rte_gpu_free_t *mem_free;
 	/* Unregister CPU memory in device. */
 	rte_gpu_mem_unregister_t *mem_unregister;
+	/* Enforce GPU memory write barrier. */
+	rte_gpu_mbw_t *mbw;
 };
 
 struct rte_gpu_mpshared {
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index 3c276581c0..e790b3e2b7 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -387,6 +387,24 @@ int rte_gpu_register(int16_t dev_id, size_t size, void * ptr);
 __rte_experimental
 int rte_gpu_unregister(int16_t dev_id, void *ptr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enforce a GPU memory write barrier.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_mbw(int16_t dev_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index d4a65ebd52..d72d470d8e 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -13,6 +13,7 @@ EXPERIMENTAL {
 	rte_gpu_init;
 	rte_gpu_is_valid;
 	rte_gpu_malloc;
+	rte_gpu_mbw;
 	rte_gpu_register;
 	rte_gpu_unregister;
 };
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 7/9] gpudev: add communication flag
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
                     ` (5 preceding siblings ...)
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 6/9] gpudev: add memory barrier eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 8/9] gpudev: add communication list eagostini
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

The purpose of this flag is to allow the CPU and the GPU to
exchange ACKs. A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU the data is ready updating the communication flag

GPU:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 app/test-gpudev/main.c                 |  66 +++++++++++++++
 doc/guides/prog_guide/gpudev.rst       |  13 +++
 doc/guides/rel_notes/release_21_11.rst |   1 +
 lib/gpudev/gpudev.c                    |  94 +++++++++++++++++++++
 lib/gpudev/rte_gpudev.h                | 108 +++++++++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 6 files changed, 286 insertions(+)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 98c02a3ee0..22f5c950b2 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -166,6 +166,67 @@ register_cpu_memory(uint16_t gpu_id)
 	return 0;
 }
 
+static int
+create_update_comm_flag(uint16_t gpu_id)
+{
+	struct rte_gpu_comm_flag devflag;
+	int ret = 0;
+	uint32_t set_val;
+	uint32_t get_val;
+
+	printf("\n=======> TEST: Communication flag\n");
+
+	ret = rte_gpu_comm_create_flag(gpu_id, &devflag, RTE_GPU_COMM_FLAG_CPU);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_create_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	set_val = 25;
+	ret = rte_gpu_comm_set_flag(&devflag, set_val);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_set_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_get_flag_value(&devflag, &get_val);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_get_flag_value returned error %d\n", ret);
+		return -1;
+	}
+
+	printf("Communication flag value at 0x%p was set to %d and current value is %d\n", devflag.ptr, set_val, get_val);
+
+	set_val = 38;
+	ret = rte_gpu_comm_set_flag(&devflag, set_val);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_set_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_get_flag_value(&devflag, &get_val);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_get_flag_value returned error %d\n", ret);
+		return -1;
+	}
+
+	printf("Communication flag value at 0x%p was set to %d and current value is %d\n", devflag.ptr, set_val, get_val);
+
+	ret = rte_gpu_comm_destroy_flag(&devflag);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_destroy_flags returned error %d\n", ret);
+		return -1;
+	}
+
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -217,6 +278,11 @@ main(int argc, char **argv)
 	alloc_gpu_memory(gpu_id);
 	register_cpu_memory(gpu_id);
 
+	/**
+	 * Communication items test
+	 */
+	create_update_comm_flag(gpu_id);
+
 	/* clean up the EAL */
 	rte_eal_cleanup();
 	printf("Bye...\n");
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index eb5f0af817..e0db627aed 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -32,6 +32,10 @@ This library provides a number of features:
 - Interoperability with device-specific library through generic handlers.
 - Allocate and free memory on the device.
 - Register CPU memory to make it visible from the device.
+- Communication between the CPU and the device.
+
+The whole CPU - GPU communication is implemented
+using CPU memory visible from the GPU.
 
 
 API Overview
@@ -73,3 +77,12 @@ Some GPU drivers may need, under certain conditions,
 to enforce the coherency of external devices writes (e.g. NIC receiving packets)
 into the GPU memory.
 gpudev abstracts and exposes this capability.
+
+Communication Flag
+~~~~~~~~~~~~~~~~~~
+
+Considering an application with some GPU task
+that's waiting to receive a signal from the CPU
+to move forward with the execution.
+The communication flag allocates a CPU memory GPU-visible ``uint32_t`` flag
+that can be used by the CPU to communicate with a GPU task.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index c4ac5e3053..59ab1a1920 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -66,6 +66,7 @@ New Features
 
   * Device information
   * Memory management
+  * Communication flag
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index cefefd737a..827e29d8f6 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -643,3 +643,97 @@ rte_gpu_mbw(int16_t dev_id)
 	}
 	return GPU_DRV_RET(dev->ops.mbw(dev));
 }
+
+int
+rte_gpu_comm_create_flag(uint16_t dev_id, struct rte_gpu_comm_flag *devflag,
+		enum rte_gpu_comm_flag_type mtype)
+{
+	size_t flag_size;
+	int ret;
+
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	flag_size = sizeof(uint32_t);
+
+	devflag->ptr = rte_zmalloc(NULL, flag_size, 0);
+	if (devflag->ptr == NULL) {
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	ret = rte_gpu_register(dev_id, flag_size, devflag->ptr);
+	if(ret < 0)
+	{
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	devflag->mtype = mtype;
+	devflag->dev_id = dev_id;
+
+	return 0;
+}
+
+int
+rte_gpu_comm_destroy_flag(struct rte_gpu_comm_flag *devflag)
+{
+	int ret;
+
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	ret = rte_gpu_unregister(devflag->dev_id, devflag->ptr);
+	if(ret < 0)
+	{
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	rte_free(devflag->ptr);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_set_flag(struct rte_gpu_comm_flag *devflag, uint32_t val)
+{
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (devflag->mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	RTE_GPU_VOLATILE(*devflag->ptr) = val;
+
+	return 0;
+}
+
+int
+rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag, uint32_t *val)
+{
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (devflag->mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	*val = RTE_GPU_VOLATILE(*devflag->ptr);
+
+	return 0;
+}
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index e790b3e2b7..4a10a8bcf5 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -38,6 +38,9 @@ extern "C" {
 /** Catch-all callback data. */
 #define RTE_GPU_CALLBACK_ANY_DATA ((void *)-1)
 
+/** Access variable as volatile. */
+#define RTE_GPU_VOLATILE(x) (*(volatile typeof(x)*)&(x))
+
 /** Store device info. */
 struct rte_gpu_info {
 	/** Unique identifier name. */
@@ -68,6 +71,22 @@ enum rte_gpu_event {
 typedef void (rte_gpu_callback_t)(int16_t dev_id,
 		enum rte_gpu_event event, void *user_data);
 
+/** Memory where communication flag is allocated. */
+enum rte_gpu_comm_flag_type {
+	/** Allocate flag on CPU memory visible from device. */
+	RTE_GPU_COMM_FLAG_CPU = 0,
+};
+
+/** Communication flag to coordinate CPU with the device. */
+struct rte_gpu_comm_flag {
+	/** Device that will use the device flag. */
+	uint16_t dev_id;
+	/** Pointer to flag memory area. */
+	uint32_t *ptr;
+	/** Type of memory used to allocate the flag. */
+	enum rte_gpu_comm_flag_type mtype;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -405,6 +424,95 @@ int rte_gpu_unregister(int16_t dev_id, void *ptr);
 __rte_experimental
 int rte_gpu_mbw(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication flag that can be shared
+ * between CPU threads and device workload to exchange some status info
+ * (e.g. work is done, processing can start, etc..).
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param mtype
+ *   Type of memory to allocate the communication flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if invalid inputs
+ *   - ENOTSUP if operation not supported by the driver
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_comm_create_flag(uint16_t dev_id,
+		struct rte_gpu_comm_flag *devflag,
+		enum rte_gpu_comm_flag_type mtype);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a communication flag.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL devflag
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_comm_destroy_flag(struct rte_gpu_comm_flag *devflag);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set the value of a communication flag as the input value.
+ * Flag memory area is treated as volatile.
+ * The flag must have been allocated with RTE_GPU_COMM_FLAG_CPU.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param val
+ *   Value to set in the flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_set_flag(struct rte_gpu_comm_flag *devflag,
+		uint32_t val);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the value of the communication flag.
+ * Flag memory area is treated as volatile.
+ * The flag must have been allocated with RTE_GPU_COMM_FLAG_CPU.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param val
+ *   Flag output value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag,
+		uint32_t *val);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index d72d470d8e..2fc039373a 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -6,6 +6,10 @@ EXPERIMENTAL {
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
+	rte_gpu_comm_create_flag;
+	rte_gpu_comm_destroy_flag;
+	rte_gpu_comm_get_flag_value;
+	rte_gpu_comm_set_flag;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
 	rte_gpu_free;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 8/9] gpudev: add communication list
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
                     ` (6 preceding siblings ...)
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 7/9] gpudev: add communication flag eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 9/9] doc: add CUDA example in GPU guide eagostini
  2021-10-10 10:16   ` [dpdk-dev] [PATCH v3 0/9] GPU library Jerin Jacob
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the GPU is responsible for processing the content of those packets.

The purpose of this list is to provide a buffer in CPU memory visible
from the GPU that can be treated as a circular buffer
to let the CPU provide fondamental info of received packets to the GPU.

A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- in a loop:
    - receive a number of packets
    - provide packets info to the GPU

GPU:
- Do some pre-processing
- Wait to receive a new set of packet to be processed

Layout of a communication list would be:

     -------
    |   0    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   1    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   2    | => pkt_list
    | status |
    | #pkts  |
     -------
    |  ....  | => pkt_list
     -------

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 app/test-gpudev/main.c                 | 103 +++++++++++++++
 doc/guides/prog_guide/gpudev.rst       |  16 +++
 doc/guides/rel_notes/release_21_11.rst |   2 +-
 lib/gpudev/gpudev.c                    | 165 +++++++++++++++++++++++++
 lib/gpudev/meson.build                 |   2 +
 lib/gpudev/rte_gpudev.h                | 129 +++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 7 files changed, 420 insertions(+), 1 deletion(-)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 22f5c950b2..8f7ffa4c63 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -227,6 +227,108 @@ create_update_comm_flag(uint16_t gpu_id)
 	return 0;
 }
 
+static int
+simulate_gpu_task(struct rte_gpu_comm_list *comm_list_item, int num_pkts)
+{
+	int idx;
+
+	if(comm_list_item == NULL)
+		return -1;
+
+	for (idx = 0; idx < num_pkts; idx++) {
+		/**
+		 * consume(comm_list_item->pkt_list[idx].addr);
+		 */
+	}
+	comm_list_item->status = RTE_GPU_COMM_LIST_DONE;
+
+	return 0;
+}
+
+static int
+create_update_comm_list(uint16_t gpu_id)
+{
+	int ret = 0;
+	int i = 0;
+	struct rte_gpu_comm_list * comm_list;
+	uint32_t num_comm_items = 1024;
+	struct rte_mbuf * mbufs[10];
+
+	printf("\n=======> TEST: Communication list\n");
+
+	comm_list = rte_gpu_comm_create_list(gpu_id, num_comm_items);
+	if(comm_list == NULL)
+	{
+		fprintf(stderr, "rte_gpu_comm_create_list returned error %d\n", ret);
+		return -1;
+	}
+
+	/**
+	 * Simulate DPDK receive functions like rte_eth_rx_burst()
+	 */
+	for(i = 0; i < 10; i++)
+	{
+		mbufs[i] = rte_zmalloc(NULL, sizeof(struct rte_mbuf), 0);
+		if (mbufs[i] == NULL) {
+			fprintf(stderr, "Failed to allocate fake mbufs in CPU memory.\n");
+			return -1;
+		}
+
+		memset(mbufs[i], 0, sizeof(struct rte_mbuf));
+	}
+
+	/**
+	 * Populate just the first item of  the list
+	 */
+	ret = rte_gpu_comm_populate_list_pkts(&(comm_list[0]), mbufs, 10);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_populate_list_pkts returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_cleanup_list(&(comm_list[0]));
+	if(ret == 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_cleanup_list erroneusly cleaned the list even if packets have not beeing consumed yet\n");
+		return -1;
+	}
+	else
+	{
+		fprintf(stderr, "rte_gpu_comm_cleanup_list correctly didn't clean up the packets because they have not beeing consumed yet\n");
+	}
+
+	/**
+	 * Simulate a GPU tasks going through the packet list to consume
+	 * mbufs packets and release them
+	 */
+	simulate_gpu_task(&(comm_list[0]), 10);
+
+	/**
+	 * Packets have been consumed, now the communication item
+	 * and the related mbufs can be all released
+	 */
+	ret = rte_gpu_comm_cleanup_list(&(comm_list[0]));
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_cleanup_list returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_destroy_list(comm_list, num_comm_items);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_destroy_list returned error %d\n", ret);
+		return -1;
+	}
+
+	for(i = 0; i < 10; i++)
+		rte_free(mbufs[i]);
+
+	printf("\nCommunication list test passed!\n");
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -282,6 +384,7 @@ main(int argc, char **argv)
 	 * Communication items test
 	 */
 	create_update_comm_flag(gpu_id);
+	create_update_comm_list(gpu_id);
 
 	/* clean up the EAL */
 	rte_eal_cleanup();
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index e0db627aed..cbaec5a1e4 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -86,3 +86,19 @@ that's waiting to receive a signal from the CPU
 to move forward with the execution.
 The communication flag allocates a CPU memory GPU-visible ``uint32_t`` flag
 that can be used by the CPU to communicate with a GPU task.
+
+Communication list
+~~~~~~~~~~~~~~~~~~
+
+By default, DPDK pulls free mbufs from a mempool to receive packets.
+Best practice, expecially in a multithreaded application,
+is to no make any assumption on which mbufs will be used
+to receive the next bursts of packets.
+Considering an application with a GPU memory mempool
+attached to a receive queue having some task waiting on the GPU
+to receive a new burst of packets to be processed,
+there is the need to communicate from the CPU
+the list of mbuf payload addresses where received packet have been stored.
+The ``rte_gpu_comm_*()`` functions are responsible to create a list of packets
+that can be populated with receive mbuf payload addresses
+and communicated to the task running on the GPU.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 59ab1a1920..0c6d92a269 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -66,7 +66,7 @@ New Features
 
   * Device information
   * Memory management
-  * Communication flag
+  * Communication flag & list
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 827e29d8f6..3cfde97e3c 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -737,3 +737,168 @@ rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag, uint32_t *val)
 
 	return 0;
 }
+
+struct rte_gpu_comm_list *
+rte_gpu_comm_create_list(uint16_t dev_id,
+		uint32_t num_comm_items)
+{
+	struct rte_gpu_comm_list *comm_list;
+	uint32_t idx_l;
+	int ret;
+	struct rte_gpu *dev;
+
+	if (num_comm_items == 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "memory barrier for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return NULL;
+	}
+
+	comm_list = rte_zmalloc(NULL, sizeof(struct rte_gpu_comm_list) * num_comm_items, 0);
+	if (comm_list == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	ret = rte_gpu_register(dev_id, sizeof(struct rte_gpu_comm_list) * num_comm_items, comm_list);
+	if(ret < 0)
+	{
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	for (idx_l = 0; idx_l < num_comm_items; idx_l++) {
+		comm_list[idx_l].pkt_list = rte_zmalloc(NULL, sizeof(struct rte_gpu_comm_pkt) * RTE_GPU_COMM_LIST_PKTS_MAX, 0);
+		if (comm_list[idx_l].pkt_list == NULL) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+
+		ret = rte_gpu_register(dev_id, sizeof(struct rte_gpu_comm_pkt) * RTE_GPU_COMM_LIST_PKTS_MAX, comm_list[idx_l].pkt_list);
+		if(ret < 0)
+		{
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+
+		RTE_GPU_VOLATILE(comm_list[idx_l].status) = RTE_GPU_COMM_LIST_FREE;
+		comm_list[idx_l].num_pkts = 0;
+		comm_list[idx_l].dev_id = dev_id;
+	}
+
+	return comm_list;
+}
+
+int
+rte_gpu_comm_destroy_list(struct rte_gpu_comm_list *comm_list,
+		uint32_t num_comm_items)
+{
+	uint32_t idx_l;
+	int ret;
+	uint16_t dev_id;
+
+	if (comm_list == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	dev_id = comm_list[0].dev_id;
+
+	for (idx_l = 0; idx_l < num_comm_items; idx_l++)
+	{
+		ret = rte_gpu_unregister(dev_id, comm_list[idx_l].pkt_list);
+		if(ret < 0)
+		{
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		rte_free(comm_list[idx_l].pkt_list);
+	}
+
+	ret = rte_gpu_unregister(dev_id, comm_list);
+	if(ret < 0)
+	{
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	rte_free(comm_list);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_populate_list_pkts(struct rte_gpu_comm_list *comm_list_item,
+		struct rte_mbuf **mbufs, uint32_t num_mbufs)
+{
+	uint32_t idx;
+
+	if (comm_list_item == NULL || comm_list_item->pkt_list == NULL ||
+			mbufs == NULL || num_mbufs > RTE_GPU_COMM_LIST_PKTS_MAX) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx = 0; idx < num_mbufs; idx++) {
+		/* support only unchained mbufs */
+		if (unlikely((mbufs[idx]->nb_segs > 1) ||
+				(mbufs[idx]->next != NULL) ||
+				(mbufs[idx]->data_len != mbufs[idx]->pkt_len))) {
+			rte_errno = ENOTSUP;
+			return -rte_errno;
+		}
+		comm_list_item->pkt_list[idx].addr =
+				rte_pktmbuf_mtod_offset(mbufs[idx], uintptr_t, 0);
+		comm_list_item->pkt_list[idx].size = mbufs[idx]->pkt_len;
+		comm_list_item->pkt_list[idx].opaque = mbufs[idx];
+	}
+
+	RTE_GPU_VOLATILE(comm_list_item->num_pkts) = num_mbufs;
+	rte_gpu_mbw(comm_list_item->dev_id);
+	RTE_GPU_VOLATILE(comm_list_item->status) = RTE_GPU_COMM_LIST_READY;
+	rte_gpu_mbw(comm_list_item->dev_id);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_cleanup_list(struct rte_gpu_comm_list *comm_list_item)
+{
+	struct rte_mbuf *mbufs[RTE_GPU_COMM_LIST_PKTS_MAX];
+	uint32_t idx = 0;
+
+	if (comm_list_item == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (RTE_GPU_VOLATILE(comm_list_item->status) ==
+			RTE_GPU_COMM_LIST_READY) {
+		GPU_LOG(ERR, "packet list is still in progress");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx = 0; idx < RTE_GPU_COMM_LIST_PKTS_MAX; idx++) {
+		if (comm_list_item->pkt_list[idx].addr == 0)
+			break;
+
+		comm_list_item->pkt_list[idx].addr = 0;
+		comm_list_item->pkt_list[idx].size = 0;
+		mbufs[idx] = (struct rte_mbuf *) comm_list_item->pkt_list[idx].opaque;
+	}
+
+	rte_pktmbuf_free_bulk(mbufs, idx);
+
+	RTE_GPU_VOLATILE(comm_list_item->status) = RTE_GPU_COMM_LIST_FREE;
+	RTE_GPU_VOLATILE(comm_list_item->num_pkts) = 0;
+	rte_mb();
+
+	return 0;
+}
diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build
index 608154817b..89a118f357 100644
--- a/lib/gpudev/meson.build
+++ b/lib/gpudev/meson.build
@@ -8,3 +8,5 @@ headers = files(
 sources = files(
         'gpudev.c',
 )
+
+deps += ['mbuf']
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index 4a10a8bcf5..a13a4fc2c8 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -9,6 +9,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <rte_mbuf.h>
 #include <rte_bitops.h>
 #include <rte_compat.h>
 
@@ -41,6 +42,9 @@ extern "C" {
 /** Access variable as volatile. */
 #define RTE_GPU_VOLATILE(x) (*(volatile typeof(x)*)&(x))
 
+/** Max number of packets per communication list. */
+#define RTE_GPU_COMM_LIST_PKTS_MAX 1024
+
 /** Store device info. */
 struct rte_gpu_info {
 	/** Unique identifier name. */
@@ -87,6 +91,43 @@ struct rte_gpu_comm_flag {
 	enum rte_gpu_comm_flag_type mtype;
 };
 
+/** List of packets shared among CPU and device. */
+struct rte_gpu_comm_pkt {
+	/** Address of the packet in memory (e.g. mbuf->buf_addr). */
+	uintptr_t addr;
+	/** Size in byte of the packet. */
+	size_t size;
+	/** Mbuf reference to release it in the rte_gpu_comm_cleanup_list(). */
+	void *opaque;
+};
+
+/** Possible status for the list of packets shared among CPU and device. */
+enum rte_gpu_comm_list_status {
+	/** Packet list can be filled with new mbufs, no one is using it. */
+	RTE_GPU_COMM_LIST_FREE = 0,
+	/** Packet list has been filled with new mbufs and it's ready to be used .*/
+	RTE_GPU_COMM_LIST_READY,
+	/** Packet list has been processed, it's ready to be freed. */
+	RTE_GPU_COMM_LIST_DONE,
+	/** Some error occurred during packet list processing. */
+	RTE_GPU_COMM_LIST_ERROR,
+};
+
+/**
+ * Communication list holding a number of lists of packets
+ * each having a status flag.
+ */
+struct rte_gpu_comm_list {
+	/** Device that will use the communication list. */
+	uint16_t dev_id;
+	/** List of packets populated by the CPU with a set of mbufs info. */
+	struct rte_gpu_comm_pkt *pkt_list;
+	/** Number of packets in the list. */
+	uint32_t num_pkts;
+	/** Status of the list. */
+	enum rte_gpu_comm_list_status status;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -513,6 +554,94 @@ __rte_experimental
 int rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag,
 		uint32_t *val);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication list that can be used to share packets
+ * between CPU and device.
+ * Each element of the list contains:
+ *  - a packet list of RTE_GPU_COMM_LIST_PKTS_MAX elements
+ *  - number of packets in the list
+ *  - a status flag to communicate if the packet list is FREE,
+ *    READY to be processed, DONE with processing.
+ *
+ * The list is allocated in CPU-visible memory.
+ * At creation time, every list is in FREE state.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param num_comm_items
+ *   Number of items in the communication list.
+ *
+ * @return
+ *   A pointer to the allocated list, otherwise NULL and rte_errno is set:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+struct rte_gpu_comm_list *rte_gpu_comm_create_list(uint16_t dev_id,
+		uint32_t num_comm_items);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Destroy a communication list.
+ *
+ * @param comm_list
+ *   Communication list to be destroyed.
+ * @param num_comm_items
+ *   Number of items in the communication list.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_destroy_list(struct rte_gpu_comm_list *comm_list,
+		uint32_t num_comm_items);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Populate the packets list of the communication item
+ * with info from a list of mbufs.
+ * Status flag of that packet list is set to READY.
+ *
+ * @param comm_list_item
+ *   Communication list item to fill.
+ * @param mbufs
+ *   List of mbufs.
+ * @param num_mbufs
+ *   Number of mbufs.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ *   - ENOTSUP if mbufs are chained (multiple segments)
+ */
+__rte_experimental
+int rte_gpu_comm_populate_list_pkts(struct rte_gpu_comm_list *comm_list_item,
+		struct rte_mbuf **mbufs, uint32_t num_mbufs);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Reset a communication list item to the original state.
+ * The status flag set to FREE and mbufs are returned to the pool.
+ *
+ * @param comm_list_item
+ *   Communication list item to reset.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_cleanup_list(struct rte_gpu_comm_list *comm_list_item);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 2fc039373a..45a35fa6e4 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -6,9 +6,13 @@ EXPERIMENTAL {
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
+	rte_gpu_comm_cleanup_list;
 	rte_gpu_comm_create_flag;
+	rte_gpu_comm_create_list;
 	rte_gpu_comm_destroy_flag;
+	rte_gpu_comm_destroy_list;
 	rte_gpu_comm_get_flag_value;
+	rte_gpu_comm_populate_list_pkts;
 	rte_gpu_comm_set_flag;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v3 9/9] doc: add CUDA example in GPU guide
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
                     ` (7 preceding siblings ...)
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 8/9] gpudev: add communication list eagostini
@ 2021-10-09  1:53   ` eagostini
  2021-10-10 10:16   ` [dpdk-dev] [PATCH v3 0/9] GPU library Jerin Jacob
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-10-09  1:53 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 doc/guides/prog_guide/gpudev.rst | 122 +++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index cbaec5a1e4..1baf0c6772 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -102,3 +102,125 @@ the list of mbuf payload addresses where received packet have been stored.
 The ``rte_gpu_comm_*()`` functions are responsible to create a list of packets
 that can be populated with receive mbuf payload addresses
 and communicated to the task running on the GPU.
+
+
+CUDA Example
+------------
+
+In the example below, there is a pseudo-code to give an example
+about how to use functions in this library in case of a CUDA application.
+
+.. code-block:: c
+
+   //////////////////////////////////////////////////////////////////////////
+   ///// gpudev library + CUDA functions
+   //////////////////////////////////////////////////////////////////////////
+   #define GPU_PAGE_SHIFT 16
+   #define GPU_PAGE_SIZE (1UL << GPU_PAGE_SHIFT)
+
+   int main() {
+       struct rte_gpu_flag quit_flag;
+       struct rte_gpu_comm_list *comm_list;
+       int nb_rx = 0;
+       int comm_list_entry = 0;
+       struct rte_mbuf * rx_mbufs[max_rx_mbufs];
+       cudaStream_t cstream;
+       struct rte_mempool *mpool_payload, *mpool_header;
+       struct rte_pktmbuf_extmem ext_mem;
+       int16_t dev_id;
+       int16_t port_id = 0;
+
+       /** Initialize CUDA objects (cstream, context, etc..). */
+       /** Use gpudev library to register a new CUDA context if any */
+       /** Let's assume the application wants to use the default context of the GPU device 0 */
+
+       dev_id = 0;
+
+       /**
+        * Create an external memory mempool using memory allocated on the GPU.
+        */
+       ext_mem.elt_size = mbufs_headroom_size;
+                   ext_mem.buf_len = RTE_ALIGN_CEIL(mbufs_num * ext_mem.elt_size, GPU_PAGE_SIZE);
+       ext_mem.buf_iova = RTE_BAD_IOVA;
+       ext_mem.buf_ptr = rte_gpu_malloc(dev_id, ext_mem.buf_len, 0);
+       rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
+       rte_dev_dma_map(rte_eth_devices[port_id].device, ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
+       mpool_payload = rte_pktmbuf_pool_create_extbuf("gpu_mempool", mbufs_num,
+                                                       0, 0, ext_mem.elt_size,
+                                                       rte_socket_id(), &ext_mem, 1);
+
+       /**
+        * Create CPU - device communication flag. With this flag, the CPU can tell to the CUDA kernel
+        * to exit from the main loop.
+        */
+       rte_gpu_comm_create_flag(dev_id, &quit_flag, RTE_GPU_COMM_FLAG_CPU);
+       rte_gpu_comm_set_flag(&quit_flag , 0);
+
+       /**
+        * Create CPU - device communication list. Each entry of this list will be populated by the CPU
+        * with a new set of received mbufs that the CUDA kernel has to process.
+        */
+       comm_list = rte_gpu_comm_create_list(dev_id, num_entries);
+
+       /** A very simple CUDA kernel with just 1 CUDA block and RTE_GPU_COMM_LIST_PKTS_MAX CUDA threads. */
+       cuda_kernel_packet_processing<<<1, RTE_GPU_COMM_LIST_PKTS_MAX, 0, cstream>>>(quit_flag->ptr, comm_list, num_entries, ...);
+
+       /**
+        * For simplicity, the CPU here receives only 2 bursts of mbufs.
+        * In a real application, network activity and device processing should overlap.
+        */
+       nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
+       rte_gpu_comm_populate_list_pkts(comm_list[0], rx_mbufs, nb_rx);
+       nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
+       rte_gpu_comm_populate_list_pkts(comm_list[1], rx_mbufs, nb_rx);
+
+       /**
+        * CPU waits for the completion of the packets' processing on the CUDA kernel
+        * and then it does a cleanup of the received mbufs.
+        */
+       while(rte_gpu_comm_cleanup_list(comm_list[0]));
+       while(rte_gpu_comm_cleanup_list(comm_list[1]));
+
+       /** CPU notifies the CUDA kernel that it has to terminate */
+       rte_gpu_comm_set_flag(&quit_flag, 1);
+
+       /** gpudev objects cleanup/destruction */
+       /** CUDA cleanup */
+
+       rte_gpu_free(dev_id, ext_mem.buf_len);
+
+       /** DPDK cleanup */
+
+       return 0;
+   }
+
+   //////////////////////////////////////////////////////////////////////////
+   ///// CUDA kernel
+   //////////////////////////////////////////////////////////////////////////
+
+   void cuda_kernel(uint32_t * quit_flag_ptr, struct rte_gpu_comm_list *comm_list, int comm_list_entries) {
+      int comm_list_index = 0;
+      struct rte_gpu_comm_pkt *pkt_list = NULL;
+
+      /** Do some pre-processing operations. */
+
+      /** GPU kernel keeps checking this flag to know if it has to quit or wait for more packets. */
+      while(*quit_flag_ptr == 0)
+      {
+         if(comm_list[comm_list_index]->status != RTE_GPU_COMM_LIST_READY)
+         continue;
+
+         if(threadIdx.x < comm_list[comm_list_index]->num_pkts)
+         {
+            /** Each CUDA thread processes a different packet. */
+            packet_processing(comm_list[comm_list_index]->addr, comm_list[comm_list_index]->size, ..);
+         }
+         __threadfence();
+         __syncthreads();
+
+         /** Wait for new packets on the next communication list entry. */
+         comm_list_index = (comm_list_index+1) % comm_list_entries;
+      }
+
+      /** Do some post-processing operations. */
+   }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
                     ` (8 preceding siblings ...)
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 9/9] doc: add CUDA example in GPU guide eagostini
@ 2021-10-10 10:16   ` Jerin Jacob
  2021-10-11  8:18     ` Thomas Monjalon
  9 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-10-10 10:16 UTC (permalink / raw)
  To: Elena Agostini, Thomas Monjalon, Ferruh Yigit, Honnappa Nagarahalli
  Cc: dpdk-dev

On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
>
> From: eagostini <eagostini@nvidia.com>
>
> In heterogeneous computing system, processing is not only in the CPU.
> Some tasks can be delegated to devices working in parallel.
>
> The goal of this new library is to enhance the collaboration between
> DPDK, that's primarily a CPU framework, and GPU devices.
>
> When mixing network activity with task processing on a non-CPU device,
> there may be the need to put in communication the CPU with the device
> in order to manage the memory, synchronize operations, exchange info, etc..
>
> This library provides a number of new features:
> - Interoperability with GPU-specific library with generic handlers
> - Possibility to allocate and free memory on the GPU
> - Possibility to allocate and free memory on the CPU but visible from the GPU
> - Communication functions to enhance the dialog between the CPU and the GPU

In the RFC thread, There was one outstanding non technical issues on this,

i.e
The above features are driver specific details. Does the DPDK
_application_ need to be aware of this?
aka DPDK device class has a fixed personality and it has API to deal
with abstracting specific application specific
end user functionality like ethdev, cryptodev, eventdev irrespective
of underlying bus/device properties.

Even similar semantics are required for DPU(Smart NIC)
communitication. I am planning to
send RFC in coming days to address the issue without the application
knowing the Bus/HW/Driver details.

Irrespective of the RFC I am planning to send and since the new
library needs techboard approval, You may
request that the techboard decide approval for this library. Also, As
far as I remember a minimum a SW driver in
additional to HW driver to accept a new driver class.

Just my 2c to save your cycles.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-10 10:16   ` [dpdk-dev] [PATCH v3 0/9] GPU library Jerin Jacob
@ 2021-10-11  8:18     ` Thomas Monjalon
  2021-10-11  8:43       ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-10-11  8:18 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Elena Agostini, Ferruh Yigit, Honnappa Nagarahalli, dpdk-dev, techboard

10/10/2021 12:16, Jerin Jacob:
> On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> >
> > From: eagostini <eagostini@nvidia.com>
> >
> > In heterogeneous computing system, processing is not only in the CPU.
> > Some tasks can be delegated to devices working in parallel.
> >
> > The goal of this new library is to enhance the collaboration between
> > DPDK, that's primarily a CPU framework, and GPU devices.
> >
> > When mixing network activity with task processing on a non-CPU device,
> > there may be the need to put in communication the CPU with the device
> > in order to manage the memory, synchronize operations, exchange info, etc..
> >
> > This library provides a number of new features:
> > - Interoperability with GPU-specific library with generic handlers
> > - Possibility to allocate and free memory on the GPU
> > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > - Communication functions to enhance the dialog between the CPU and the GPU
> 
> In the RFC thread, There was one outstanding non technical issues on this,
> 
> i.e
> The above features are driver specific details. Does the DPDK
> _application_ need to be aware of this?

I don't see these features as driver-specific.

> aka DPDK device class has a fixed personality and it has API to deal
> with abstracting specific application specific
> end user functionality like ethdev, cryptodev, eventdev irrespective
> of underlying bus/device properties.

The goal of the lib is to allow anyone to invent any feature
which is not already available in DPDK.

> Even similar semantics are required for DPU(Smart NIC)
> communitication. I am planning to
> send RFC in coming days to address the issue without the application
> knowing the Bus/HW/Driver details.

gpudev is not exposing bus/hw/driver details.
I don't understand what you mean.

> Irrespective of the RFC I am planning to send and since the new
> library needs techboard approval, You may
> request that the techboard decide approval for this library. Also, As
> far as I remember a minimum a SW driver in
> additional to HW driver to accept a new driver class.

No, only one driver is required:
"When introducing a new device API, at least one driver should implement it."
Anyway, a SW driver doesn't make sense for gpudev.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-11  8:18     ` Thomas Monjalon
@ 2021-10-11  8:43       ` Jerin Jacob
  2021-10-11  9:12         ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-10-11  8:43 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Elena Agostini, Ferruh Yigit, Honnappa Nagarahalli, dpdk-dev, techboard

On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 10/10/2021 12:16, Jerin Jacob:
> > On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> > >
> > > From: eagostini <eagostini@nvidia.com>
> > >
> > > In heterogeneous computing system, processing is not only in the CPU.
> > > Some tasks can be delegated to devices working in parallel.
> > >
> > > The goal of this new library is to enhance the collaboration between
> > > DPDK, that's primarily a CPU framework, and GPU devices.
> > >
> > > When mixing network activity with task processing on a non-CPU device,
> > > there may be the need to put in communication the CPU with the device
> > > in order to manage the memory, synchronize operations, exchange info, etc..
> > >
> > > This library provides a number of new features:
> > > - Interoperability with GPU-specific library with generic handlers
> > > - Possibility to allocate and free memory on the GPU
> > > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > > - Communication functions to enhance the dialog between the CPU and the GPU
> >
> > In the RFC thread, There was one outstanding non technical issues on this,
> >
> > i.e
> > The above features are driver specific details. Does the DPDK
> > _application_ need to be aware of this?
>
> I don't see these features as driver-specific.

That is the disconnect. I see this as more driver-specific details
which are not required
to implement an "application" facing API.

For example, If we need to implement application facing" subsystems like bbdev,
If we make all this driver interface, you can still implement the
bbdev API as a driver without
exposing HW specific details like how devices communicate to CPU, how
memory is allocated etc
 to "application".



>
> > aka DPDK device class has a fixed personality and it has API to deal
> > with abstracting specific application specific
> > end user functionality like ethdev, cryptodev, eventdev irrespective
> > of underlying bus/device properties.
>
> The goal of the lib is to allow anyone to invent any feature
> which is not already available in DPDK.
>
> > Even similar semantics are required for DPU(Smart NIC)
> > communitication. I am planning to
> > send RFC in coming days to address the issue without the application
> > knowing the Bus/HW/Driver details.
>
> gpudev is not exposing bus/hw/driver details.
> I don't understand what you mean.


See above.

>
> > Irrespective of the RFC I am planning to send and since the new
> > library needs techboard approval, You may
> > request that the techboard decide approval for this library. Also, As
> > far as I remember a minimum a SW driver in
> > additional to HW driver to accept a new driver class.
>
> No, only one driver is required:

That is good news.

> "When introducing a new device API, at least one driver should implement it."
> Anyway, a SW driver doesn't make sense for gpudev.

OK

>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-11  8:43       ` Jerin Jacob
@ 2021-10-11  9:12         ` Thomas Monjalon
  2021-10-11  9:29           ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-10-11  9:12 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Elena Agostini, dpdk-dev, techboard

11/10/2021 10:43, Jerin Jacob:
> On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 10/10/2021 12:16, Jerin Jacob:
> > > On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> > > >
> > > > From: eagostini <eagostini@nvidia.com>
> > > >
> > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > Some tasks can be delegated to devices working in parallel.
> > > >
> > > > The goal of this new library is to enhance the collaboration between
> > > > DPDK, that's primarily a CPU framework, and GPU devices.
> > > >
> > > > When mixing network activity with task processing on a non-CPU device,
> > > > there may be the need to put in communication the CPU with the device
> > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > >
> > > > This library provides a number of new features:
> > > > - Interoperability with GPU-specific library with generic handlers
> > > > - Possibility to allocate and free memory on the GPU
> > > > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > > > - Communication functions to enhance the dialog between the CPU and the GPU
> > >
> > > In the RFC thread, There was one outstanding non technical issues on this,
> > >
> > > i.e
> > > The above features are driver specific details. Does the DPDK
> > > _application_ need to be aware of this?
> >
> > I don't see these features as driver-specific.
> 
> That is the disconnect. I see this as more driver-specific details
> which are not required to implement an "application" facing API.

Indeed this is the disconnect.
I already answered but it seems you don't accept the answer.

First, this is not driver-specific. It is a low-level API.

> For example, If we need to implement application facing" subsystems like bbdev,
> If we make all this driver interface, you can still implement the
> bbdev API as a driver without
> exposing HW specific details like how devices communicate to CPU, how
> memory is allocated etc
>  to "application".

There are 2 things to understand here.

First we want to allow the application using the GPU for needs which are
not exposed by any other DPDK API.

Second, if we want to implement another DPDK API like bbdev,
then the GPU implementation would be exposed as a vdev in bbdev,
using the HW GPU device being a PCI in gpudev.
They are two different levels, got it?

> > > aka DPDK device class has a fixed personality and it has API to deal
> > > with abstracting specific application specific
> > > end user functionality like ethdev, cryptodev, eventdev irrespective
> > > of underlying bus/device properties.
> >
> > The goal of the lib is to allow anyone to invent any feature
> > which is not already available in DPDK.
> >
> > > Even similar semantics are required for DPU(Smart NIC)
> > > communitication. I am planning to
> > > send RFC in coming days to address the issue without the application
> > > knowing the Bus/HW/Driver details.
> >
> > gpudev is not exposing bus/hw/driver details.
> > I don't understand what you mean.
> 
> See above.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-11  9:12         ` Thomas Monjalon
@ 2021-10-11  9:29           ` Jerin Jacob
  2021-10-11 10:27             ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-10-11  9:29 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Elena Agostini, dpdk-dev, techboard

On Mon, Oct 11, 2021 at 2:42 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 11/10/2021 10:43, Jerin Jacob:
> > On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 10/10/2021 12:16, Jerin Jacob:
> > > > On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> > > > >
> > > > > From: eagostini <eagostini@nvidia.com>
> > > > >
> > > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > > Some tasks can be delegated to devices working in parallel.
> > > > >
> > > > > The goal of this new library is to enhance the collaboration between
> > > > > DPDK, that's primarily a CPU framework, and GPU devices.
> > > > >
> > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > there may be the need to put in communication the CPU with the device
> > > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > > >
> > > > > This library provides a number of new features:
> > > > > - Interoperability with GPU-specific library with generic handlers
> > > > > - Possibility to allocate and free memory on the GPU
> > > > > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > > > > - Communication functions to enhance the dialog between the CPU and the GPU
> > > >
> > > > In the RFC thread, There was one outstanding non technical issues on this,
> > > >
> > > > i.e
> > > > The above features are driver specific details. Does the DPDK
> > > > _application_ need to be aware of this?
> > >
> > > I don't see these features as driver-specific.
> >
> > That is the disconnect. I see this as more driver-specific details
> > which are not required to implement an "application" facing API.
>
> Indeed this is the disconnect.
> I already answered but it seems you don't accept the answer.

Same with you. That is why I requested, we need to get opinions from others.
Some of them already provided opinions in RFC.

>
> First, this is not driver-specific. It is a low-level API.

What is the difference between low-level API and driver-level API.


>
> > For example, If we need to implement application facing" subsystems like bbdev,
> > If we make all this driver interface, you can still implement the
> > bbdev API as a driver without
> > exposing HW specific details like how devices communicate to CPU, how
> > memory is allocated etc
> >  to "application".
>
> There are 2 things to understand here.
>
> First we want to allow the application using the GPU for needs which are
> not exposed by any other DPDK API.
>
> Second, if we want to implement another DPDK API like bbdev,
> then the GPU implementation would be exposed as a vdev in bbdev,
> using the HW GPU device being a PCI in gpudev.
> They are two different levels, got it?

Exactly. So what is the point of exposing low-level driver API to
"application",
why not it is part of the internal driver API. My point is, why the
application needs to worry
about, How the CPU <-> Device communicated? CPU < -> Device memory
visibility etc.

>
> > > > aka DPDK device class has a fixed personality and it has API to deal
> > > > with abstracting specific application specific
> > > > end user functionality like ethdev, cryptodev, eventdev irrespective
> > > > of underlying bus/device properties.
> > >
> > > The goal of the lib is to allow anyone to invent any feature
> > > which is not already available in DPDK.
> > >
> > > > Even similar semantics are required for DPU(Smart NIC)
> > > > communitication. I am planning to
> > > > send RFC in coming days to address the issue without the application
> > > > knowing the Bus/HW/Driver details.
> > >
> > > gpudev is not exposing bus/hw/driver details.
> > > I don't understand what you mean.
> >
> > See above.
>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-11  9:29           ` Jerin Jacob
@ 2021-10-11 10:27             ` Thomas Monjalon
  2021-10-11 11:41               ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-10-11 10:27 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Elena Agostini, dpdk-dev, techboard

11/10/2021 11:29, Jerin Jacob:
> On Mon, Oct 11, 2021 at 2:42 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 11/10/2021 10:43, Jerin Jacob:
> > > On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 10/10/2021 12:16, Jerin Jacob:
> > > > > On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> > > > > >
> > > > > > From: eagostini <eagostini@nvidia.com>
> > > > > >
> > > > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > >
> > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > DPDK, that's primarily a CPU framework, and GPU devices.
> > > > > >
> > > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > > there may be the need to put in communication the CPU with the device
> > > > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > > > >
> > > > > > This library provides a number of new features:
> > > > > > - Interoperability with GPU-specific library with generic handlers
> > > > > > - Possibility to allocate and free memory on the GPU
> > > > > > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > > > > > - Communication functions to enhance the dialog between the CPU and the GPU
> > > > >
> > > > > In the RFC thread, There was one outstanding non technical issues on this,
> > > > >
> > > > > i.e
> > > > > The above features are driver specific details. Does the DPDK
> > > > > _application_ need to be aware of this?
> > > >
> > > > I don't see these features as driver-specific.
> > >
> > > That is the disconnect. I see this as more driver-specific details
> > > which are not required to implement an "application" facing API.
> >
> > Indeed this is the disconnect.
> > I already answered but it seems you don't accept the answer.
> 
> Same with you. That is why I requested, we need to get opinions from others.
> Some of them already provided opinions in RFC.

This is why I Cc'ed techboard.

> > First, this is not driver-specific. It is a low-level API.
> 
> What is the difference between low-level API and driver-level API.

The low-level API provides tools to build a feature,
but no specific feature.

> > > For example, If we need to implement application facing" subsystems like bbdev,
> > > If we make all this driver interface, you can still implement the
> > > bbdev API as a driver without
> > > exposing HW specific details like how devices communicate to CPU, how
> > > memory is allocated etc
> > >  to "application".
> >
> > There are 2 things to understand here.
> >
> > First we want to allow the application using the GPU for needs which are
> > not exposed by any other DPDK API.
> >
> > Second, if we want to implement another DPDK API like bbdev,
> > then the GPU implementation would be exposed as a vdev in bbdev,
> > using the HW GPU device being a PCI in gpudev.
> > They are two different levels, got it?
> 
> Exactly. So what is the point of exposing low-level driver API to
> "application",
> why not it is part of the internal driver API. My point is, why the
> application needs to worry
> about, How the CPU <-> Device communicated? CPU < -> Device memory
> visibility etc.

There are two reasons.

1/ The application may want to use the GPU for some application-specific
needs which are not abstracted in DPDK API.

2/ This API may also be used by some feature implementation internally
in some DPDK libs or drivers.
We cannot skip the gpudev layer because this is what allows generic probing
of the HW, and gpudev allows to share the GPU with multiple features
implemented in different libs or drivers, thanks to the "child" concept.


> > > > > aka DPDK device class has a fixed personality and it has API to deal
> > > > > with abstracting specific application specific
> > > > > end user functionality like ethdev, cryptodev, eventdev irrespective
> > > > > of underlying bus/device properties.
> > > >
> > > > The goal of the lib is to allow anyone to invent any feature
> > > > which is not already available in DPDK.
> > > >
> > > > > Even similar semantics are required for DPU(Smart NIC)
> > > > > communitication. I am planning to
> > > > > send RFC in coming days to address the issue without the application
> > > > > knowing the Bus/HW/Driver details.
> > > >
> > > > gpudev is not exposing bus/hw/driver details.
> > > > I don't understand what you mean.
> > >
> > > See above.




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-11 10:27             ` Thomas Monjalon
@ 2021-10-11 11:41               ` Jerin Jacob
  2021-10-11 12:44                 ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-10-11 11:41 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Elena Agostini, dpdk-dev, techboard

On Mon, Oct 11, 2021 at 3:57 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 11/10/2021 11:29, Jerin Jacob:
> > On Mon, Oct 11, 2021 at 2:42 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 11/10/2021 10:43, Jerin Jacob:
> > > > On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > 10/10/2021 12:16, Jerin Jacob:
> > > > > > On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> > > > > > >
> > > > > > > From: eagostini <eagostini@nvidia.com>
> > > > > > >
> > > > > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > > >
> > > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > > DPDK, that's primarily a CPU framework, and GPU devices.
> > > > > > >
> > > > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > > > there may be the need to put in communication the CPU with the device
> > > > > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > > > > >
> > > > > > > This library provides a number of new features:
> > > > > > > - Interoperability with GPU-specific library with generic handlers
> > > > > > > - Possibility to allocate and free memory on the GPU
> > > > > > > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > > > > > > - Communication functions to enhance the dialog between the CPU and the GPU
> > > > > >
> > > > > > In the RFC thread, There was one outstanding non technical issues on this,
> > > > > >
> > > > > > i.e
> > > > > > The above features are driver specific details. Does the DPDK
> > > > > > _application_ need to be aware of this?
> > > > >
> > > > > I don't see these features as driver-specific.
> > > >
> > > > That is the disconnect. I see this as more driver-specific details
> > > > which are not required to implement an "application" facing API.
> > >
> > > Indeed this is the disconnect.
> > > I already answered but it seems you don't accept the answer.
> >
> > Same with you. That is why I requested, we need to get opinions from others.
> > Some of them already provided opinions in RFC.
>
> This is why I Cc'ed techboard.

Yes. Indeed.

>
> > > First, this is not driver-specific. It is a low-level API.
> >
> > What is the difference between low-level API and driver-level API.
>
> The low-level API provides tools to build a feature,
> but no specific feature.
>
> > > > For example, If we need to implement application facing" subsystems like bbdev,
> > > > If we make all this driver interface, you can still implement the
> > > > bbdev API as a driver without
> > > > exposing HW specific details like how devices communicate to CPU, how
> > > > memory is allocated etc
> > > >  to "application".
> > >
> > > There are 2 things to understand here.
> > >
> > > First we want to allow the application using the GPU for needs which are
> > > not exposed by any other DPDK API.
> > >
> > > Second, if we want to implement another DPDK API like bbdev,
> > > then the GPU implementation would be exposed as a vdev in bbdev,
> > > using the HW GPU device being a PCI in gpudev.
> > > They are two different levels, got it?
> >
> > Exactly. So what is the point of exposing low-level driver API to
> > "application",
> > why not it is part of the internal driver API. My point is, why the
> > application needs to worry
> > about, How the CPU <-> Device communicated? CPU < -> Device memory
> > visibility etc.
>
> There are two reasons.
>
> 1/ The application may want to use the GPU for some application-specific
> needs which are not abstracted in DPDK API.

Yes. Exactly, That's where my concern, If we take this path, What is
the motivation to contribute to DPDK abstracted subsystem APIs which
make sense for multiple vendors and every
Similar stuff applicable for DPU,

Otherway to put, if GPU is doing some ethdev offload, why not making
as ethdev offload in ethdev spec so that
another type of device can be used and make sense for application writters.

For example, In the future, If someone needs to add ML(Machine
learning) subsystem and enable a proper subsystem
interface that is good for DPDK. If this path is open, there is no
motivation for contribution and the application
will not have a standard interface doing the ML job across multiple vendors.

That's is the only reason why saying it should not APPLICATION
interface it can be DRIVER interface.

>
> 2/ This API may also be used by some feature implementation internally
> in some DPDK libs or drivers.
> We cannot skip the gpudev layer because this is what allows generic probing
> of the HW, and gpudev allows to share the GPU with multiple features
> implemented in different libs or drivers, thanks to the "child" concept.

Again, why do applications need to know it? It is similar to `bus`
kind of this where it
sharing the physical resouces.


>
>
> > > > > > aka DPDK device class has a fixed personality and it has API to deal
> > > > > > with abstracting specific application specific
> > > > > > end user functionality like ethdev, cryptodev, eventdev irrespective
> > > > > > of underlying bus/device properties.
> > > > >
> > > > > The goal of the lib is to allow anyone to invent any feature
> > > > > which is not already available in DPDK.
> > > > >
> > > > > > Even similar semantics are required for DPU(Smart NIC)
> > > > > > communitication. I am planning to
> > > > > > send RFC in coming days to address the issue without the application
> > > > > > knowing the Bus/HW/Driver details.
> > > > >
> > > > > gpudev is not exposing bus/hw/driver details.
> > > > > I don't understand what you mean.
> > > >
> > > > See above.
>
>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-11 11:41               ` Jerin Jacob
@ 2021-10-11 12:44                 ` Thomas Monjalon
  2021-10-11 13:30                   ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-10-11 12:44 UTC (permalink / raw)
  To: Jerin Jacob, techboard; +Cc: Elena Agostini, dpdk-dev

11/10/2021 13:41, Jerin Jacob:
> On Mon, Oct 11, 2021 at 3:57 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > 11/10/2021 11:29, Jerin Jacob:
> > > On Mon, Oct 11, 2021 at 2:42 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 11/10/2021 10:43, Jerin Jacob:
> > > > > On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > 10/10/2021 12:16, Jerin Jacob:
> > > > > > > On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> > > > > > > >
> > > > > > > > From: eagostini <eagostini@nvidia.com>
> > > > > > > >
> > > > > > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > > > >
> > > > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > > > DPDK, that's primarily a CPU framework, and GPU devices.
> > > > > > > >
> > > > > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > > > > there may be the need to put in communication the CPU with the device
> > > > > > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > > > > > >
> > > > > > > > This library provides a number of new features:
> > > > > > > > - Interoperability with GPU-specific library with generic handlers
> > > > > > > > - Possibility to allocate and free memory on the GPU
> > > > > > > > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > > > > > > > - Communication functions to enhance the dialog between the CPU and the GPU
> > > > > > >
> > > > > > > In the RFC thread, There was one outstanding non technical issues on this,
> > > > > > >
> > > > > > > i.e
> > > > > > > The above features are driver specific details. Does the DPDK
> > > > > > > _application_ need to be aware of this?
> > > > > >
> > > > > > I don't see these features as driver-specific.
> > > > >
> > > > > That is the disconnect. I see this as more driver-specific details
> > > > > which are not required to implement an "application" facing API.
> > > >
> > > > Indeed this is the disconnect.
> > > > I already answered but it seems you don't accept the answer.
> > >
> > > Same with you. That is why I requested, we need to get opinions from others.
> > > Some of them already provided opinions in RFC.
> >
> > This is why I Cc'ed techboard.
> 
> Yes. Indeed.
> 
> >
> > > > First, this is not driver-specific. It is a low-level API.
> > >
> > > What is the difference between low-level API and driver-level API.
> >
> > The low-level API provides tools to build a feature,
> > but no specific feature.
> >
> > > > > For example, If we need to implement application facing" subsystems like bbdev,
> > > > > If we make all this driver interface, you can still implement the
> > > > > bbdev API as a driver without
> > > > > exposing HW specific details like how devices communicate to CPU, how
> > > > > memory is allocated etc
> > > > >  to "application".
> > > >
> > > > There are 2 things to understand here.
> > > >
> > > > First we want to allow the application using the GPU for needs which are
> > > > not exposed by any other DPDK API.
> > > >
> > > > Second, if we want to implement another DPDK API like bbdev,
> > > > then the GPU implementation would be exposed as a vdev in bbdev,
> > > > using the HW GPU device being a PCI in gpudev.
> > > > They are two different levels, got it?
> > >
> > > Exactly. So what is the point of exposing low-level driver API to
> > > "application",
> > > why not it is part of the internal driver API. My point is, why the
> > > application needs to worry
> > > about, How the CPU <-> Device communicated? CPU < -> Device memory
> > > visibility etc.
> >
> > There are two reasons.
> >
> > 1/ The application may want to use the GPU for some application-specific
> > needs which are not abstracted in DPDK API.
> 
> Yes. Exactly, That's where my concern, If we take this path, What is
> the motivation to contribute to DPDK abstracted subsystem APIs which
> make sense for multiple vendors and every
> Similar stuff applicable for DPU,

A feature-specific API is better of course, there is no lose of motivation.
But you cannot forbid applications to have their own features on GPU.

> Otherway to put, if GPU is doing some ethdev offload, why not making
> as ethdev offload in ethdev spec so that
> another type of device can be used and make sense for application writters.

If we do ethdev offload, yes we'll implement it.
And we'll do it on top of gpudev, which is the only way to share the CPU.

> For example, In the future, If someone needs to add ML(Machine
> learning) subsystem and enable a proper subsystem
> interface that is good for DPDK. If this path is open, there is no
> motivation for contribution and the application
> will not have a standard interface doing the ML job across multiple vendors.

Wrong. It does remove the motivation, it is a first step to build on top of it.

> That's is the only reason why saying it should not APPLICATION
> interface it can be DRIVER interface.
> 
> >
> > 2/ This API may also be used by some feature implementation internally
> > in some DPDK libs or drivers.
> > We cannot skip the gpudev layer because this is what allows generic probing
> > of the HW, and gpudev allows to share the GPU with multiple features
> > implemented in different libs or drivers, thanks to the "child" concept.
> 
> Again, why do applications need to know it? It is similar to `bus`
> kind of this where it sharing the physical resouces.

No it's not a bus, it is a device that we need to share.

> > > > > > > aka DPDK device class has a fixed personality and it has API to deal
> > > > > > > with abstracting specific application specific
> > > > > > > end user functionality like ethdev, cryptodev, eventdev irrespective
> > > > > > > of underlying bus/device properties.
> > > > > >
> > > > > > The goal of the lib is to allow anyone to invent any feature
> > > > > > which is not already available in DPDK.
> > > > > >
> > > > > > > Even similar semantics are required for DPU(Smart NIC)
> > > > > > > communitication. I am planning to
> > > > > > > send RFC in coming days to address the issue without the application
> > > > > > > knowing the Bus/HW/Driver details.
> > > > > >
> > > > > > gpudev is not exposing bus/hw/driver details.
> > > > > > I don't understand what you mean.
> > > > >
> > > > > See above.

We are going into circles.
In short, Jerin wants to forbid the generic use of GPU in DPDK.
He wants only feature-specific API.
It is like restricting the functions we can run on a CPU.

And anyway we need this layer to share the GPU between multiple features.

Techboard please vote.



^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-11 12:44                 ` Thomas Monjalon
@ 2021-10-11 13:30                   ` Jerin Jacob
  2021-10-19 10:00                     ` Elena Agostini
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-10-11 13:30 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: techboard, Elena Agostini, dpdk-dev

On Mon, Oct 11, 2021 at 6:14 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 11/10/2021 13:41, Jerin Jacob:
> > On Mon, Oct 11, 2021 at 3:57 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > 11/10/2021 11:29, Jerin Jacob:
> > > > On Mon, Oct 11, 2021 at 2:42 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > 11/10/2021 10:43, Jerin Jacob:
> > > > > > On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > 10/10/2021 12:16, Jerin Jacob:
> > > > > > > > On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> > > > > > > > >
> > > > > > > > > From: eagostini <eagostini@nvidia.com>
> > > > > > > > >
> > > > > > > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > > > > >
> > > > > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > > > > DPDK, that's primarily a CPU framework, and GPU devices.
> > > > > > > > >
> > > > > > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > > > > > there may be the need to put in communication the CPU with the device
> > > > > > > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > > > > > > >
> > > > > > > > > This library provides a number of new features:
> > > > > > > > > - Interoperability with GPU-specific library with generic handlers
> > > > > > > > > - Possibility to allocate and free memory on the GPU
> > > > > > > > > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > > > > > > > > - Communication functions to enhance the dialog between the CPU and the GPU
> > > > > > > >
> > > > > > > > In the RFC thread, There was one outstanding non technical issues on this,
> > > > > > > >
> > > > > > > > i.e
> > > > > > > > The above features are driver specific details. Does the DPDK
> > > > > > > > _application_ need to be aware of this?
> > > > > > >
> > > > > > > I don't see these features as driver-specific.
> > > > > >
> > > > > > That is the disconnect. I see this as more driver-specific details
> > > > > > which are not required to implement an "application" facing API.
> > > > >
> > > > > Indeed this is the disconnect.
> > > > > I already answered but it seems you don't accept the answer.
> > > >
> > > > Same with you. That is why I requested, we need to get opinions from others.
> > > > Some of them already provided opinions in RFC.
> > >
> > > This is why I Cc'ed techboard.
> >
> > Yes. Indeed.
> >
> > >
> > > > > First, this is not driver-specific. It is a low-level API.
> > > >
> > > > What is the difference between low-level API and driver-level API.
> > >
> > > The low-level API provides tools to build a feature,
> > > but no specific feature.
> > >
> > > > > > For example, If we need to implement application facing" subsystems like bbdev,
> > > > > > If we make all this driver interface, you can still implement the
> > > > > > bbdev API as a driver without
> > > > > > exposing HW specific details like how devices communicate to CPU, how
> > > > > > memory is allocated etc
> > > > > >  to "application".
> > > > >
> > > > > There are 2 things to understand here.
> > > > >
> > > > > First we want to allow the application using the GPU for needs which are
> > > > > not exposed by any other DPDK API.
> > > > >
> > > > > Second, if we want to implement another DPDK API like bbdev,
> > > > > then the GPU implementation would be exposed as a vdev in bbdev,
> > > > > using the HW GPU device being a PCI in gpudev.
> > > > > They are two different levels, got it?
> > > >
> > > > Exactly. So what is the point of exposing low-level driver API to
> > > > "application",
> > > > why not it is part of the internal driver API. My point is, why the
> > > > application needs to worry
> > > > about, How the CPU <-> Device communicated? CPU < -> Device memory
> > > > visibility etc.
> > >
> > > There are two reasons.
> > >
> > > 1/ The application may want to use the GPU for some application-specific
> > > needs which are not abstracted in DPDK API.
> >
> > Yes. Exactly, That's where my concern, If we take this path, What is
> > the motivation to contribute to DPDK abstracted subsystem APIs which
> > make sense for multiple vendors and every
> > Similar stuff applicable for DPU,
>
> A feature-specific API is better of course, there is no lose of motivation.
> But you cannot forbid applications to have their own features on GPU.

it still can use it. We don't need DPDK APIs for that.

>
> > Otherway to put, if GPU is doing some ethdev offload, why not making
> > as ethdev offload in ethdev spec so that
> > another type of device can be used and make sense for application writters.
>
> If we do ethdev offload, yes we'll implement it.
> And we'll do it on top of gpudev, which is the only way to share the CPU.
>
> > For example, In the future, If someone needs to add ML(Machine
> > learning) subsystem and enable a proper subsystem
> > interface that is good for DPDK. If this path is open, there is no
> > motivation for contribution and the application
> > will not have a standard interface doing the ML job across multiple vendors.
>
> Wrong. It does remove the motivation, it is a first step to build on top of it.

IMO, No need to make driver API to the public to feature API.

>
> > That's is the only reason why saying it should not APPLICATION
> > interface it can be DRIVER interface.
> >
> > >
> > > 2/ This API may also be used by some feature implementation internally
> > > in some DPDK libs or drivers.
> > > We cannot skip the gpudev layer because this is what allows generic probing
> > > of the HW, and gpudev allows to share the GPU with multiple features
> > > implemented in different libs or drivers, thanks to the "child" concept.
> >
> > Again, why do applications need to know it? It is similar to `bus`
> > kind of this where it sharing the physical resouces.
>
> No it's not a bus, it is a device that we need to share.
>
> > > > > > > > aka DPDK device class has a fixed personality and it has API to deal
> > > > > > > > with abstracting specific application specific
> > > > > > > > end user functionality like ethdev, cryptodev, eventdev irrespective
> > > > > > > > of underlying bus/device properties.
> > > > > > >
> > > > > > > The goal of the lib is to allow anyone to invent any feature
> > > > > > > which is not already available in DPDK.
> > > > > > >
> > > > > > > > Even similar semantics are required for DPU(Smart NIC)
> > > > > > > > communitication. I am planning to
> > > > > > > > send RFC in coming days to address the issue without the application
> > > > > > > > knowing the Bus/HW/Driver details.
> > > > > > >
> > > > > > > gpudev is not exposing bus/hw/driver details.
> > > > > > > I don't understand what you mean.
> > > > > >
 > > > > > > See above.
>
> We are going into circles.

Yes.

> In short, Jerin wants to forbid the generic use of GPU in DPDK.

See below.

> He wants only feature-specific API.

To re-reiterate, feature-specific "application" API. A device-specific
bit can be
driver API and accessible to the out-of-tree driver if needed.

IMO, if we take this path, DPU, XPU, GPU, etc we need N different libraries to
get the job done for a specific feature for the dataplane.
Instead, Enabling public feature APIs will make the application
portable and does not
need to worry about which type of *PU it runs.


> It is like restricting the functions we can run on a CPU.
>
> And anyway we need this layer to share the GPU between multiple features.

No disagreement there. Is that layer public application API or not is
the question.
it is like PCI device API calls over of the application and makes the
application device specific.

>
> Techboard please vote.

Yes.

>
>

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-11 13:30                   ` Jerin Jacob
@ 2021-10-19 10:00                     ` Elena Agostini
  2021-10-19 18:47                       ` Jerin Jacob
  0 siblings, 1 reply; 128+ messages in thread
From: Elena Agostini @ 2021-10-19 10:00 UTC (permalink / raw)
  To: Jerin Jacob, NBU-Contact-Thomas Monjalon; +Cc: techboard, dpdk-dev

From: Jerin Jacob <jerinjacobk@gmail.com>
Date: Monday, 11 October 2021 at 15:30
To: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
Cc: techboard@dpdk.org <techboard@dpdk.org>, Elena Agostini <eagostini@nvidia.com>, dpdk-dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v3 0/9] GPU library
> On Mon, Oct 11, 2021 at 6:14 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 11/10/2021 13:41, Jerin Jacob:
> > > On Mon, Oct 11, 2021 at 3:57 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > 11/10/2021 11:29, Jerin Jacob:
> > > > > On Mon, Oct 11, 2021 at 2:42 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > 11/10/2021 10:43, Jerin Jacob:
> > > > > > > On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> > > > > > > > 10/10/2021 12:16, Jerin Jacob:
> > > > > > > > > On Fri, Oct 8, 2021 at 11:13 PM <eagostini@nvidia.com> wrote:
> > > > > > > > > >
> > > > > > > > > > From: eagostini <eagostini@nvidia.com>
> > > > > > > > > >
> > > > > > > > > > In heterogeneous computing system, processing is not only in the CPU.
> > > > > > > > > > Some tasks can be delegated to devices working in parallel.
> > > > > > > > > >
> > > > > > > > > > The goal of this new library is to enhance the collaboration between
> > > > > > > > > > DPDK, that's primarily a CPU framework, and GPU devices.
> > > > > > > > > >
> > > > > > > > > > When mixing network activity with task processing on a non-CPU device,
> > > > > > > > > > there may be the need to put in communication the CPU with the device
> > > > > > > > > > in order to manage the memory, synchronize operations, exchange info, etc..
> > > > > > > > > >
> > > > > > > > > > This library provides a number of new features:
> > > > > > > > > > - Interoperability with GPU-specific library with generic handlers
> > > > > > > > > > - Possibility to allocate and free memory on the GPU
> > > > > > > > > > - Possibility to allocate and free memory on the CPU but visible from the GPU
> > > > > > > > > > - Communication functions to enhance the dialog between the CPU and the GPU
> > > > > > > > >
> > > > > > > > > In the RFC thread, There was one outstanding non technical issues on this,
> > > > > > > > >
> > > > > > > > > i.e
> > > > > > > > > The above features are driver specific details. Does the DPDK
> > > > > > > > > _application_ need to be aware of this?
> > > > > > > >
> > > > > > > > I don't see these features as driver-specific.
> > > > > > >
> > > > > > > That is the disconnect. I see this as more driver-specific details
> > > > > > > which are not required to implement an "application" facing API.
> > > > > >
> > > > > > Indeed this is the disconnect.
> > > > > > I already answered but it seems you don't accept the answer.
> > > > >
> > > > > Same with you. That is why I requested, we need to get opinions from others.
> > > > > Some of them already provided opinions in RFC.
> > > >
> > > > This is why I Cc'ed techboard.
> > >
> > > Yes. Indeed.
> > >
> > > >
> > > > > > First, this is not driver-specific. It is a low-level API.
> > > > >
> > > > > What is the difference between low-level API and driver-level API.
> > > >
> > > > The low-level API provides tools to build a feature,
> > > > but no specific feature.
> > > >
> > > > > > > For example, If we need to implement application facing" subsystems like bbdev,
> > > > > > > If we make all this driver interface, you can still implement the
> > > > > > > bbdev API as a driver without
> > > > > > > exposing HW specific details like how devices communicate to CPU, how
> > > > > > > memory is allocated etc
> > > > > > >  to "application".
> > > > > >
> > > > > > There are 2 things to understand here.
> > > > > >
> > > > > > First we want to allow the application using the GPU for needs which are
> > > > > > not exposed by any other DPDK API.
> > > > > >
> > > > > > Second, if we want to implement another DPDK API like bbdev,
> > > > > > then the GPU implementation would be exposed as a vdev in bbdev,
> > > > > > using the HW GPU device being a PCI in gpudev.
> > > > > > They are two different levels, got it?
> > > > >
> > > > > Exactly. So what is the point of exposing low-level driver API to
> > > > > "application",
> > > > > why not it is part of the internal driver API. My point is, why the
> > > > > application needs to worry
> > > > > about, How the CPU <-> Device communicated? CPU < -> Device memory
> > > > > visibility etc.
> > > >
> > > > There are two reasons.
> > > >
> > > > 1/ The application may want to use the GPU for some application-specific
> > > > needs which are not abstracted in DPDK API.
> > >
> > > Yes. Exactly, That's where my concern, If we take this path, What is
> > > the motivation to contribute to DPDK abstracted subsystem APIs which
> > > make sense for multiple vendors and every
> > > Similar stuff applicable for DPU,
> >
> > A feature-specific API is better of course, there is no lose of motivation.
> > But you cannot forbid applications to have their own features on GPU.
>
> it still can use it. We don't need DPDK APIs for that.
>
> >
> > > Otherway to put, if GPU is doing some ethdev offload, why not making
> > > as ethdev offload in ethdev spec so that
> > > another type of device can be used and make sense for application writters.
> >
> > If we do ethdev offload, yes we'll implement it.
> > And we'll do it on top of gpudev, which is the only way to share the CPU.
> >
> > > For example, In the future, If someone needs to add ML(Machine
> > > learning) subsystem and enable a proper subsystem
> > > interface that is good for DPDK. If this path is open, there is no
> > > motivation for contribution and the application
> > > will not have a standard interface doing the ML job across multiple vendors.
> >
> > Wrong. It does remove the motivation, it is a first step to build on top of it.
>
> IMO, No need to make driver API to the public to feature API.
>
> >
> > > That's is the only reason why saying it should not APPLICATION
> > > interface it can be DRIVER interface.
> > >
> > > >
> > > > 2/ This API may also be used by some feature implementation internally
> > > > in some DPDK libs or drivers.
> > > > We cannot skip the gpudev layer because this is what allows generic probing
> > > > of the HW, and gpudev allows to share the GPU with multiple features
> > > > implemented in different libs or drivers, thanks to the "child" concept.
> > >
> > > Again, why do applications need to know it? It is similar to `bus`
> > > kind of this where it sharing the physical resouces.
> >
> > No it's not a bus, it is a device that we need to share.
> >
> > > > > > > > > aka DPDK device class has a fixed personality and it has API to deal
> > > > > > > > > with abstracting specific application specific
> > > > > > > > > end user functionality like ethdev, cryptodev, eventdev irrespective
> > > > > > > > > of underlying bus/device properties.
> > > > > > > >
> > > > > > > > The goal of the lib is to allow anyone to invent any feature
> > > > > > > > which is not already available in DPDK.
> > > > > > > >
> > > > > > > > > Even similar semantics are required for DPU(Smart NIC)
> > > > > > > > > communitication. I am planning to
> > > > > > > > > send RFC in coming days to address the issue without the application
> > > > > > > > > knowing the Bus/HW/Driver details.
> > > > > > > >
> > > > > > > > gpudev is not exposing bus/hw/driver details.
> > > > > > > > I don't understand what you mean.
> > > > > > >
> > > > > > > See above.
> >
> > We are going into circles.
>
> Yes.
>
> > In short, Jerin wants to forbid the generic use of GPU in DPDK.
>
> See below.

Honestly I don’t see a real problem releasing the library at application level.
It doesn’t prevent to use it internally by other DPDK libraries/drivers if needed.

Applications can benefit of this library for a number of reasons:

  *   Enhance the interaction between GPU specific library and DPDK
  *   Hide GPU specific implementation details to the “final” user that wants to build a GPU + DPDK application
  *   Measure network throughput with common tools like testpmd using GPU memory

Please be aware that this is just a starting point.
I’m planning to expose a number of features (at memory and processing levels) that can be useful
to enhance the communication among GPU, CPU and NIC hiding the implementation
details within the library/driver.

>
> > He wants only feature-specific API.
>
> To re-reiterate, feature-specific "application" API. A device-specific
> bit can be
> driver API and accessible to the out-of-tree driver if needed.
>
> IMO, if we take this path, DPU, XPU, GPU, etc we need N different libraries to
> get the job done for a specific feature for the dataplane.
> Instead, Enabling public feature APIs will make the application
> portable and does not
> need to worry about which type of *PU it runs.
>

As I stated multiple times, let’s start with something simple that works and
then think about how to enhance the library/driver.
IMHO it doesn’t make sense to address all the use-cases now.
This is a completely new scenario we’re opening in the DPDK context, let’s start
from the basis.

>
> > It is like restricting the functions we can run on a CPU.
> >
> > And anyway we need this layer to share the GPU between multiple features.
>
> No disagreement there. Is that layer public application API or not is
> the question.
> it is like PCI device API calls over of the application and makes the
> application device specific.
>
> >
> > Techboard please vote.
>
> Yes.
>
> >
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-19 10:00                     ` Elena Agostini
@ 2021-10-19 18:47                       ` Jerin Jacob
  2021-10-19 19:11                         ` Thomas Monjalon
  0 siblings, 1 reply; 128+ messages in thread
From: Jerin Jacob @ 2021-10-19 18:47 UTC (permalink / raw)
  To: Elena Agostini; +Cc: NBU-Contact-Thomas Monjalon, techboard, dpdk-dev

>
> >
>
> > > He wants only feature-specific API.
>
> >
>
> > To re-reiterate, feature-specific "application" API. A device-specific
>
> > bit can be
>
> > driver API and accessible to the out-of-tree driver if needed.
>
> >
>
> > IMO, if we take this path, DPU, XPU, GPU, etc we need N different libraries to
>
> > get the job done for a specific feature for the dataplane.
>
> > Instead, Enabling public feature APIs will make the application
>
> > portable and does not
>
> > need to worry about which type of *PU it runs.
>
> >
>
>
>
> As I stated multiple times, let’s start with something simple that works and
>
> then think about how to enhance the library/driver.

I have submitted RFC[1] to abstract in a generic way so that
workload-specific _driver_ aspects are NOT EXPOSED to the application.

The RFC explains with application code, how any workload can be used
by the application without exposing the driver details.
Also, shows how to enable a framework to abstract a different form of
workload acceletor(DPU, GPU, XPU, IPU....)

In order to map to this discussion with RFC:
You may need to add a new host port for GPU which as
lib/dwa/rte_dwa_port_host_xxxxx.h and adding new workload as
lib/dwa/rte_dwa_profile_xxxx.h(Which can be reused by
all dataplane workload accelerator)



[1]
http://mails.dpdk.org/archives/dev/2021-October/226070.html


>
> IMHO it doesn’t make sense to address all the use-cases now.
>
> This is a completely new scenario we’re opening in the DPDK context, let’s start
>
> from the basis.

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/9] GPU library
  2021-10-19 18:47                       ` Jerin Jacob
@ 2021-10-19 19:11                         ` Thomas Monjalon
  2021-10-19 19:56                           ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 128+ messages in thread
From: Thomas Monjalon @ 2021-10-19 19:11 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Elena Agostini, techboard, dpdk-dev

19/10/2021 20:47, Jerin Jacob:
> > As I stated multiple times, let’s start with something simple that works and
> > then think about how to enhance the library/driver.
> 
> I have submitted RFC[1] to abstract in a generic way so that
> workload-specific _driver_ aspects are NOT EXPOSED to the application.
> 
> The RFC explains with application code, how any workload can be used
> by the application without exposing the driver details.
> Also, shows how to enable a framework to abstract a different form of
> workload acceletor(DPU, GPU, XPU, IPU....)
> 
> In order to map to this discussion with RFC:
> You may need to add a new host port for GPU which as
> lib/dwa/rte_dwa_port_host_xxxxx.h and adding new workload as
> lib/dwa/rte_dwa_profile_xxxx.h(Which can be reused by
> all dataplane workload accelerator)
> 
> [1] http://mails.dpdk.org/archives/dev/2021-October/226070.html

That's really 2 different points of view:
	- expose only some predefined dataplane workloads
	- or allow to build any custom worload




^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [EXT] Re:  [PATCH v3 0/9] GPU library
  2021-10-19 19:11                         ` Thomas Monjalon
@ 2021-10-19 19:56                           ` Jerin Jacob Kollanukkaran
  0 siblings, 0 replies; 128+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2021-10-19 19:56 UTC (permalink / raw)
  To: Thomas Monjalon, Jerin Jacob; +Cc: Elena Agostini, techboard, dpdk-dev

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
> Sent: Wednesday, October 20, 2021 12:42 AM
> To: Jerin Jacob <jerinjacobk@gmail.com>
> Cc: Elena Agostini <eagostini@nvidia.com>; techboard@dpdk.org; dpdk-dev
> <dev@dpdk.org>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v3 0/9] GPU library
> 
> External Email
> 
> ----------------------------------------------------------------------
> 19/10/2021 20:47, Jerin Jacob:
> > > As I stated multiple times, let’s start with something simple that
> > > works and then think about how to enhance the library/driver.
> >
> > I have submitted RFC[1] to abstract in a generic way so that
> > workload-specific _driver_ aspects are NOT EXPOSED to the application.
> >
> > The RFC explains with application code, how any workload can be used
> > by the application without exposing the driver details.
> > Also, shows how to enable a framework to abstract a different form of
> > workload acceletor(DPU, GPU, XPU, IPU....)
> >
> > In order to map to this discussion with RFC:
> > You may need to add a new host port for GPU which as
> > lib/dwa/rte_dwa_port_host_xxxxx.h and adding new workload as
> > lib/dwa/rte_dwa_profile_xxxx.h(Which can be reused by all dataplane
> > workload accelerator)
> >
> > [1]
> > https://urldefense.proofpoint.com/v2/url?u=http-3A__mails.dpdk.org_arc
> > hives_dev_2021-
> 2DOctober_226070.html&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ
> >
> &r=1DGob4H4rxz6H8uITozGOCa0s5f4wCNtTa4UUKvcsvI&m=8RtZAL1mpqAKtA
> i3Fpryk
> >
> gq7CJ_9Xlb_cnR8gVPZPAc&s=aWBFPWzbVCCBZ9HSTIwBD5Mw9wdBYje1ad8zY
> j7oeiI&e
> > =
> 
> That's really 2 different points of view:
> 	- expose only some predefined dataplane workloads
> 	- or allow to build any custom workload

That’s one way to look at. Other way to look at, If you split the workload as quantifiable piece.
We could chain the workload to create any custom workload.
Another aspect is how can application writer  custom Dataplane workload which is not
specific to acceletor HW rather it express what need to do for the application .


> 
> 


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API
  2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API eagostini
  2021-10-08 20:18     ` Thomas Monjalon
@ 2021-10-29 19:38     ` Mattias Rönnblom
  2021-11-08 15:16       ` Elena Agostini
  1 sibling, 1 reply; 128+ messages in thread
From: Mattias Rönnblom @ 2021-10-29 19:38 UTC (permalink / raw)
  To: eagostini, dev; +Cc: Thomas Monjalon

On 2021-10-09 03:53, eagostini@nvidia.com wrote:
> From: Elena Agostini <eagostini@nvidia.com>
> 
> In heterogeneous computing system, processing is not only in the CPU.
> Some tasks can be delegated to devices working in parallel.
> Such workload distribution can be achieved by sharing some memory.
> 
> As a first step, the features are focused on memory management.
> A function allows to allocate memory inside the device,
> or in the main (CPU) memory while making it visible for the device.
> This memory may be used to save packets or for synchronization data.
> 
> The next step should focus on GPU processing task control.
> 
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>   app/test-gpudev/main.c                 | 118 +++++++++++++++++++++++++
>   doc/guides/gpus/features/default.ini   |   3 +
>   doc/guides/prog_guide/gpudev.rst       |  19 ++++
>   doc/guides/rel_notes/release_21_11.rst |   1 +
>   lib/gpudev/gpudev.c                    | 101 +++++++++++++++++++++
>   lib/gpudev/gpudev_driver.h             |  12 +++
>   lib/gpudev/rte_gpudev.h                |  95 ++++++++++++++++++++
>   lib/gpudev/version.map                 |   4 +
>   8 files changed, 353 insertions(+)
> 
> diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
> index 6a73a54e84..98c02a3ee0 100644
> --- a/app/test-gpudev/main.c
> +++ b/app/test-gpudev/main.c
> @@ -62,6 +62,110 @@ args_parse(int argc, char **argv)
>   	}
>   }
>   
> +static int
> +alloc_gpu_memory(uint16_t gpu_id)
> +{
> +	void * ptr_1 = NULL;

Delete space between '*' and 'p'.

> +	void * ptr_2 = NULL;
> +	size_t buf_bytes = 1024;
> +	int ret = 0;

This initialization is redundant.

> +
> +	printf("\n=======> TEST: Allocate GPU memory\n");
> +
> +	/* Alloc memory on GPU 0 */
> +	ptr_1 = rte_gpu_malloc(gpu_id, buf_bytes);
> +	if(ptr_1 == NULL)
> +	{

Misplaced braces.

"if (" rather than "if(".

> +		fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
> +		return -1;
> +	}
> +	printf("GPU memory allocated at 0x%p %zdB\n", ptr_1, buf_bytes);
> +
> +	ptr_2 = rte_gpu_malloc(gpu_id, buf_bytes);
> +	if(ptr_2 == NULL)
> +	{

Again, and throughout this file.

> +		fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
> +		return -1;
> +	}
> +	printf("GPU memory allocated at 0x%p %zdB\n", ptr_2, buf_bytes);
> +
> +	ret = rte_gpu_free(gpu_id, (uint8_t*)(ptr_1)+0x700);
> +	if(ret < 0)
> +	{
> +		printf("GPU memory 0x%p + 0x700 NOT freed because of memory address not recognized by driver\n", ptr_1);
> +	}
> +	else
> +	{
> +		fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr_1);
> +		return -1;
> +	}
> +
> +	ret = rte_gpu_free(gpu_id, ptr_2);
> +	if(ret < 0)
> +	{
> +		fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
> +		return -1;
> +	}
> +	printf("GPU memory 0x%p freed\n", ptr_2);
> +
> +	ret = rte_gpu_free(gpu_id, ptr_1);
> +	if(ret < 0)
> +	{
> +		fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
> +		return -1;
> +	}
> +	printf("GPU memory 0x%p freed\n", ptr_1);
> +
> +	return 0;
> +}
> +
> +static int
> +register_cpu_memory(uint16_t gpu_id)
> +{
> +	void * ptr = NULL;
> +	size_t buf_bytes = 1024;
> +	int ret = 0;
> +
> +	printf("\n=======> TEST: Register CPU memory\n");
> +
> +	/* Alloc memory on CPU visible from GPU 0 */
> +	ptr = rte_zmalloc(NULL, buf_bytes, 0);
> +	if (ptr == NULL) {
> +		fprintf(stderr, "Failed to allocate CPU memory.\n");
> +		return -1;
> +	}
> +
> +	ret = rte_gpu_register(gpu_id, buf_bytes, ptr);
> +	if(ret < 0)
> +	{
> +		fprintf(stderr, "rte_gpu_register CPU memory returned error %d\n", ret);
> +		return -1;
> +	}
> +	printf("CPU memory registered at 0x%p %zdB\n", ptr, buf_bytes);
> +
> +	ret = rte_gpu_unregister(gpu_id, (uint8_t*)(ptr)+0x700);
> +	if(ret < 0)
> +	{
> +		printf("CPU memory 0x%p + 0x700 NOT unregistered because of memory address not recognized by driver\n", ptr);
> +	}
> +	else
> +	{
> +		fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr);
> +		return -1;
> +	}
> +	printf("CPU memory 0x%p unregistered\n", ptr);
> +
> +	ret = rte_gpu_unregister(gpu_id, ptr);
> +	if(ret < 0)
> +	{
> +		fprintf(stderr, "rte_gpu_unregister returned error %d\n", ret);
> +		return -1;
> +	}
> +	printf("CPU memory 0x%p unregistered\n", ptr);
> +
> +	return 0;
> +}
> +
>   int
>   main(int argc, char **argv)
>   {
> @@ -99,6 +203,20 @@ main(int argc, char **argv)
>   	}
>   	printf("\n\n");
>   
> +	if(nb_gpus == 0 > +	{
> +		fprintf(stderr, "Need at least one GPU on the system to run the example\n");
> +		return EXIT_FAILURE;
> +	}
> +
> +	gpu_id = 0;
> +
> +	/**
> +	 * Memory tests
> +	 */
> +	alloc_gpu_memory(gpu_id);
> +	register_cpu_memory(gpu_id);
> +
>   	/* clean up the EAL */
>   	rte_eal_cleanup();
>   	printf("Bye...\n");
> diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
> index ec7a545eb7..87e9966424 100644
> --- a/doc/guides/gpus/features/default.ini
> +++ b/doc/guides/gpus/features/default.ini
> @@ -8,3 +8,6 @@
>   ;
>   [Features]
>   Get device info                =
> +Share CPU memory with device   =
> +Allocate device memory         =
> +Free memory                    =
> diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
> index 7694639489..9aca69038c 100644
> --- a/doc/guides/prog_guide/gpudev.rst
> +++ b/doc/guides/prog_guide/gpudev.rst
> @@ -30,6 +30,8 @@ Features
>   This library provides a number of features:
>   
>   - Interoperability with device-specific library through generic handlers.
> +- Allocate and free memory on the device.
> +- Register CPU memory to make it visible from the device.
>   
>   
>   API Overview
> @@ -46,3 +48,20 @@ that will be registered internally by the driver as an additional device (child)
>   connected to a physical device (parent).
>   Each device (parent or child) is represented through a ID
>   required to indicate which device a given operation should be executed on.
> +
> +Memory Allocation
> +~~~~~~~~~~~~~~~~~
> +
> +gpudev can allocate on an input given GPU device a memory area
> +returning the pointer to that memory.
> +Later, it's also possible to free that memory with gpudev.
> +GPU memory allocated outside of the gpudev library
> +(e.g. with GPU-specific library) cannot be freed by the gpudev library.
> +
> +Memory Registration
> +~~~~~~~~~~~~~~~~~~~
> +
> +gpudev can register a CPU memory area to make it visible from a GPU device.
> +Later, it's also possible to unregister that memory with gpudev.
> +CPU memory registered outside of the gpudev library
> +(e.g. with GPU specific library) cannot be unregistered by the gpudev library.
> diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> index 4986a35b50..c4ac5e3053 100644
> --- a/doc/guides/rel_notes/release_21_11.rst
> +++ b/doc/guides/rel_notes/release_21_11.rst
> @@ -65,6 +65,7 @@ New Features
>   * **Introduced GPU device class with first features:**
>   
>     * Device information
> +  * Memory management
>   
>   * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
>   
> diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
> index f0690cf730..1d8318f769 100644
> --- a/lib/gpudev/gpudev.c
> +++ b/lib/gpudev/gpudev.c
> @@ -6,6 +6,7 @@
>   #include <rte_tailq.h>
>   #include <rte_string_fns.h>
>   #include <rte_memzone.h>
> +#include <rte_malloc.h>
>   #include <rte_errno.h>
>   #include <rte_log.h>
>   
> @@ -523,3 +524,103 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
>   	}
>   	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
>   }
> +
> +void *
> +rte_gpu_malloc(int16_t dev_id, size_t size)
> +{
> +	struct rte_gpu *dev;
> +	void *ptr;
> +	int ret;
> +
> +	dev = gpu_get_by_id(dev_id);
> +	if (dev == NULL) {
> +		GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
> +		rte_errno = ENODEV;
> +		return NULL;
> +	}
> +
> +	if (dev->ops.mem_alloc == NULL) {
> +		GPU_LOG(ERR, "mem allocation not supported");
> +		rte_errno = ENOTSUP;
> +		return NULL;
> +	}
> +
> +	if (size == 0) /* dry-run */
> +		return NULL;
> +
> +	ret = dev->ops.mem_alloc(dev, size, &ptr);
> +
> +	switch (ret) {
> +		case 0:
> +			return ptr;
> +		case -ENOMEM:
> +		case -E2BIG:
> +			rte_errno = -ret;
> +			return NULL;
> +		default:
> +			rte_errno = -EPERM;
> +			return NULL;
> +	}
> +}
> +
> +int
> +rte_gpu_register(int16_t dev_id, size_t size, void * ptr)
> +{
> +	struct rte_gpu *dev;
> +
> +	dev = gpu_get_by_id(dev_id);
> +	if (dev == NULL) {
> +		GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
> +		rte_errno = ENODEV;
> +		return -rte_errno;
> +	}
> +
> +	if (dev->ops.mem_register == NULL) {
> +		GPU_LOG(ERR, "mem registration not supported");
> +		rte_errno = ENOTSUP;
> +		return -rte_errno;
> +	}
> +
> +	if (size == 0 || ptr == NULL) /* dry-run */
> +		return -EINVAL;
> +
> +	return GPU_DRV_RET(dev->ops.mem_register(dev, size, ptr));
> +}
> +
> +int
> +rte_gpu_unregister(int16_t dev_id, void * ptr)
> +{
> +	struct rte_gpu *dev;
> +
> +	dev = gpu_get_by_id(dev_id);
> +	if (dev == NULL) {
> +		GPU_LOG(ERR, "unregister mem for invalid device ID %d", dev_id);
> +		rte_errno = ENODEV;
> +		return -rte_errno;
> +	}
> +
> +	if (dev->ops.mem_unregister == NULL) {
> +		rte_errno = ENOTSUP;
> +		return -rte_errno;
> +	}
> +	return GPU_DRV_RET(dev->ops.mem_unregister(dev, ptr));
> +}
> +
> +int
> +rte_gpu_free(int16_t dev_id, void *ptr)
> +{
> +	struct rte_gpu *dev;
> +
> +	dev = gpu_get_by_id(dev_id);
> +	if (dev == NULL) {
> +		GPU_LOG(ERR, "free mem for invalid device ID %d", dev_id);
> +		rte_errno = ENODEV;
> +		return -rte_errno;
> +	}
> +
> +	if (dev->ops.mem_free == NULL) {
> +		rte_errno = ENOTSUP;
> +		return -rte_errno;
> +	}
> +	return GPU_DRV_RET(dev->ops.mem_free(dev, ptr));
> +}
> diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
> index 9459c7e30f..11015944a6 100644
> --- a/lib/gpudev/gpudev_driver.h
> +++ b/lib/gpudev/gpudev_driver.h
> @@ -27,12 +27,24 @@ enum rte_gpu_state {
>   struct rte_gpu;
>   typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
>   typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
> +typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
> +typedef int (rte_gpu_free_t)(struct rte_gpu *dev, void *ptr);
> +typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
> +typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
>   
>   struct rte_gpu_ops {
>   	/* Get device info. If NULL, info is just copied. */
>   	rte_gpu_info_get_t *dev_info_get;
>   	/* Close device or child context. */
>   	rte_gpu_close_t *dev_close;
> +	/* Allocate memory in device. */
> +	rte_gpu_mem_alloc_t *mem_alloc;
> +	/* Register CPU memory in device. */
> +	rte_gpu_mem_register_t *mem_register;
> +	/* Free memory allocated or registered in device. */
> +	rte_gpu_free_t *mem_free;
> +	/* Unregister CPU memory in device. */
> +	rte_gpu_mem_unregister_t *mem_unregister;
>   };
>   
>   struct rte_gpu_mpshared {
> diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
> index df75dbdbab..3c276581c0 100644
> --- a/lib/gpudev/rte_gpudev.h
> +++ b/lib/gpudev/rte_gpudev.h
> @@ -9,6 +9,7 @@
>   #include <stdint.h>
>   #include <stdbool.h>
>   
> +#include <rte_bitops.h>
>   #include <rte_compat.h>
>   
>   /**
> @@ -292,6 +293,100 @@ int rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
>   __rte_experimental
>   int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate a chunk of memory usable by the device.
> + *
> + * @param dev_id
> + *   Device ID requiring allocated memory.
> + * @param size
> + *   Number of bytes to allocate.
> + *   Requesting 0 will do nothing.
> + *
> + * @return
> + *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
> + *   - ENODEV if invalid dev_id
> + *   - EINVAL if reserved flags
> + *   - ENOTSUP if operation not supported by the driver
> + *   - E2BIG if size is higher than limit
> + *   - ENOMEM if out of space
> + *   - EPERM if driver error
> + */
> +__rte_experimental
> +void *rte_gpu_malloc(int16_t dev_id, size_t size)
> +__rte_alloc_size(2);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Deallocate a chunk of memory allocated with rte_gpu_malloc().
> + *
> + * @param dev_id
> + *   Reference device ID.
> + * @param ptr
> + *   Pointer to the memory area to be deallocated.
> + *   NULL is a no-op accepted value.
> + *
> + * @return
> + *   0 on success, -rte_errno otherwise:

I don't think you are supposed to set rte_errno if it's not needed, 
which is not the case here (since you return the error code).

> + *   - ENODEV if invalid dev_id
> + *   - ENOTSUP if operation not supported by the driver
> + *   - EPERM if driver error
> + */
> +__rte_experimental
> +int rte_gpu_free(int16_t dev_id, void *ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Register a chunk of memory on the CPU usable by the device.
> + *
> + * @param dev_id
> + *   Device ID requiring allocated memory.
> + * @param size
> + *   Number of bytes to allocate.
> + *   Requesting 0 will do nothing.
> + * @param ptr
> + *   Pointer to the memory area to be registered.
> + *   NULL is a no-op accepted value.
> +
> + * @return
> + *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
> + *   - ENODEV if invalid dev_id
> + *   - EINVAL if reserved flags
> + *   - ENOTSUP if operation not supported by the driver
> + *   - E2BIG if size is higher than limit
> + *   - ENOMEM if out of space
> + *   - EPERM if driver error
> + */
> +__rte_experimental
> +int rte_gpu_register(int16_t dev_id, size_t size, void * ptr);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Deregister a chunk of memory previusly registered with rte_gpu_mem_register()
> + *
> + * @param dev_id
> + *   Reference device ID.
> + * @param ptr
> + *   Pointer to the memory area to be unregistered.
> + *   NULL is a no-op accepted value.
> + *
> + * @return
> + *   0 on success, -rte_errno otherwise:
> + *   - ENODEV if invalid dev_id
> + *   - ENOTSUP if operation not supported by the driver
> + *   - EPERM if driver error
> + */
> +__rte_experimental
> +int rte_gpu_unregister(int16_t dev_id, void *ptr);
> +
>   #ifdef __cplusplus
>   }
>   #endif
> diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
> index 58dc632393..d4a65ebd52 100644
> --- a/lib/gpudev/version.map
> +++ b/lib/gpudev/version.map
> @@ -8,9 +8,13 @@ EXPERIMENTAL {
>   	rte_gpu_close;
>   	rte_gpu_count_avail;
>   	rte_gpu_find_next;
> +	rte_gpu_free;
>   	rte_gpu_info_get;
>   	rte_gpu_init;
>   	rte_gpu_is_valid;
> +	rte_gpu_malloc;
> +	rte_gpu_register;
> +	rte_gpu_unregister;
>   };
>   
>   INTERNAL {
> 

^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 0/9] GPU library
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
                   ` (8 preceding siblings ...)
  2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
@ 2021-11-03 19:15 ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 1/9] gpudev: introduce GPU device class library eagostini
                     ` (8 more replies)
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
  10 siblings, 9 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The goal of this new library is to enhance the collaboration between
DPDK, that's primarily a CPU framework, and GPU devices.

When mixing network activity with task processing on a non-CPU device,
there may be the need to put in communication the CPU with the device
in order to manage the memory, synchronize operations, exchange info, etc..

This library provides a number of new features:
- Interoperability with GPU-specific library with generic handlers
- Possibility to allocate and free memory on the GPU
- Possibility to allocate and free memory on the CPU but visible from the GPU
- Communication functions to enhance the dialog between the CPU and the GPU

The infrastructure is prepared to welcome drivers in drivers/gpu/
as the CUDA one, sent as draft:
https://patches.dpdk.org/project/dpdk/patch/20211005224905.13505-1-eagostini@nvidia.com/

Changelog:
- Patches updated to latest DPDK commit
- Communication list item has an array of mbufs instead of opaque
  objects
- Communication list free doesn't release mbufs anymore

Elena Agostini (6):
  gpudev: introduce GPU device class library
  gpudev: add memory API
  gpudev: add memory barrier
  gpudev: add communication flag
  gpudev: add communication list
  doc: add CUDA example in GPU guide

Thomas Monjalon (3):
  gpudev: add event notification
  gpudev: add child device representing a device context
  gpudev: support multi-process

 .gitignore                             |   1 +
 MAINTAINERS                            |   6 +
 app/meson.build                        |   1 +
 app/test-gpudev/main.c                 | 394 +++++++++++
 app/test-gpudev/meson.build            |   5 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/conf.py                     |   8 +
 doc/guides/gpus/features/default.ini   |  13 +
 doc/guides/gpus/index.rst              |  11 +
 doc/guides/gpus/overview.rst           |  10 +
 doc/guides/index.rst                   |   1 +
 doc/guides/prog_guide/gpudev.rst       | 226 ++++++
 doc/guides/prog_guide/index.rst        |   1 +
 doc/guides/rel_notes/release_21_11.rst |   6 +
 drivers/gpu/meson.build                |   4 +
 drivers/meson.build                    |   1 +
 lib/gpudev/gpudev.c                    | 908 +++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             | 102 +++
 lib/gpudev/meson.build                 |  12 +
 lib/gpudev/rte_gpudev.h                | 649 ++++++++++++++++++
 lib/gpudev/version.map                 |  38 ++
 lib/meson.build                        |   1 +
 22 files changed, 2399 insertions(+)
 create mode 100644 app/test-gpudev/main.c
 create mode 100644 app/test-gpudev/meson.build
 create mode 100644 doc/guides/gpus/features/default.ini
 create mode 100644 doc/guides/gpus/index.rst
 create mode 100644 doc/guides/gpus/overview.rst
 create mode 100644 doc/guides/prog_guide/gpudev.rst
 create mode 100644 drivers/gpu/meson.build
 create mode 100644 lib/gpudev/gpudev.c
 create mode 100644 lib/gpudev/gpudev_driver.h
 create mode 100644 lib/gpudev/meson.build
 create mode 100644 lib/gpudev/rte_gpudev.h
 create mode 100644 lib/gpudev/version.map

-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 1/9] gpudev: introduce GPU device class library
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
@ 2021-11-03 19:15   ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 2/9] gpudev: add event notification eagostini
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini, Thomas Monjalon

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The new library gpudev is for dealing with GPGPU computing devices
from a DPDK application running on the CPU.

The infrastructure is prepared to welcome drivers in drivers/gpu/.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 .gitignore                             |   1 +
 MAINTAINERS                            |   6 +
 app/meson.build                        |   1 +
 app/test-gpudev/main.c                 | 107 +++++++++++
 app/test-gpudev/meson.build            |   5 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/conf.py                     |   8 +
 doc/guides/gpus/features/default.ini   |  10 +
 doc/guides/gpus/index.rst              |  11 ++
 doc/guides/gpus/overview.rst           |  10 +
 doc/guides/index.rst                   |   1 +
 doc/guides/prog_guide/gpudev.rst       |  36 ++++
 doc/guides/prog_guide/index.rst        |   1 +
 doc/guides/rel_notes/release_21_11.rst |   4 +
 drivers/gpu/meson.build                |   4 +
 drivers/meson.build                    |   1 +
 lib/gpudev/gpudev.c                    | 249 +++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             |  67 +++++++
 lib/gpudev/meson.build                 |  10 +
 lib/gpudev/rte_gpudev.h                | 168 +++++++++++++++++
 lib/gpudev/version.map                 |  20 ++
 lib/meson.build                        |   1 +
 22 files changed, 722 insertions(+)
 create mode 100644 app/test-gpudev/main.c
 create mode 100644 app/test-gpudev/meson.build
 create mode 100644 doc/guides/gpus/features/default.ini
 create mode 100644 doc/guides/gpus/index.rst
 create mode 100644 doc/guides/gpus/overview.rst
 create mode 100644 doc/guides/prog_guide/gpudev.rst
 create mode 100644 drivers/gpu/meson.build
 create mode 100644 lib/gpudev/gpudev.c
 create mode 100644 lib/gpudev/gpudev_driver.h
 create mode 100644 lib/gpudev/meson.build
 create mode 100644 lib/gpudev/rte_gpudev.h
 create mode 100644 lib/gpudev/version.map

diff --git a/.gitignore b/.gitignore
index b19c0717e6..49494e0c6c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,6 +14,7 @@ doc/guides/compressdevs/overview_feature_table.txt
 doc/guides/regexdevs/overview_feature_table.txt
 doc/guides/vdpadevs/overview_feature_table.txt
 doc/guides/bbdevs/overview_feature_table.txt
+doc/guides/gpus/overview_feature_table.txt
 
 # ignore generated ctags/cscope files
 cscope.out.po
diff --git a/MAINTAINERS b/MAINTAINERS
index 0e5951f8f1..0f71736816 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -467,6 +467,12 @@ M: Bruce Richardson <bruce.richardson@intel.com>
 F: examples/dma/
 F: doc/guides/sample_app_ug/dma.rst
 
+General-Purpose Graphics Processing Unit (GPU) API - EXPERIMENTAL
+M: Elena Agostini <eagostini@nvidia.com>
+F: lib/gpudev/
+F: doc/guides/prog_guide/gpudev.rst
+F: doc/guides/gpus/features/default.ini
+
 Eventdev API
 M: Jerin Jacob <jerinj@marvell.com>
 T: git://dpdk.org/next/dpdk-next-eventdev
diff --git a/app/meson.build b/app/meson.build
index e41a2e3902..500d86750d 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -13,6 +13,7 @@ apps = [
         'test-eventdev',
         'test-fib',
         'test-flow-perf',
+        'test-gpudev',
         'test-pipeline',
         'test-pmd',
         'test-regex',
diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
new file mode 100644
index 0000000000..6a73a54e84
--- /dev/null
+++ b/app/test-gpudev/main.c
@@ -0,0 +1,107 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <errno.h>
+#include <getopt.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_mempool.h>
+#include <rte_mbuf.h>
+
+#include <rte_gpudev.h>
+
+enum app_args {
+	ARG_HELP,
+	ARG_MEMPOOL
+};
+
+static void
+usage(const char *prog_name)
+{
+	printf("%s [EAL options] --\n",
+		prog_name);
+}
+
+static void
+args_parse(int argc, char **argv)
+{
+	char **argvopt;
+	int opt;
+	int opt_idx;
+
+	static struct option lgopts[] = {
+		{ "help", 0, 0, ARG_HELP},
+		/* End of options */
+		{ 0, 0, 0, 0 }
+	};
+
+	argvopt = argv;
+	while ((opt = getopt_long(argc, argvopt, "",
+				lgopts, &opt_idx)) != EOF) {
+		switch (opt) {
+		case ARG_HELP:
+			usage(argv[0]);
+			break;
+		default:
+			usage(argv[0]);
+			rte_exit(EXIT_FAILURE, "Invalid option: %s\n", argv[optind]);
+			break;
+		}
+	}
+}
+
+int
+main(int argc, char **argv)
+{
+	int ret;
+	int nb_gpus = 0;
+	int16_t gpu_id = 0;
+	struct rte_gpu_info ginfo;
+
+	/* Init EAL. */
+	ret = rte_eal_init(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed\n");
+	argc -= ret;
+	argv += ret;
+	if (argc > 1)
+		args_parse(argc, argv);
+	argc -= ret;
+	argv += ret;
+
+	nb_gpus = rte_gpu_count_avail();
+	printf("\n\nDPDK found %d GPUs:\n", nb_gpus);
+	RTE_GPU_FOREACH(gpu_id)
+	{
+		if(rte_gpu_info_get(gpu_id, &ginfo))
+			rte_exit(EXIT_FAILURE, "rte_gpu_info_get error - bye\n");
+
+		printf("\tGPU ID %d\n\t\tparent ID %d GPU Bus ID %s NUMA node %d Tot memory %.02f MB, Tot processors %d\n",
+				ginfo.dev_id,
+				ginfo.parent,
+				ginfo.name,
+				ginfo.numa_node,
+				(((float)ginfo.total_memory)/(float)1024)/(float)1024,
+				ginfo.processor_count
+			);
+	}
+	printf("\n\n");
+
+	/* clean up the EAL */
+	rte_eal_cleanup();
+	printf("Bye...\n");
+
+	return EXIT_SUCCESS;
+}
diff --git a/app/test-gpudev/meson.build b/app/test-gpudev/meson.build
new file mode 100644
index 0000000000..17bdef3646
--- /dev/null
+++ b/app/test-gpudev/meson.build
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+sources = files('main.c')
+deps = ['gpudev', 'ethdev']
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 096ebbaf0d..db2ca9b6ed 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -41,6 +41,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/eventdev \
                           @TOPDIR@/lib/fib \
                           @TOPDIR@/lib/flow_classify \
+                          @TOPDIR@/lib/gpudev \
                           @TOPDIR@/lib/graph \
                           @TOPDIR@/lib/gro \
                           @TOPDIR@/lib/gso \
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 67d2dd62c7..7930da9ceb 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -152,6 +152,9 @@ def generate_overview_table(output_filename, table_id, section, table_name, titl
         name = ini_filename[:-4]
         name = name.replace('_vf', 'vf')
         pmd_names.append(name)
+    if not pmd_names:
+        # Add an empty column if table is empty (required by RST syntax)
+        pmd_names.append(' ')
 
     # Pad the table header names.
     max_header_len = len(max(pmd_names, key=len))
@@ -388,6 +391,11 @@ def setup(app):
                             'Features',
                             'Features availability in bbdev drivers',
                             'Feature')
+    table_file = dirname(__file__) + '/gpus/overview_feature_table.txt'
+    generate_overview_table(table_file, 1,
+                            'Features',
+                            'Features availability in GPU drivers',
+                            'Feature')
 
     if LooseVersion(sphinx_version) < LooseVersion('1.3.1'):
         print('Upgrade sphinx to version >= 1.3.1 for '
diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
new file mode 100644
index 0000000000..ec7a545eb7
--- /dev/null
+++ b/doc/guides/gpus/features/default.ini
@@ -0,0 +1,10 @@
+;
+; Features of GPU drivers.
+;
+; This file defines the features that are valid for inclusion in
+; the other driver files and also the order that they appear in
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
+;
+[Features]
+Get device info                =
diff --git a/doc/guides/gpus/index.rst b/doc/guides/gpus/index.rst
new file mode 100644
index 0000000000..1878423239
--- /dev/null
+++ b/doc/guides/gpus/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+General-Purpose Graphics Processing Unit Drivers
+================================================
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   overview
diff --git a/doc/guides/gpus/overview.rst b/doc/guides/gpus/overview.rst
new file mode 100644
index 0000000000..4830348818
--- /dev/null
+++ b/doc/guides/gpus/overview.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+Overview of GPU Drivers
+=======================
+
+General-Purpose computing on Graphics Processing Unit (GPGPU)
+is the use of GPU to perform parallel computation.
+
+.. include:: overview_feature_table.txt
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 919825992e..5eb5bd9c9a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -22,6 +22,7 @@ DPDK documentation
    vdpadevs/index
    regexdevs/index
    dmadevs/index
+   gpus/index
    eventdevs/index
    rawdevs/index
    mempool/index
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
new file mode 100644
index 0000000000..6ea7239159
--- /dev/null
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+General-Purpose Graphics Processing Unit Library
+================================================
+
+When mixing networking activity with task processing on a GPU device,
+there may be the need to put in communication the CPU with the device
+in order to manage the memory, synchronize operations, exchange info, etc..
+
+By means of the generic GPU interface provided by this library,
+it is possible to allocate a chunk of GPU memory and use it
+to create a DPDK mempool with external mbufs having the payload
+on the GPU memory, enabling any network interface card
+(which support this feature like Mellanox NIC)
+to directly transmit and receive packets using GPU memory.
+
+Additionally, this library provides a number of functions
+to enhance the dialog between CPU and GPU.
+
+Out of scope of this library is to provide a wrapper for GPU specific libraries
+(e.g. CUDA Toolkit or OpenCL), thus it is not possible to launch workload
+on the device or create GPU specific objects
+(e.g. CUDA Driver context or CUDA Streams in case of NVIDIA GPUs).
+
+
+Features
+--------
+
+This library provides a number of features:
+
+- Interoperability with device-specific library through generic handlers.
+
+
+API Overview
+------------
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 20e5155cf4..7090b5589a 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -28,6 +28,7 @@ Programmer's Guide
     compressdev
     regexdev
     dmadev
+    gpudev
     rte_security
     rawdev
     link_bonding_poll_mode_drv_lib
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 502cc5ceb2..57851c63af 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -91,6 +91,10 @@ New Features
   Added ``rte_eth_macaddrs_get`` to allow user to retrieve all Ethernet
   addresses assigned to given ethernet port.
 
+* **Introduced GPU device class with first features:**
+
+  * Device information
+
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
   Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
diff --git a/drivers/gpu/meson.build b/drivers/gpu/meson.build
new file mode 100644
index 0000000000..e51ad3381b
--- /dev/null
+++ b/drivers/gpu/meson.build
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+drivers = []
diff --git a/drivers/meson.build b/drivers/meson.build
index 34c0276487..d5f4e1c1f2 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -19,6 +19,7 @@ subdirs = [
         'vdpa',           # depends on common, bus and mempool.
         'event',          # depends on common, bus, mempool and net.
         'baseband',       # depends on common and bus.
+        'gpu',            # depends on common and bus.
 ]
 
 if meson.is_cross_build()
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
new file mode 100644
index 0000000000..c839c530c8
--- /dev/null
+++ b/lib/gpudev/gpudev.c
@@ -0,0 +1,249 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_eal.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+
+#include "rte_gpudev.h"
+#include "gpudev_driver.h"
+
+/* Logging */
+RTE_LOG_REGISTER_DEFAULT(gpu_logtype, NOTICE);
+#define GPU_LOG(level, ...) \
+	rte_log(RTE_LOG_ ## level, gpu_logtype, RTE_FMT("gpu: " \
+		RTE_FMT_HEAD(__VA_ARGS__,) "\n", RTE_FMT_TAIL(__VA_ARGS__,)))
+
+/* Set any driver error as EPERM */
+#define GPU_DRV_RET(function) \
+	((function != 0) ? -(rte_errno = EPERM) : (rte_errno = 0))
+
+/* Array of devices */
+static struct rte_gpu *gpus;
+/* Number of currently valid devices */
+static int16_t gpu_max;
+/* Number of currently valid devices */
+static int16_t gpu_count;
+
+int
+rte_gpu_init(size_t dev_max)
+{
+	if (dev_max == 0 || dev_max > INT16_MAX) {
+		GPU_LOG(ERR, "invalid array size");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	/* No lock, it must be called before or during first probing. */
+	if (gpus != NULL) {
+		GPU_LOG(ERR, "already initialized");
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+
+	gpus = calloc(dev_max, sizeof(struct rte_gpu));
+	if (gpus == NULL) {
+		GPU_LOG(ERR, "cannot initialize library");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	gpu_max = dev_max;
+	return 0;
+}
+
+uint16_t
+rte_gpu_count_avail(void)
+{
+	return gpu_count;
+}
+
+bool
+rte_gpu_is_valid(int16_t dev_id)
+{
+	if (dev_id >= 0 && dev_id < gpu_max &&
+		gpus[dev_id].state == RTE_GPU_STATE_INITIALIZED)
+		return true;
+	return false;
+}
+
+int16_t
+rte_gpu_find_next(int16_t dev_id)
+{
+	if (dev_id < 0)
+		dev_id = 0;
+	while (dev_id < gpu_max &&
+			gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+		dev_id++;
+
+	if (dev_id >= gpu_max)
+		return RTE_GPU_ID_NONE;
+	return dev_id;
+}
+
+static int16_t
+gpu_find_free_id(void)
+{
+	int16_t dev_id;
+
+	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
+		if (gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+			return dev_id;
+	}
+	return RTE_GPU_ID_NONE;
+}
+
+static struct rte_gpu *
+gpu_get_by_id(int16_t dev_id)
+{
+	if (!rte_gpu_is_valid(dev_id))
+		return NULL;
+	return &gpus[dev_id];
+}
+
+struct rte_gpu *
+rte_gpu_get_by_name(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+
+	if (name == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	RTE_GPU_FOREACH(dev_id) {
+		dev = &gpus[dev_id];
+		if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+			return dev;
+	}
+	return NULL;
+}
+
+struct rte_gpu *
+rte_gpu_allocate(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		GPU_LOG(ERR, "only primary process can allocate device");
+		rte_errno = EPERM;
+		return NULL;
+	}
+	if (name == NULL) {
+		GPU_LOG(ERR, "allocate device without a name");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* implicit initialization of library before adding first device */
+	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
+		return NULL;
+
+	if (rte_gpu_get_by_name(name) != NULL) {
+		GPU_LOG(ERR, "device with name %s already exists", name);
+		rte_errno = EEXIST;
+		return NULL;
+	}
+	dev_id = gpu_find_free_id();
+	if (dev_id == RTE_GPU_ID_NONE) {
+		GPU_LOG(ERR, "reached maximum number of devices");
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	dev = &gpus[dev_id];
+	memset(dev, 0, sizeof(*dev));
+
+	if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+		GPU_LOG(ERR, "device name too long: %s", name);
+		rte_errno = ENAMETOOLONG;
+		return NULL;
+	}
+	dev->info.name = dev->name;
+	dev->info.dev_id = dev_id;
+	dev->info.numa_node = -1;
+
+	gpu_count++;
+	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
+			name, dev_id, gpu_count);
+	return dev;
+}
+
+void
+rte_gpu_complete_new(struct rte_gpu *dev)
+{
+	if (dev == NULL)
+		return;
+
+	dev->state = RTE_GPU_STATE_INITIALIZED;
+}
+
+int
+rte_gpu_release(struct rte_gpu *dev)
+{
+	if (dev == NULL) {
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	GPU_LOG(DEBUG, "free device %s (id %d)",
+			dev->info.name, dev->info.dev_id);
+	dev->state = RTE_GPU_STATE_UNUSED;
+	gpu_count--;
+
+	return 0;
+}
+
+int
+rte_gpu_close(int16_t dev_id)
+{
+	int firsterr, binerr;
+	int *lasterr = &firsterr;
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "close invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.dev_close != NULL) {
+		*lasterr = GPU_DRV_RET(dev->ops.dev_close(dev));
+		if (*lasterr != 0)
+			lasterr = &binerr;
+	}
+
+	*lasterr = rte_gpu_release(dev);
+
+	rte_errno = -firsterr;
+	return firsterr;
+}
+
+int
+rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "query invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (info == NULL) {
+		GPU_LOG(ERR, "query without storage");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev->ops.dev_info_get == NULL) {
+		*info = dev->info;
+		return 0;
+	}
+	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
new file mode 100644
index 0000000000..9e096e3b64
--- /dev/null
+++ b/lib/gpudev/gpudev_driver.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+/*
+ * This header file must be included only by drivers.
+ * It is considered internal, i.e. hidden for the application.
+ * The prefix rte_ is used to avoid namespace clash in drivers.
+ */
+
+#ifndef RTE_GPUDEV_DRIVER_H
+#define RTE_GPUDEV_DRIVER_H
+
+#include <stdint.h>
+
+#include <rte_dev.h>
+
+#include "rte_gpudev.h"
+
+/* Flags indicate current state of device. */
+enum rte_gpu_state {
+	RTE_GPU_STATE_UNUSED,        /* not initialized */
+	RTE_GPU_STATE_INITIALIZED,   /* initialized */
+};
+
+struct rte_gpu;
+typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
+typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
+
+struct rte_gpu_ops {
+	/* Get device info. If NULL, info is just copied. */
+	rte_gpu_info_get_t *dev_info_get;
+	/* Close device. */
+	rte_gpu_close_t *dev_close;
+};
+
+struct rte_gpu {
+	/* Backing device. */
+	struct rte_device *device;
+	/* Unique identifier name. */
+	char name[RTE_DEV_NAME_MAX_LEN]; /* Updated by this library. */
+	/* Device info structure. */
+	struct rte_gpu_info info;
+	/* Driver functions. */
+	struct rte_gpu_ops ops;
+	/* Current state (used or not) in the running process. */
+	enum rte_gpu_state state; /* Updated by this library. */
+	/* Driver-specific private data for the running process. */
+	void *process_private;
+} __rte_cache_aligned;
+
+__rte_internal
+struct rte_gpu *rte_gpu_get_by_name(const char *name);
+
+/* First step of initialization */
+__rte_internal
+struct rte_gpu *rte_gpu_allocate(const char *name);
+
+/* Last step of initialization. */
+__rte_internal
+void rte_gpu_complete_new(struct rte_gpu *dev);
+
+/* Last step of removal. */
+__rte_internal
+int rte_gpu_release(struct rte_gpu *dev);
+
+#endif /* RTE_GPUDEV_DRIVER_H */
diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build
new file mode 100644
index 0000000000..608154817b
--- /dev/null
+++ b/lib/gpudev/meson.build
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+headers = files(
+        'rte_gpudev.h',
+)
+
+sources = files(
+        'gpudev.c',
+)
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
new file mode 100644
index 0000000000..eb7cfa8c59
--- /dev/null
+++ b/lib/gpudev/rte_gpudev.h
@@ -0,0 +1,168 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_GPUDEV_H
+#define RTE_GPUDEV_H
+
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_compat.h>
+
+/**
+ * @file
+ * Generic library to interact with GPU computing device.
+ *
+ * The API is not thread-safe.
+ * Device management must be done by a single thread.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Maximum number of devices if rte_gpu_init() is not called. */
+#define RTE_GPU_DEFAULT_MAX 32
+
+/** Empty device ID. */
+#define RTE_GPU_ID_NONE -1
+
+/** Store device info. */
+struct rte_gpu_info {
+	/** Unique identifier name. */
+	const char *name;
+	/** Device ID. */
+	int16_t dev_id;
+	/** Total processors available on device. */
+	uint32_t processor_count;
+	/** Total memory available on device. */
+	size_t total_memory;
+	/* Local NUMA memory ID. -1 if unknown. */
+	int16_t numa_node;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the device array before probing devices.
+ * If not called, the maximum of probed devices is RTE_GPU_DEFAULT_MAX.
+ *
+ * @param dev_max
+ *   Maximum number of devices.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENOMEM if out of memory
+ *   - EINVAL if 0 size
+ *   - EBUSY if already initialized
+ */
+__rte_experimental
+int rte_gpu_init(size_t dev_max);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return the number of GPU detected and associated to DPDK.
+ *
+ * @return
+ *   The number of available computing devices.
+ */
+__rte_experimental
+uint16_t rte_gpu_count_avail(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if the device is valid and initialized in DPDK.
+ *
+ * @param dev_id
+ *   The input device ID.
+ *
+ * @return
+ *   - True if dev_id is a valid and initialized computing device.
+ *   - False otherwise.
+ */
+__rte_experimental
+bool rte_gpu_is_valid(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the ID of the next valid GPU initialized in DPDK.
+ *
+ * @param dev_id
+ *   The initial device ID to start the research.
+ *
+ * @return
+ *   Next device ID corresponding to a valid and initialized computing device,
+ *   RTE_GPU_ID_NONE if there is none.
+ */
+__rte_experimental
+int16_t rte_gpu_find_next(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid GPU devices.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ */
+#define RTE_GPU_FOREACH(dev_id) \
+	for (dev_id = rte_gpu_find_next(0); \
+	     dev_id > 0; \
+	     dev_id = rte_gpu_find_next(dev_id + 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Close device.
+ * All resources are released.
+ *
+ * @param dev_id
+ *   Device ID to close.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_close(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return device specific info.
+ *
+ * @param dev_id
+ *   Device ID to get info.
+ * @param info
+ *   Memory structure to fill with the info.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL info
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_GPUDEV_H */
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
new file mode 100644
index 0000000000..6ac6b327e2
--- /dev/null
+++ b/lib/gpudev/version.map
@@ -0,0 +1,20 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 21.11
+	rte_gpu_close;
+	rte_gpu_count_avail;
+	rte_gpu_find_next;
+	rte_gpu_info_get;
+	rte_gpu_init;
+	rte_gpu_is_valid;
+};
+
+INTERNAL {
+	global:
+
+	rte_gpu_allocate;
+	rte_gpu_complete_new;
+	rte_gpu_get_by_name;
+	rte_gpu_release;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 499d26060f..8537a5ab80 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -34,6 +34,7 @@ libraries = [
         'distributor',
         'efd',
         'eventdev',
+        'gpudev',
         'gro',
         'gso',
         'ip_frag',
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 2/9] gpudev: add event notification
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 1/9] gpudev: introduce GPU device class library eagostini
@ 2021-11-03 19:15   ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 3/9] gpudev: add child device representing a device context eagostini
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

Callback functions may be registered for a device event.
Callback management is per-process and not thread-safe.

The events RTE_GPU_EVENT_NEW and RTE_GPU_EVENT_DEL
are notified respectively after creation and before removal
of a device, as part of the library functions.
Some future events may be emitted from drivers.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/gpudev/gpudev.c        | 148 +++++++++++++++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h |   7 ++
 lib/gpudev/rte_gpudev.h    |  70 ++++++++++++++++++
 lib/gpudev/version.map     |   3 +
 4 files changed, 228 insertions(+)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index c839c530c8..d57e23df7c 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -3,6 +3,7 @@
  */
 
 #include <rte_eal.h>
+#include <rte_tailq.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_log.h>
@@ -27,6 +28,16 @@ static int16_t gpu_max;
 /* Number of currently valid devices */
 static int16_t gpu_count;
 
+/* Event callback object */
+struct rte_gpu_callback {
+	TAILQ_ENTRY(rte_gpu_callback) next;
+	rte_gpu_callback_t *function;
+	void *user_data;
+	enum rte_gpu_event event;
+};
+static rte_rwlock_t gpu_callback_lock = RTE_RWLOCK_INITIALIZER;
+static void gpu_free_callbacks(struct rte_gpu *dev);
+
 int
 rte_gpu_init(size_t dev_max)
 {
@@ -166,6 +177,7 @@ rte_gpu_allocate(const char *name)
 	dev->info.name = dev->name;
 	dev->info.dev_id = dev_id;
 	dev->info.numa_node = -1;
+	TAILQ_INIT(&dev->callbacks);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -180,6 +192,8 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 		return;
 
 	dev->state = RTE_GPU_STATE_INITIALIZED;
+	dev->state = RTE_GPU_STATE_INITIALIZED;
+	rte_gpu_notify(dev, RTE_GPU_EVENT_NEW);
 }
 
 int
@@ -192,6 +206,9 @@ rte_gpu_release(struct rte_gpu *dev)
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
 			dev->info.name, dev->info.dev_id);
+	rte_gpu_notify(dev, RTE_GPU_EVENT_DEL);
+
+	gpu_free_callbacks(dev);
 	dev->state = RTE_GPU_STATE_UNUSED;
 	gpu_count--;
 
@@ -224,6 +241,137 @@ rte_gpu_close(int16_t dev_id)
 	return firsterr;
 }
 
+int
+rte_gpu_callback_register(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data)
+{
+	int16_t next_dev, last_dev;
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback;
+
+	if (!rte_gpu_is_valid(dev_id) && dev_id != RTE_GPU_ID_ANY) {
+		GPU_LOG(ERR, "register callback of invalid ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (function == NULL) {
+		GPU_LOG(ERR, "cannot register callback without function");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev_id == RTE_GPU_ID_ANY) {
+		next_dev = 0;
+		last_dev = gpu_max - 1;
+	} else {
+		next_dev = last_dev = dev_id;
+	}
+
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	do {
+		callbacks = &gpus[next_dev].callbacks;
+
+		/* check if not already registered */
+		TAILQ_FOREACH(callback, callbacks, next) {
+			if (callback->event == event &&
+					callback->function == function &&
+					callback->user_data == user_data) {
+				GPU_LOG(INFO, "callback already registered");
+				return 0;
+			}
+		}
+
+		callback = malloc(sizeof(*callback));
+		if (callback == NULL) {
+			GPU_LOG(ERR, "cannot allocate callback");
+			return -ENOMEM;
+		}
+		callback->function = function;
+		callback->user_data = user_data;
+		callback->event = event;
+		TAILQ_INSERT_TAIL(callbacks, callback, next);
+
+	} while (++next_dev <= last_dev);
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+
+	return 0;
+}
+
+int
+rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data)
+{
+	int16_t next_dev, last_dev;
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback, *nextcb;
+
+	if (!rte_gpu_is_valid(dev_id) && dev_id != RTE_GPU_ID_ANY) {
+		GPU_LOG(ERR, "unregister callback of invalid ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (function == NULL) {
+		GPU_LOG(ERR, "cannot unregister callback without function");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev_id == RTE_GPU_ID_ANY) {
+		next_dev = 0;
+		last_dev = gpu_max - 1;
+	} else {
+		next_dev = last_dev = dev_id;
+	}
+
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	do {
+		callbacks = &gpus[next_dev].callbacks;
+		RTE_TAILQ_FOREACH_SAFE(callback, callbacks, next, nextcb) {
+			if (callback->event != event ||
+					callback->function != function ||
+					(callback->user_data != user_data &&
+					user_data != (void *)-1))
+				continue;
+			TAILQ_REMOVE(callbacks, callback, next);
+			free(callback);
+		}
+	} while (++next_dev <= last_dev);
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+
+	return 0;
+}
+
+static void
+gpu_free_callbacks(struct rte_gpu *dev)
+{
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback, *nextcb;
+
+	callbacks = &dev->callbacks;
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	RTE_TAILQ_FOREACH_SAFE(callback, callbacks, next, nextcb) {
+		TAILQ_REMOVE(callbacks, callback, next);
+		free(callback);
+	}
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+}
+
+void
+rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event event)
+{
+	int16_t dev_id;
+	struct rte_gpu_callback *callback;
+
+	dev_id = dev->info.dev_id;
+	rte_rwlock_read_lock(&gpu_callback_lock);
+	TAILQ_FOREACH(callback, &dev->callbacks, next) {
+		if (callback->event != event || callback->function == NULL)
+			continue;
+		callback->function(dev_id, event, callback->user_data);
+	}
+	rte_rwlock_read_unlock(&gpu_callback_lock);
+}
+
 int
 rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 {
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 9e096e3b64..2a7089aa52 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -12,6 +12,7 @@
 #define RTE_GPUDEV_DRIVER_H
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 #include <rte_dev.h>
 
@@ -43,6 +44,8 @@ struct rte_gpu {
 	struct rte_gpu_info info;
 	/* Driver functions. */
 	struct rte_gpu_ops ops;
+	/* Event callback list. */
+	TAILQ_HEAD(rte_gpu_callback_list, rte_gpu_callback) callbacks;
 	/* Current state (used or not) in the running process. */
 	enum rte_gpu_state state; /* Updated by this library. */
 	/* Driver-specific private data for the running process. */
@@ -64,4 +67,8 @@ void rte_gpu_complete_new(struct rte_gpu *dev);
 __rte_internal
 int rte_gpu_release(struct rte_gpu *dev);
 
+/* Call registered callbacks. No multi-process event. */
+__rte_internal
+void rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event);
+
 #endif /* RTE_GPUDEV_DRIVER_H */
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index eb7cfa8c59..e1702fbfe4 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -31,6 +31,11 @@ extern "C" {
 
 /** Empty device ID. */
 #define RTE_GPU_ID_NONE -1
+/** Catch-all device ID. */
+#define RTE_GPU_ID_ANY INT16_MIN
+
+/** Catch-all callback data. */
+#define RTE_GPU_CALLBACK_ANY_DATA ((void *)-1)
 
 /** Store device info. */
 struct rte_gpu_info {
@@ -46,6 +51,18 @@ struct rte_gpu_info {
 	int16_t numa_node;
 };
 
+/** Flags passed in notification callback. */
+enum rte_gpu_event {
+	/** Device is just initialized. */
+	RTE_GPU_EVENT_NEW,
+	/** Device is going to be released. */
+	RTE_GPU_EVENT_DEL,
+};
+
+/** Prototype of event callback function. */
+typedef void (rte_gpu_callback_t)(int16_t dev_id,
+		enum rte_gpu_event event, void *user_data);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -141,6 +158,59 @@ int16_t rte_gpu_find_next(int16_t dev_id);
 __rte_experimental
 int rte_gpu_close(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register a function as event callback.
+ * A function may be registered multiple times for different events.
+ *
+ * @param dev_id
+ *   Device ID to get notified about.
+ *   RTE_GPU_ID_ANY means all devices.
+ * @param event
+ *   Device event to be registered for.
+ * @param function
+ *   Callback function to be called on event.
+ * @param user_data
+ *   Optional parameter passed in the callback.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL function
+ *   - ENOMEM if out of memory
+ */
+__rte_experimental
+int rte_gpu_callback_register(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Unregister for an event.
+ *
+ * @param dev_id
+ *   Device ID to be silenced.
+ *   RTE_GPU_ID_ANY means all devices.
+ * @param event
+ *   Registered event.
+ * @param function
+ *   Registered function.
+ * @param user_data
+ *   Optional parameter as registered.
+ *   RTE_GPU_CALLBACK_ANY_DATA is a catch-all.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL function
+ */
+__rte_experimental
+int rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 6ac6b327e2..b3b6b76c1c 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -2,6 +2,8 @@ EXPERIMENTAL {
 	global:
 
 	# added in 21.11
+	rte_gpu_callback_register;
+	rte_gpu_callback_unregister;
 	rte_gpu_close;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
@@ -16,5 +18,6 @@ INTERNAL {
 	rte_gpu_allocate;
 	rte_gpu_complete_new;
 	rte_gpu_get_by_name;
+	rte_gpu_notify;
 	rte_gpu_release;
 };
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 3/9] gpudev: add child device representing a device context
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 1/9] gpudev: introduce GPU device class library eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 2/9] gpudev: add event notification eagostini
@ 2021-11-03 19:15   ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 4/9] gpudev: support multi-process eagostini
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque by the caller of
rte_gpu_add_child().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 doc/guides/prog_guide/gpudev.rst | 12 ++++++
 lib/gpudev/gpudev.c              | 45 +++++++++++++++++++-
 lib/gpudev/gpudev_driver.h       |  2 +-
 lib/gpudev/rte_gpudev.h          | 71 +++++++++++++++++++++++++++++---
 lib/gpudev/version.map           |  1 +
 5 files changed, 123 insertions(+), 8 deletions(-)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 6ea7239159..7694639489 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -34,3 +34,15 @@ This library provides a number of features:
 
 API Overview
 ------------
+
+Child Device
+~~~~~~~~~~~~
+
+By default, DPDK PCIe module detects and registers physical GPU devices
+in the system.
+With the gpudev library is also possible to add additional non-physical devices
+through an ``uint64_t`` generic handler (e.g. CUDA Driver context)
+that will be registered internally by the driver as an additional device (child)
+connected to a physical device (parent).
+Each device (parent or child) is represented through a ID
+required to indicate which device a given operation should be executed on.
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index d57e23df7c..74cdd7f20b 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -80,13 +80,22 @@ rte_gpu_is_valid(int16_t dev_id)
 	return false;
 }
 
+static bool
+gpu_match_parent(int16_t dev_id, int16_t parent)
+{
+	if (parent == RTE_GPU_ID_ANY)
+		return true;
+	return gpus[dev_id].info.parent == parent;
+}
+
 int16_t
-rte_gpu_find_next(int16_t dev_id)
+rte_gpu_find_next(int16_t dev_id, int16_t parent)
 {
 	if (dev_id < 0)
 		dev_id = 0;
 	while (dev_id < gpu_max &&
-			gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+			(gpus[dev_id].state == RTE_GPU_STATE_UNUSED ||
+			!gpu_match_parent(dev_id, parent)))
 		dev_id++;
 
 	if (dev_id >= gpu_max)
@@ -177,6 +186,7 @@ rte_gpu_allocate(const char *name)
 	dev->info.name = dev->name;
 	dev->info.dev_id = dev_id;
 	dev->info.numa_node = -1;
+	dev->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
 
 	gpu_count++;
@@ -185,6 +195,28 @@ rte_gpu_allocate(const char *name)
 	return dev;
 }
 
+int16_t
+rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
+{
+	struct rte_gpu *dev;
+
+	if (!rte_gpu_is_valid(parent)) {
+		GPU_LOG(ERR, "add child to invalid parent ID %d", parent);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	dev = rte_gpu_allocate(name);
+	if (dev == NULL)
+		return -rte_errno;
+
+	dev->info.parent = parent;
+	dev->info.context = child_context;
+
+	rte_gpu_complete_new(dev);
+	return dev->info.dev_id;
+}
+
 void
 rte_gpu_complete_new(struct rte_gpu *dev)
 {
@@ -199,10 +231,19 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 int
 rte_gpu_release(struct rte_gpu *dev)
 {
+	int16_t dev_id, child;
+
 	if (dev == NULL) {
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
+	dev_id = dev->info.dev_id;
+	RTE_GPU_FOREACH_CHILD(child, dev_id) {
+		GPU_LOG(ERR, "cannot release device %d with child %d",
+				dev_id, child);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
 			dev->info.name, dev->info.dev_id);
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 2a7089aa52..4d0077161c 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -31,7 +31,7 @@ typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info)
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
 	rte_gpu_info_get_t *dev_info_get;
-	/* Close device. */
+	/* Close device or child context. */
 	rte_gpu_close_t *dev_close;
 };
 
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index e1702fbfe4..df75dbdbab 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -41,8 +41,12 @@ extern "C" {
 struct rte_gpu_info {
 	/** Unique identifier name. */
 	const char *name;
+	/** Opaque handler of the device context. */
+	uint64_t context;
 	/** Device ID. */
 	int16_t dev_id;
+	/** ID of the parent device, RTE_GPU_ID_NONE if no parent */
+	int16_t parent;
 	/** Total processors available on device. */
 	uint32_t processor_count;
 	/** Total memory available on device. */
@@ -110,6 +114,33 @@ uint16_t rte_gpu_count_avail(void);
 __rte_experimental
 bool rte_gpu_is_valid(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a virtual device representing a context in the parent device.
+ *
+ * @param name
+ *   Unique string to identify the device.
+ * @param parent
+ *   Device ID of the parent.
+ * @param child_context
+ *   Opaque context handler.
+ *
+ * @return
+ *   Device ID of the new created child, -rte_errno otherwise:
+ *   - EINVAL if empty name
+ *   - ENAMETOOLONG if long name
+ *   - EEXIST if existing device name
+ *   - ENODEV if invalid parent
+ *   - EPERM if secondary process
+ *   - ENOENT if too many devices
+ *   - ENOMEM if out of space
+ */
+__rte_experimental
+int16_t rte_gpu_add_child(const char *name,
+		int16_t parent, uint64_t child_context);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -118,13 +149,17 @@ bool rte_gpu_is_valid(int16_t dev_id);
  *
  * @param dev_id
  *   The initial device ID to start the research.
+ * @param parent
+ *   The device ID of the parent.
+ *   RTE_GPU_ID_NONE means no parent.
+ *   RTE_GPU_ID_ANY means no or any parent.
  *
  * @return
  *   Next device ID corresponding to a valid and initialized computing device,
  *   RTE_GPU_ID_NONE if there is none.
  */
 __rte_experimental
-int16_t rte_gpu_find_next(int16_t dev_id);
+int16_t rte_gpu_find_next(int16_t dev_id, int16_t parent);
 
 /**
  * @warning
@@ -136,15 +171,41 @@ int16_t rte_gpu_find_next(int16_t dev_id);
  *   The ID of the next possible valid device, usually 0 to iterate all.
  */
 #define RTE_GPU_FOREACH(dev_id) \
-	for (dev_id = rte_gpu_find_next(0); \
-	     dev_id > 0; \
-	     dev_id = rte_gpu_find_next(dev_id + 1))
+	RTE_GPU_FOREACH_CHILD(dev_id, RTE_GPU_ID_ANY)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid computing devices having no parent.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ */
+#define RTE_GPU_FOREACH_PARENT(dev_id) \
+	RTE_GPU_FOREACH_CHILD(dev_id, RTE_GPU_ID_NONE)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid children of a computing device parent.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ * @param parent
+ *   The device ID of the parent.
+ */
+#define RTE_GPU_FOREACH_CHILD(dev_id, parent) \
+	for (dev_id = rte_gpu_find_next(0, parent); \
+	     dev_id >= 0; \
+	     dev_id = rte_gpu_find_next(dev_id + 1, parent))
 
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
  *
- * Close device.
+ * Close device or child context.
  * All resources are released.
  *
  * @param dev_id
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index b3b6b76c1c..4a934ed933 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -2,6 +2,7 @@ EXPERIMENTAL {
 	global:
 
 	# added in 21.11
+	rte_gpu_add_child;
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 4/9] gpudev: support multi-process
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
                     ` (2 preceding siblings ...)
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 3/9] gpudev: add child device representing a device context eagostini
@ 2021-11-03 19:15   ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 5/9] gpudev: add memory API eagostini
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

The device data shared between processes are moved in a struct
allocated in a shared memory (a new memzone for all GPUs).
The main struct rte_gpu references the shared memory
via the pointer mpshared.

The API function rte_gpu_attach() is added to attach a device
from the secondary process.
The function rte_gpu_allocate() can be used only by primary process.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/gpudev/gpudev.c        | 127 +++++++++++++++++++++++++++++++------
 lib/gpudev/gpudev_driver.h |  25 ++++++--
 lib/gpudev/version.map     |   1 +
 3 files changed, 127 insertions(+), 26 deletions(-)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 74cdd7f20b..f0690cf730 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -5,6 +5,7 @@
 #include <rte_eal.h>
 #include <rte_tailq.h>
 #include <rte_string_fns.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_log.h>
 
@@ -28,6 +29,12 @@ static int16_t gpu_max;
 /* Number of currently valid devices */
 static int16_t gpu_count;
 
+/* Shared memory between processes. */
+static const char *GPU_MEMZONE = "rte_gpu_shared";
+static struct {
+	__extension__ struct rte_gpu_mpshared gpus[0];
+} *gpu_shared_mem;
+
 /* Event callback object */
 struct rte_gpu_callback {
 	TAILQ_ENTRY(rte_gpu_callback) next;
@@ -75,7 +82,7 @@ bool
 rte_gpu_is_valid(int16_t dev_id)
 {
 	if (dev_id >= 0 && dev_id < gpu_max &&
-		gpus[dev_id].state == RTE_GPU_STATE_INITIALIZED)
+		gpus[dev_id].process_state == RTE_GPU_STATE_INITIALIZED)
 		return true;
 	return false;
 }
@@ -85,7 +92,7 @@ gpu_match_parent(int16_t dev_id, int16_t parent)
 {
 	if (parent == RTE_GPU_ID_ANY)
 		return true;
-	return gpus[dev_id].info.parent == parent;
+	return gpus[dev_id].mpshared->info.parent == parent;
 }
 
 int16_t
@@ -94,7 +101,7 @@ rte_gpu_find_next(int16_t dev_id, int16_t parent)
 	if (dev_id < 0)
 		dev_id = 0;
 	while (dev_id < gpu_max &&
-			(gpus[dev_id].state == RTE_GPU_STATE_UNUSED ||
+			(gpus[dev_id].process_state == RTE_GPU_STATE_UNUSED ||
 			!gpu_match_parent(dev_id, parent)))
 		dev_id++;
 
@@ -109,7 +116,7 @@ gpu_find_free_id(void)
 	int16_t dev_id;
 
 	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
-		if (gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+		if (gpus[dev_id].process_state == RTE_GPU_STATE_UNUSED)
 			return dev_id;
 	}
 	return RTE_GPU_ID_NONE;
@@ -136,12 +143,35 @@ rte_gpu_get_by_name(const char *name)
 
 	RTE_GPU_FOREACH(dev_id) {
 		dev = &gpus[dev_id];
-		if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+		if (strncmp(name, dev->mpshared->name, RTE_DEV_NAME_MAX_LEN) == 0)
 			return dev;
 	}
 	return NULL;
 }
 
+static int
+gpu_shared_mem_init(void)
+{
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		memzone = rte_memzone_reserve(GPU_MEMZONE,
+				sizeof(*gpu_shared_mem) +
+				sizeof(*gpu_shared_mem->gpus) * gpu_max,
+				SOCKET_ID_ANY, 0);
+	} else {
+		memzone = rte_memzone_lookup(GPU_MEMZONE);
+	}
+	if (memzone == NULL) {
+		GPU_LOG(ERR, "cannot initialize shared memory");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	gpu_shared_mem = memzone->addr;
+	return 0;
+}
+
 struct rte_gpu *
 rte_gpu_allocate(const char *name)
 {
@@ -163,6 +193,10 @@ rte_gpu_allocate(const char *name)
 	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
 		return NULL;
 
+	/* initialize shared memory before adding first device */
+	if (gpu_shared_mem == NULL && gpu_shared_mem_init() < 0)
+		return NULL;
+
 	if (rte_gpu_get_by_name(name) != NULL) {
 		GPU_LOG(ERR, "device with name %s already exists", name);
 		rte_errno = EEXIST;
@@ -178,16 +212,20 @@ rte_gpu_allocate(const char *name)
 	dev = &gpus[dev_id];
 	memset(dev, 0, sizeof(*dev));
 
-	if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+	dev->mpshared = &gpu_shared_mem->gpus[dev_id];
+	memset(dev->mpshared, 0, sizeof(*dev->mpshared));
+
+	if (rte_strscpy(dev->mpshared->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
 		GPU_LOG(ERR, "device name too long: %s", name);
 		rte_errno = ENAMETOOLONG;
 		return NULL;
 	}
-	dev->info.name = dev->name;
-	dev->info.dev_id = dev_id;
-	dev->info.numa_node = -1;
-	dev->info.parent = RTE_GPU_ID_NONE;
+	dev->mpshared->info.name = dev->mpshared->name;
+	dev->mpshared->info.dev_id = dev_id;
+	dev->mpshared->info.numa_node = -1;
+	dev->mpshared->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
+	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -195,6 +233,55 @@ rte_gpu_allocate(const char *name)
 	return dev;
 }
 
+struct rte_gpu *
+rte_gpu_attach(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+	struct rte_gpu_mpshared *shared_dev;
+
+	if (rte_eal_process_type() != RTE_PROC_SECONDARY) {
+		GPU_LOG(ERR, "only secondary process can attach device");
+		rte_errno = EPERM;
+		return NULL;
+	}
+	if (name == NULL) {
+		GPU_LOG(ERR, "attach device without a name");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* implicit initialization of library before adding first device */
+	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
+		return NULL;
+
+	/* initialize shared memory before adding first device */
+	if (gpu_shared_mem == NULL && gpu_shared_mem_init() < 0)
+		return NULL;
+
+	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
+		shared_dev = &gpu_shared_mem->gpus[dev_id];
+		if (strncmp(name, shared_dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+			break;
+	}
+	if (dev_id >= gpu_max) {
+		GPU_LOG(ERR, "device with name %s not found", name);
+		rte_errno = ENOENT;
+		return NULL;
+	}
+	dev = &gpus[dev_id];
+	memset(dev, 0, sizeof(*dev));
+
+	TAILQ_INIT(&dev->callbacks);
+	dev->mpshared = shared_dev;
+	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+
+	gpu_count++;
+	GPU_LOG(DEBUG, "attached device %s (id %d) of total %d",
+			name, dev_id, gpu_count);
+	return dev;
+}
+
 int16_t
 rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
 {
@@ -210,11 +297,11 @@ rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
 	if (dev == NULL)
 		return -rte_errno;
 
-	dev->info.parent = parent;
-	dev->info.context = child_context;
+	dev->mpshared->info.parent = parent;
+	dev->mpshared->info.context = child_context;
 
 	rte_gpu_complete_new(dev);
-	return dev->info.dev_id;
+	return dev->mpshared->info.dev_id;
 }
 
 void
@@ -223,8 +310,7 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 	if (dev == NULL)
 		return;
 
-	dev->state = RTE_GPU_STATE_INITIALIZED;
-	dev->state = RTE_GPU_STATE_INITIALIZED;
+	dev->process_state = RTE_GPU_STATE_INITIALIZED;
 	rte_gpu_notify(dev, RTE_GPU_EVENT_NEW);
 }
 
@@ -237,7 +323,7 @@ rte_gpu_release(struct rte_gpu *dev)
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
-	dev_id = dev->info.dev_id;
+	dev_id = dev->mpshared->info.dev_id;
 	RTE_GPU_FOREACH_CHILD(child, dev_id) {
 		GPU_LOG(ERR, "cannot release device %d with child %d",
 				dev_id, child);
@@ -246,11 +332,12 @@ rte_gpu_release(struct rte_gpu *dev)
 	}
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
-			dev->info.name, dev->info.dev_id);
+			dev->mpshared->info.name, dev->mpshared->info.dev_id);
 	rte_gpu_notify(dev, RTE_GPU_EVENT_DEL);
 
 	gpu_free_callbacks(dev);
-	dev->state = RTE_GPU_STATE_UNUSED;
+	dev->process_state = RTE_GPU_STATE_UNUSED;
+	__atomic_fetch_sub(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
 	gpu_count--;
 
 	return 0;
@@ -403,7 +490,7 @@ rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event event)
 	int16_t dev_id;
 	struct rte_gpu_callback *callback;
 
-	dev_id = dev->info.dev_id;
+	dev_id = dev->mpshared->info.dev_id;
 	rte_rwlock_read_lock(&gpu_callback_lock);
 	TAILQ_FOREACH(callback, &dev->callbacks, next) {
 		if (callback->event != event || callback->function == NULL)
@@ -431,7 +518,7 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 	}
 
 	if (dev->ops.dev_info_get == NULL) {
-		*info = dev->info;
+		*info = dev->mpshared->info;
 		return 0;
 	}
 	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 4d0077161c..9459c7e30f 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -35,19 +35,28 @@ struct rte_gpu_ops {
 	rte_gpu_close_t *dev_close;
 };
 
-struct rte_gpu {
-	/* Backing device. */
-	struct rte_device *device;
+struct rte_gpu_mpshared {
 	/* Unique identifier name. */
 	char name[RTE_DEV_NAME_MAX_LEN]; /* Updated by this library. */
+	/* Driver-specific private data shared in multi-process. */
+	void *dev_private;
 	/* Device info structure. */
 	struct rte_gpu_info info;
+	/* Counter of processes using the device. */
+	uint16_t process_refcnt; /* Updated by this library. */
+};
+
+struct rte_gpu {
+	/* Backing device. */
+	struct rte_device *device;
+	/* Data shared between processes. */
+	struct rte_gpu_mpshared *mpshared;
 	/* Driver functions. */
 	struct rte_gpu_ops ops;
 	/* Event callback list. */
 	TAILQ_HEAD(rte_gpu_callback_list, rte_gpu_callback) callbacks;
 	/* Current state (used or not) in the running process. */
-	enum rte_gpu_state state; /* Updated by this library. */
+	enum rte_gpu_state process_state; /* Updated by this library. */
 	/* Driver-specific private data for the running process. */
 	void *process_private;
 } __rte_cache_aligned;
@@ -55,15 +64,19 @@ struct rte_gpu {
 __rte_internal
 struct rte_gpu *rte_gpu_get_by_name(const char *name);
 
-/* First step of initialization */
+/* First step of initialization in primary process. */
 __rte_internal
 struct rte_gpu *rte_gpu_allocate(const char *name);
 
+/* First step of initialization in secondary process. */
+__rte_internal
+struct rte_gpu *rte_gpu_attach(const char *name);
+
 /* Last step of initialization. */
 __rte_internal
 void rte_gpu_complete_new(struct rte_gpu *dev);
 
-/* Last step of removal. */
+/* Last step of removal (primary or secondary process). */
 __rte_internal
 int rte_gpu_release(struct rte_gpu *dev);
 
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 4a934ed933..58dc632393 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -17,6 +17,7 @@ INTERNAL {
 	global:
 
 	rte_gpu_allocate;
+	rte_gpu_attach;
 	rte_gpu_complete_new;
 	rte_gpu_get_by_name;
 	rte_gpu_notify;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 5/9] gpudev: add memory API
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
                     ` (3 preceding siblings ...)
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 4/9] gpudev: support multi-process eagostini
@ 2021-11-03 19:15   ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 6/9] gpudev: add memory barrier eagostini
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini, Thomas Monjalon

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.

As a first step, the features are focused on memory management.
A function allows to allocate memory inside the device,
or in the main (CPU) memory while making it visible for the device.
This memory may be used to save packets or for synchronization data.

The next step should focus on GPU processing task control.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 app/test-gpudev/main.c                 | 118 +++++++++++++++++++++++++
 doc/guides/gpus/features/default.ini   |   3 +
 doc/guides/prog_guide/gpudev.rst       |  19 ++++
 doc/guides/rel_notes/release_21_11.rst |   1 +
 lib/gpudev/gpudev.c                    | 101 +++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             |  12 +++
 lib/gpudev/rte_gpudev.h                |  95 ++++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 8 files changed, 353 insertions(+)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 6a73a54e84..98c02a3ee0 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -62,6 +62,110 @@ args_parse(int argc, char **argv)
 	}
 }
 
+static int
+alloc_gpu_memory(uint16_t gpu_id)
+{
+	void * ptr_1 = NULL;
+	void * ptr_2 = NULL;
+	size_t buf_bytes = 1024;
+	int ret = 0;
+
+	printf("\n=======> TEST: Allocate GPU memory\n");
+
+	/* Alloc memory on GPU 0 */
+	ptr_1 = rte_gpu_malloc(gpu_id, buf_bytes);
+	if(ptr_1 == NULL)
+	{
+		fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
+		return -1;
+	}
+	printf("GPU memory allocated at 0x%p %zdB\n", ptr_1, buf_bytes);
+
+	ptr_2 = rte_gpu_malloc(gpu_id, buf_bytes);
+	if(ptr_2 == NULL)
+	{
+		fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
+		return -1;
+	}
+	printf("GPU memory allocated at 0x%p %zdB\n", ptr_2, buf_bytes);
+
+	ret = rte_gpu_free(gpu_id, (uint8_t*)(ptr_1)+0x700);
+	if(ret < 0)
+	{
+		printf("GPU memory 0x%p + 0x700 NOT freed because of memory address not recognized by driver\n", ptr_1);
+	}
+	else
+	{
+		fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr_1);
+		return -1;
+	}
+
+	ret = rte_gpu_free(gpu_id, ptr_2);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
+		return -1;
+	}
+	printf("GPU memory 0x%p freed\n", ptr_2);
+
+	ret = rte_gpu_free(gpu_id, ptr_1);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
+		return -1;
+	}
+	printf("GPU memory 0x%p freed\n", ptr_1);
+
+	return 0;
+}
+
+static int
+register_cpu_memory(uint16_t gpu_id)
+{
+	void * ptr = NULL;
+	size_t buf_bytes = 1024;
+	int ret = 0;
+
+	printf("\n=======> TEST: Register CPU memory\n");
+
+	/* Alloc memory on CPU visible from GPU 0 */
+	ptr = rte_zmalloc(NULL, buf_bytes, 0);
+	if (ptr == NULL) {
+		fprintf(stderr, "Failed to allocate CPU memory.\n");
+		return -1;
+	}
+
+	ret = rte_gpu_register(gpu_id, buf_bytes, ptr);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_register CPU memory returned error %d\n", ret);
+		return -1;
+	}
+	printf("CPU memory registered at 0x%p %zdB\n", ptr, buf_bytes);
+
+	ret = rte_gpu_unregister(gpu_id, (uint8_t*)(ptr)+0x700);
+	if(ret < 0)
+	{
+		printf("CPU memory 0x%p + 0x700 NOT unregistered because of memory address not recognized by driver\n", ptr);
+	}
+	else
+	{
+		fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr);
+		return -1;
+	}
+	printf("CPU memory 0x%p unregistered\n", ptr);
+
+	ret = rte_gpu_unregister(gpu_id, ptr);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_unregister returned error %d\n", ret);
+		return -1;
+	}
+	printf("CPU memory 0x%p unregistered\n", ptr);
+
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -99,6 +203,20 @@ main(int argc, char **argv)
 	}
 	printf("\n\n");
 
+	if(nb_gpus == 0)
+	{
+		fprintf(stderr, "Need at least one GPU on the system to run the example\n");
+		return EXIT_FAILURE;
+	}
+
+	gpu_id = 0;
+
+	/**
+	 * Memory tests
+	 */
+	alloc_gpu_memory(gpu_id);
+	register_cpu_memory(gpu_id);
+
 	/* clean up the EAL */
 	rte_eal_cleanup();
 	printf("Bye...\n");
diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
index ec7a545eb7..87e9966424 100644
--- a/doc/guides/gpus/features/default.ini
+++ b/doc/guides/gpus/features/default.ini
@@ -8,3 +8,6 @@
 ;
 [Features]
 Get device info                =
+Share CPU memory with device   =
+Allocate device memory         =
+Free memory                    =
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 7694639489..9aca69038c 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -30,6 +30,8 @@ Features
 This library provides a number of features:
 
 - Interoperability with device-specific library through generic handlers.
+- Allocate and free memory on the device.
+- Register CPU memory to make it visible from the device.
 
 
 API Overview
@@ -46,3 +48,20 @@ that will be registered internally by the driver as an additional device (child)
 connected to a physical device (parent).
 Each device (parent or child) is represented through a ID
 required to indicate which device a given operation should be executed on.
+
+Memory Allocation
+~~~~~~~~~~~~~~~~~
+
+gpudev can allocate on an input given GPU device a memory area
+returning the pointer to that memory.
+Later, it's also possible to free that memory with gpudev.
+GPU memory allocated outside of the gpudev library
+(e.g. with GPU-specific library) cannot be freed by the gpudev library.
+
+Memory Registration
+~~~~~~~~~~~~~~~~~~~
+
+gpudev can register a CPU memory area to make it visible from a GPU device.
+Later, it's also possible to unregister that memory with gpudev.
+CPU memory registered outside of the gpudev library
+(e.g. with GPU specific library) cannot be unregistered by the gpudev library.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 57851c63af..f70680dad3 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -94,6 +94,7 @@ New Features
 * **Introduced GPU device class with first features:**
 
   * Device information
+  * Memory management
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index f0690cf730..1d8318f769 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -6,6 +6,7 @@
 #include <rte_tailq.h>
 #include <rte_string_fns.h>
 #include <rte_memzone.h>
+#include <rte_malloc.h>
 #include <rte_errno.h>
 #include <rte_log.h>
 
@@ -523,3 +524,103 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 	}
 	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
 }
+
+void *
+rte_gpu_malloc(int16_t dev_id, size_t size)
+{
+	struct rte_gpu *dev;
+	void *ptr;
+	int ret;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return NULL;
+	}
+
+	if (dev->ops.mem_alloc == NULL) {
+		GPU_LOG(ERR, "mem allocation not supported");
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+
+	if (size == 0) /* dry-run */
+		return NULL;
+
+	ret = dev->ops.mem_alloc(dev, size, &ptr);
+
+	switch (ret) {
+		case 0:
+			return ptr;
+		case -ENOMEM:
+		case -E2BIG:
+			rte_errno = -ret;
+			return NULL;
+		default:
+			rte_errno = -EPERM;
+			return NULL;
+	}
+}
+
+int
+rte_gpu_register(int16_t dev_id, size_t size, void * ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_register == NULL) {
+		GPU_LOG(ERR, "mem registration not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+
+	if (size == 0 || ptr == NULL) /* dry-run */
+		return -EINVAL;
+
+	return GPU_DRV_RET(dev->ops.mem_register(dev, size, ptr));
+}
+
+int
+rte_gpu_unregister(int16_t dev_id, void * ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "unregister mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_unregister == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mem_unregister(dev, ptr));
+}
+
+int
+rte_gpu_free(int16_t dev_id, void *ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "free mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_free == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mem_free(dev, ptr));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 9459c7e30f..11015944a6 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -27,12 +27,24 @@ enum rte_gpu_state {
 struct rte_gpu;
 typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
 typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
+typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
+typedef int (rte_gpu_free_t)(struct rte_gpu *dev, void *ptr);
+typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
+typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
 
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
 	rte_gpu_info_get_t *dev_info_get;
 	/* Close device or child context. */
 	rte_gpu_close_t *dev_close;
+	/* Allocate memory in device. */
+	rte_gpu_mem_alloc_t *mem_alloc;
+	/* Register CPU memory in device. */
+	rte_gpu_mem_register_t *mem_register;
+	/* Free memory allocated or registered in device. */
+	rte_gpu_free_t *mem_free;
+	/* Unregister CPU memory in device. */
+	rte_gpu_mem_unregister_t *mem_unregister;
 };
 
 struct rte_gpu_mpshared {
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index df75dbdbab..3c276581c0 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -9,6 +9,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <rte_bitops.h>
 #include <rte_compat.h>
 
 /**
@@ -292,6 +293,100 @@ int rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
 __rte_experimental
 int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate a chunk of memory usable by the device.
+ *
+ * @param dev_id
+ *   Device ID requiring allocated memory.
+ * @param size
+ *   Number of bytes to allocate.
+ *   Requesting 0 will do nothing.
+ *
+ * @return
+ *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if reserved flags
+ *   - ENOTSUP if operation not supported by the driver
+ *   - E2BIG if size is higher than limit
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+void *rte_gpu_malloc(int16_t dev_id, size_t size)
+__rte_alloc_size(2);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a chunk of memory allocated with rte_gpu_malloc().
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param ptr
+ *   Pointer to the memory area to be deallocated.
+ *   NULL is a no-op accepted value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_free(int16_t dev_id, void *ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register a chunk of memory on the CPU usable by the device.
+ *
+ * @param dev_id
+ *   Device ID requiring allocated memory.
+ * @param size
+ *   Number of bytes to allocate.
+ *   Requesting 0 will do nothing.
+ * @param ptr
+ *   Pointer to the memory area to be registered.
+ *   NULL is a no-op accepted value.
+
+ * @return
+ *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if reserved flags
+ *   - ENOTSUP if operation not supported by the driver
+ *   - E2BIG if size is higher than limit
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_register(int16_t dev_id, size_t size, void * ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deregister a chunk of memory previusly registered with rte_gpu_mem_register()
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param ptr
+ *   Pointer to the memory area to be unregistered.
+ *   NULL is a no-op accepted value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_unregister(int16_t dev_id, void *ptr);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 58dc632393..d4a65ebd52 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -8,9 +8,13 @@ EXPERIMENTAL {
 	rte_gpu_close;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
+	rte_gpu_free;
 	rte_gpu_info_get;
 	rte_gpu_init;
 	rte_gpu_is_valid;
+	rte_gpu_malloc;
+	rte_gpu_register;
+	rte_gpu_unregister;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 6/9] gpudev: add memory barrier
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
                     ` (4 preceding siblings ...)
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 5/9] gpudev: add memory API eagostini
@ 2021-11-03 19:15   ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 7/9] gpudev: add communication flag eagostini
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Add a function for the application to ensure the coherency
of the writes executed by another device into the GPU memory.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 doc/guides/prog_guide/gpudev.rst |  8 ++++++++
 lib/gpudev/gpudev.c              | 19 +++++++++++++++++++
 lib/gpudev/gpudev_driver.h       |  3 +++
 lib/gpudev/rte_gpudev.h          | 18 ++++++++++++++++++
 lib/gpudev/version.map           |  1 +
 5 files changed, 49 insertions(+)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 9aca69038c..eb5f0af817 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -65,3 +65,11 @@ gpudev can register a CPU memory area to make it visible from a GPU device.
 Later, it's also possible to unregister that memory with gpudev.
 CPU memory registered outside of the gpudev library
 (e.g. with GPU specific library) cannot be unregistered by the gpudev library.
+
+Memory Barrier
+~~~~~~~~~~~~~~
+
+Some GPU drivers may need, under certain conditions,
+to enforce the coherency of external devices writes (e.g. NIC receiving packets)
+into the GPU memory.
+gpudev abstracts and exposes this capability.
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 1d8318f769..cefefd737a 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -624,3 +624,22 @@ rte_gpu_free(int16_t dev_id, void *ptr)
 	}
 	return GPU_DRV_RET(dev->ops.mem_free(dev, ptr));
 }
+
+int
+rte_gpu_mbw(int16_t dev_id)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "memory barrier for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mbw == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mbw(dev));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 11015944a6..ab24de9e28 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -31,6 +31,7 @@ typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
 typedef int (rte_gpu_free_t)(struct rte_gpu *dev, void *ptr);
 typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
 typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
+typedef int (rte_gpu_mbw_t)(struct rte_gpu *dev);
 
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
@@ -45,6 +46,8 @@ struct rte_gpu_ops {
 	rte_gpu_free_t *mem_free;
 	/* Unregister CPU memory in device. */
 	rte_gpu_mem_unregister_t *mem_unregister;
+	/* Enforce GPU memory write barrier. */
+	rte_gpu_mbw_t *mbw;
 };
 
 struct rte_gpu_mpshared {
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index 3c276581c0..e790b3e2b7 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -387,6 +387,24 @@ int rte_gpu_register(int16_t dev_id, size_t size, void * ptr);
 __rte_experimental
 int rte_gpu_unregister(int16_t dev_id, void *ptr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enforce a GPU memory write barrier.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_mbw(int16_t dev_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index d4a65ebd52..d72d470d8e 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -13,6 +13,7 @@ EXPERIMENTAL {
 	rte_gpu_init;
 	rte_gpu_is_valid;
 	rte_gpu_malloc;
+	rte_gpu_mbw;
 	rte_gpu_register;
 	rte_gpu_unregister;
 };
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 7/9] gpudev: add communication flag
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
                     ` (5 preceding siblings ...)
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 6/9] gpudev: add memory barrier eagostini
@ 2021-11-03 19:15   ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 8/9] gpudev: add communication list eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 9/9] doc: add CUDA example in GPU guide eagostini
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

The purpose of this flag is to allow the CPU and the GPU to
exchange ACKs. A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU the data is ready updating the communication flag

GPU:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 app/test-gpudev/main.c                 |  66 +++++++++++++++
 doc/guides/prog_guide/gpudev.rst       |  13 +++
 doc/guides/rel_notes/release_21_11.rst |   1 +
 lib/gpudev/gpudev.c                    |  94 +++++++++++++++++++++
 lib/gpudev/rte_gpudev.h                | 108 +++++++++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 6 files changed, 286 insertions(+)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 98c02a3ee0..22f5c950b2 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -166,6 +166,67 @@ register_cpu_memory(uint16_t gpu_id)
 	return 0;
 }
 
+static int
+create_update_comm_flag(uint16_t gpu_id)
+{
+	struct rte_gpu_comm_flag devflag;
+	int ret = 0;
+	uint32_t set_val;
+	uint32_t get_val;
+
+	printf("\n=======> TEST: Communication flag\n");
+
+	ret = rte_gpu_comm_create_flag(gpu_id, &devflag, RTE_GPU_COMM_FLAG_CPU);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_create_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	set_val = 25;
+	ret = rte_gpu_comm_set_flag(&devflag, set_val);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_set_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_get_flag_value(&devflag, &get_val);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_get_flag_value returned error %d\n", ret);
+		return -1;
+	}
+
+	printf("Communication flag value at 0x%p was set to %d and current value is %d\n", devflag.ptr, set_val, get_val);
+
+	set_val = 38;
+	ret = rte_gpu_comm_set_flag(&devflag, set_val);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_set_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_get_flag_value(&devflag, &get_val);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_get_flag_value returned error %d\n", ret);
+		return -1;
+	}
+
+	printf("Communication flag value at 0x%p was set to %d and current value is %d\n", devflag.ptr, set_val, get_val);
+
+	ret = rte_gpu_comm_destroy_flag(&devflag);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_destroy_flags returned error %d\n", ret);
+		return -1;
+	}
+
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -217,6 +278,11 @@ main(int argc, char **argv)
 	alloc_gpu_memory(gpu_id);
 	register_cpu_memory(gpu_id);
 
+	/**
+	 * Communication items test
+	 */
+	create_update_comm_flag(gpu_id);
+
 	/* clean up the EAL */
 	rte_eal_cleanup();
 	printf("Bye...\n");
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index eb5f0af817..e0db627aed 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -32,6 +32,10 @@ This library provides a number of features:
 - Interoperability with device-specific library through generic handlers.
 - Allocate and free memory on the device.
 - Register CPU memory to make it visible from the device.
+- Communication between the CPU and the device.
+
+The whole CPU - GPU communication is implemented
+using CPU memory visible from the GPU.
 
 
 API Overview
@@ -73,3 +77,12 @@ Some GPU drivers may need, under certain conditions,
 to enforce the coherency of external devices writes (e.g. NIC receiving packets)
 into the GPU memory.
 gpudev abstracts and exposes this capability.
+
+Communication Flag
+~~~~~~~~~~~~~~~~~~
+
+Considering an application with some GPU task
+that's waiting to receive a signal from the CPU
+to move forward with the execution.
+The communication flag allocates a CPU memory GPU-visible ``uint32_t`` flag
+that can be used by the CPU to communicate with a GPU task.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index f70680dad3..16d10bb14c 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -95,6 +95,7 @@ New Features
 
   * Device information
   * Memory management
+  * Communication flag
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index cefefd737a..827e29d8f6 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -643,3 +643,97 @@ rte_gpu_mbw(int16_t dev_id)
 	}
 	return GPU_DRV_RET(dev->ops.mbw(dev));
 }
+
+int
+rte_gpu_comm_create_flag(uint16_t dev_id, struct rte_gpu_comm_flag *devflag,
+		enum rte_gpu_comm_flag_type mtype)
+{
+	size_t flag_size;
+	int ret;
+
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	flag_size = sizeof(uint32_t);
+
+	devflag->ptr = rte_zmalloc(NULL, flag_size, 0);
+	if (devflag->ptr == NULL) {
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	ret = rte_gpu_register(dev_id, flag_size, devflag->ptr);
+	if(ret < 0)
+	{
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	devflag->mtype = mtype;
+	devflag->dev_id = dev_id;
+
+	return 0;
+}
+
+int
+rte_gpu_comm_destroy_flag(struct rte_gpu_comm_flag *devflag)
+{
+	int ret;
+
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	ret = rte_gpu_unregister(devflag->dev_id, devflag->ptr);
+	if(ret < 0)
+	{
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	rte_free(devflag->ptr);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_set_flag(struct rte_gpu_comm_flag *devflag, uint32_t val)
+{
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (devflag->mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	RTE_GPU_VOLATILE(*devflag->ptr) = val;
+
+	return 0;
+}
+
+int
+rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag, uint32_t *val)
+{
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (devflag->mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	*val = RTE_GPU_VOLATILE(*devflag->ptr);
+
+	return 0;
+}
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index e790b3e2b7..4a10a8bcf5 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -38,6 +38,9 @@ extern "C" {
 /** Catch-all callback data. */
 #define RTE_GPU_CALLBACK_ANY_DATA ((void *)-1)
 
+/** Access variable as volatile. */
+#define RTE_GPU_VOLATILE(x) (*(volatile typeof(x)*)&(x))
+
 /** Store device info. */
 struct rte_gpu_info {
 	/** Unique identifier name. */
@@ -68,6 +71,22 @@ enum rte_gpu_event {
 typedef void (rte_gpu_callback_t)(int16_t dev_id,
 		enum rte_gpu_event event, void *user_data);
 
+/** Memory where communication flag is allocated. */
+enum rte_gpu_comm_flag_type {
+	/** Allocate flag on CPU memory visible from device. */
+	RTE_GPU_COMM_FLAG_CPU = 0,
+};
+
+/** Communication flag to coordinate CPU with the device. */
+struct rte_gpu_comm_flag {
+	/** Device that will use the device flag. */
+	uint16_t dev_id;
+	/** Pointer to flag memory area. */
+	uint32_t *ptr;
+	/** Type of memory used to allocate the flag. */
+	enum rte_gpu_comm_flag_type mtype;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -405,6 +424,95 @@ int rte_gpu_unregister(int16_t dev_id, void *ptr);
 __rte_experimental
 int rte_gpu_mbw(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication flag that can be shared
+ * between CPU threads and device workload to exchange some status info
+ * (e.g. work is done, processing can start, etc..).
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param mtype
+ *   Type of memory to allocate the communication flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if invalid inputs
+ *   - ENOTSUP if operation not supported by the driver
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_comm_create_flag(uint16_t dev_id,
+		struct rte_gpu_comm_flag *devflag,
+		enum rte_gpu_comm_flag_type mtype);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a communication flag.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL devflag
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_comm_destroy_flag(struct rte_gpu_comm_flag *devflag);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set the value of a communication flag as the input value.
+ * Flag memory area is treated as volatile.
+ * The flag must have been allocated with RTE_GPU_COMM_FLAG_CPU.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param val
+ *   Value to set in the flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_set_flag(struct rte_gpu_comm_flag *devflag,
+		uint32_t val);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the value of the communication flag.
+ * Flag memory area is treated as volatile.
+ * The flag must have been allocated with RTE_GPU_COMM_FLAG_CPU.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param val
+ *   Flag output value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag,
+		uint32_t *val);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index d72d470d8e..2fc039373a 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -6,6 +6,10 @@ EXPERIMENTAL {
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
+	rte_gpu_comm_create_flag;
+	rte_gpu_comm_destroy_flag;
+	rte_gpu_comm_get_flag_value;
+	rte_gpu_comm_set_flag;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
 	rte_gpu_free;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 8/9] gpudev: add communication list
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
                     ` (6 preceding siblings ...)
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 7/9] gpudev: add communication flag eagostini
@ 2021-11-03 19:15   ` eagostini
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 9/9] doc: add CUDA example in GPU guide eagostini
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the GPU is responsible for processing the content of those packets.

The purpose of this list is to provide a buffer in CPU memory visible
from the GPU that can be treated as a circular buffer
to let the CPU provide fondamental info of received packets to the GPU.

A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- in a loop:
    - receive a number of packets
    - provide packets info to the GPU

GPU:
- Do some pre-processing
- Wait to receive a new set of packet to be processed

Layout of a communication list would be:

     -------
    |   0    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   1    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   2    | => pkt_list
    | status |
    | #pkts  |
     -------
    |  ....  | => pkt_list
     -------

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 app/test-gpudev/main.c                 | 103 +++++++++++++++
 doc/guides/prog_guide/gpudev.rst       |  16 +++
 doc/guides/rel_notes/release_21_11.rst |   2 +-
 lib/gpudev/gpudev.c                    | 169 +++++++++++++++++++++++++
 lib/gpudev/meson.build                 |   2 +
 lib/gpudev/rte_gpudev.h                | 129 +++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 7 files changed, 424 insertions(+), 1 deletion(-)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 22f5c950b2..8f7ffa4c63 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -227,6 +227,108 @@ create_update_comm_flag(uint16_t gpu_id)
 	return 0;
 }
 
+static int
+simulate_gpu_task(struct rte_gpu_comm_list *comm_list_item, int num_pkts)
+{
+	int idx;
+
+	if(comm_list_item == NULL)
+		return -1;
+
+	for (idx = 0; idx < num_pkts; idx++) {
+		/**
+		 * consume(comm_list_item->pkt_list[idx].addr);
+		 */
+	}
+	comm_list_item->status = RTE_GPU_COMM_LIST_DONE;
+
+	return 0;
+}
+
+static int
+create_update_comm_list(uint16_t gpu_id)
+{
+	int ret = 0;
+	int i = 0;
+	struct rte_gpu_comm_list * comm_list;
+	uint32_t num_comm_items = 1024;
+	struct rte_mbuf * mbufs[10];
+
+	printf("\n=======> TEST: Communication list\n");
+
+	comm_list = rte_gpu_comm_create_list(gpu_id, num_comm_items);
+	if(comm_list == NULL)
+	{
+		fprintf(stderr, "rte_gpu_comm_create_list returned error %d\n", ret);
+		return -1;
+	}
+
+	/**
+	 * Simulate DPDK receive functions like rte_eth_rx_burst()
+	 */
+	for(i = 0; i < 10; i++)
+	{
+		mbufs[i] = rte_zmalloc(NULL, sizeof(struct rte_mbuf), 0);
+		if (mbufs[i] == NULL) {
+			fprintf(stderr, "Failed to allocate fake mbufs in CPU memory.\n");
+			return -1;
+		}
+
+		memset(mbufs[i], 0, sizeof(struct rte_mbuf));
+	}
+
+	/**
+	 * Populate just the first item of  the list
+	 */
+	ret = rte_gpu_comm_populate_list_pkts(&(comm_list[0]), mbufs, 10);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_populate_list_pkts returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_cleanup_list(&(comm_list[0]));
+	if(ret == 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_cleanup_list erroneusly cleaned the list even if packets have not beeing consumed yet\n");
+		return -1;
+	}
+	else
+	{
+		fprintf(stderr, "rte_gpu_comm_cleanup_list correctly didn't clean up the packets because they have not beeing consumed yet\n");
+	}
+
+	/**
+	 * Simulate a GPU tasks going through the packet list to consume
+	 * mbufs packets and release them
+	 */
+	simulate_gpu_task(&(comm_list[0]), 10);
+
+	/**
+	 * Packets have been consumed, now the communication item
+	 * and the related mbufs can be all released
+	 */
+	ret = rte_gpu_comm_cleanup_list(&(comm_list[0]));
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_cleanup_list returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_destroy_list(comm_list, num_comm_items);
+	if(ret < 0)
+	{
+		fprintf(stderr, "rte_gpu_comm_destroy_list returned error %d\n", ret);
+		return -1;
+	}
+
+	for(i = 0; i < 10; i++)
+		rte_free(mbufs[i]);
+
+	printf("\nCommunication list test passed!\n");
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -282,6 +384,7 @@ main(int argc, char **argv)
 	 * Communication items test
 	 */
 	create_update_comm_flag(gpu_id);
+	create_update_comm_list(gpu_id);
 
 	/* clean up the EAL */
 	rte_eal_cleanup();
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index e0db627aed..cbaec5a1e4 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -86,3 +86,19 @@ that's waiting to receive a signal from the CPU
 to move forward with the execution.
 The communication flag allocates a CPU memory GPU-visible ``uint32_t`` flag
 that can be used by the CPU to communicate with a GPU task.
+
+Communication list
+~~~~~~~~~~~~~~~~~~
+
+By default, DPDK pulls free mbufs from a mempool to receive packets.
+Best practice, expecially in a multithreaded application,
+is to no make any assumption on which mbufs will be used
+to receive the next bursts of packets.
+Considering an application with a GPU memory mempool
+attached to a receive queue having some task waiting on the GPU
+to receive a new burst of packets to be processed,
+there is the need to communicate from the CPU
+the list of mbuf payload addresses where received packet have been stored.
+The ``rte_gpu_comm_*()`` functions are responsible to create a list of packets
+that can be populated with receive mbuf payload addresses
+and communicated to the task running on the GPU.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 16d10bb14c..8dd0982d12 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -95,7 +95,7 @@ New Features
 
   * Device information
   * Memory management
-  * Communication flag
+  * Communication flag & list
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 827e29d8f6..9affac6bdd 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -737,3 +737,172 @@ rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag, uint32_t *val)
 
 	return 0;
 }
+
+struct rte_gpu_comm_list *
+rte_gpu_comm_create_list(uint16_t dev_id,
+		uint32_t num_comm_items)
+{
+	struct rte_gpu_comm_list *comm_list;
+	uint32_t idx_l;
+	int ret;
+	struct rte_gpu *dev;
+
+	if (num_comm_items == 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "memory barrier for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return NULL;
+	}
+
+	comm_list = rte_zmalloc(NULL, sizeof(struct rte_gpu_comm_list) * num_comm_items, 0);
+	if (comm_list == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	ret = rte_gpu_register(dev_id, sizeof(struct rte_gpu_comm_list) * num_comm_items, comm_list);
+	if(ret < 0)
+	{
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	for (idx_l = 0; idx_l < num_comm_items; idx_l++) {
+		comm_list[idx_l].pkt_list = rte_zmalloc(NULL, sizeof(struct rte_gpu_comm_pkt) * RTE_GPU_COMM_LIST_PKTS_MAX, 0);
+		if (comm_list[idx_l].pkt_list == NULL) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+
+		ret = rte_gpu_register(dev_id, sizeof(struct rte_gpu_comm_pkt) * RTE_GPU_COMM_LIST_PKTS_MAX, comm_list[idx_l].pkt_list);
+		if(ret < 0)
+		{
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+
+		RTE_GPU_VOLATILE(comm_list[idx_l].status) = RTE_GPU_COMM_LIST_FREE;
+		comm_list[idx_l].num_pkts = 0;
+		comm_list[idx_l].dev_id = dev_id;
+
+		comm_list[idx_l].mbufs = rte_zmalloc(NULL, sizeof(struct rte_mbuf *) * RTE_GPU_COMM_LIST_PKTS_MAX, 0);
+		if (comm_list[idx_l].mbufs == NULL) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+	}
+
+	return comm_list;
+}
+
+int
+rte_gpu_comm_destroy_list(struct rte_gpu_comm_list *comm_list,
+		uint32_t num_comm_items)
+{
+	uint32_t idx_l;
+	int ret;
+	uint16_t dev_id;
+
+	if (comm_list == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	dev_id = comm_list[0].dev_id;
+
+	for (idx_l = 0; idx_l < num_comm_items; idx_l++)
+	{
+		ret = rte_gpu_unregister(dev_id, comm_list[idx_l].pkt_list);
+		if(ret < 0)
+		{
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		rte_free(comm_list[idx_l].pkt_list);
+		rte_free(comm_list[idx_l].mbufs);
+	}
+
+	ret = rte_gpu_unregister(dev_id, comm_list);
+	if(ret < 0)
+	{
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	rte_free(comm_list);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_populate_list_pkts(struct rte_gpu_comm_list *comm_list_item,
+		struct rte_mbuf **mbufs, uint32_t num_mbufs)
+{
+	uint32_t idx;
+
+	if (comm_list_item == NULL || comm_list_item->pkt_list == NULL ||
+			mbufs == NULL || num_mbufs > RTE_GPU_COMM_LIST_PKTS_MAX) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx = 0; idx < num_mbufs; idx++) {
+		/* support only unchained mbufs */
+		if (unlikely((mbufs[idx]->nb_segs > 1) ||
+				(mbufs[idx]->next != NULL) ||
+				(mbufs[idx]->data_len != mbufs[idx]->pkt_len))) {
+			rte_errno = ENOTSUP;
+			return -rte_errno;
+		}
+		comm_list_item->pkt_list[idx].addr =
+				rte_pktmbuf_mtod_offset(mbufs[idx], uintptr_t, 0);
+		comm_list_item->pkt_list[idx].size = mbufs[idx]->pkt_len;
+		comm_list_item->mbufs[idx] = mbufs[idx];
+	}
+
+	RTE_GPU_VOLATILE(comm_list_item->num_pkts) = num_mbufs;
+	rte_gpu_mbw(comm_list_item->dev_id);
+	RTE_GPU_VOLATILE(comm_list_item->status) = RTE_GPU_COMM_LIST_READY;
+	rte_gpu_mbw(comm_list_item->dev_id);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_cleanup_list(struct rte_gpu_comm_list *comm_list_item)
+{
+	uint32_t idx = 0;
+
+	if (comm_list_item == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (RTE_GPU_VOLATILE(comm_list_item->status) ==
+			RTE_GPU_COMM_LIST_READY) {
+		GPU_LOG(ERR, "packet list is still in progress");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx = 0; idx < RTE_GPU_COMM_LIST_PKTS_MAX; idx++) {
+		if (comm_list_item->pkt_list[idx].addr == 0)
+			break;
+
+		comm_list_item->pkt_list[idx].addr = 0;
+		comm_list_item->pkt_list[idx].size = 0;
+		comm_list_item->mbufs[idx] = NULL;
+	}
+
+	RTE_GPU_VOLATILE(comm_list_item->status) = RTE_GPU_COMM_LIST_FREE;
+	RTE_GPU_VOLATILE(comm_list_item->num_pkts) = 0;
+	rte_mb();
+
+	return 0;
+}
diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build
index 608154817b..89a118f357 100644
--- a/lib/gpudev/meson.build
+++ b/lib/gpudev/meson.build
@@ -8,3 +8,5 @@ headers = files(
 sources = files(
         'gpudev.c',
 )
+
+deps += ['mbuf']
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index 4a10a8bcf5..987a6c58f0 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -9,6 +9,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <rte_mbuf.h>
 #include <rte_bitops.h>
 #include <rte_compat.h>
 
@@ -41,6 +42,9 @@ extern "C" {
 /** Access variable as volatile. */
 #define RTE_GPU_VOLATILE(x) (*(volatile typeof(x)*)&(x))
 
+/** Max number of packets per communication list. */
+#define RTE_GPU_COMM_LIST_PKTS_MAX 1024
+
 /** Store device info. */
 struct rte_gpu_info {
 	/** Unique identifier name. */
@@ -87,6 +91,43 @@ struct rte_gpu_comm_flag {
 	enum rte_gpu_comm_flag_type mtype;
 };
 
+/** List of packets shared among CPU and device. */
+struct rte_gpu_comm_pkt {
+	/** Address of the packet in memory (e.g. mbuf->buf_addr). */
+	uintptr_t addr;
+	/** Size in byte of the packet. */
+	size_t size;
+};
+
+/** Possible status for the list of packets shared among CPU and device. */
+enum rte_gpu_comm_list_status {
+	/** Packet list can be filled with new mbufs, no one is using it. */
+	RTE_GPU_COMM_LIST_FREE = 0,
+	/** Packet list has been filled with new mbufs and it's ready to be used .*/
+	RTE_GPU_COMM_LIST_READY,
+	/** Packet list has been processed, it's ready to be freed. */
+	RTE_GPU_COMM_LIST_DONE,
+	/** Some error occurred during packet list processing. */
+	RTE_GPU_COMM_LIST_ERROR,
+};
+
+/**
+ * Communication list holding a number of lists of packets
+ * each having a status flag.
+ */
+struct rte_gpu_comm_list {
+	/** Device that will use the communication list. */
+	uint16_t dev_id;
+	/** List of mbufs populated by the CPU with a set of mbufs. */
+	struct rte_mbuf ** mbufs;
+	/** List of packets populated by the CPU with a set of mbufs info. */
+	struct rte_gpu_comm_pkt *pkt_list;
+	/** Number of packets in the list. */
+	uint32_t num_pkts;
+	/** Status of the list. */
+	enum rte_gpu_comm_list_status status;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -513,6 +554,94 @@ __rte_experimental
 int rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag,
 		uint32_t *val);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication list that can be used to share packets
+ * between CPU and device.
+ * Each element of the list contains:
+ *  - a packet list of RTE_GPU_COMM_LIST_PKTS_MAX elements
+ *  - number of packets in the list
+ *  - a status flag to communicate if the packet list is FREE,
+ *    READY to be processed, DONE with processing.
+ *
+ * The list is allocated in CPU-visible memory.
+ * At creation time, every list is in FREE state.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param num_comm_items
+ *   Number of items in the communication list.
+ *
+ * @return
+ *   A pointer to the allocated list, otherwise NULL and rte_errno is set:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+struct rte_gpu_comm_list *rte_gpu_comm_create_list(uint16_t dev_id,
+		uint32_t num_comm_items);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Destroy a communication list.
+ *
+ * @param comm_list
+ *   Communication list to be destroyed.
+ * @param num_comm_items
+ *   Number of items in the communication list.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_destroy_list(struct rte_gpu_comm_list *comm_list,
+		uint32_t num_comm_items);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Populate the packets list of the communication item
+ * with info from a list of mbufs.
+ * Status flag of that packet list is set to READY.
+ *
+ * @param comm_list_item
+ *   Communication list item to fill.
+ * @param mbufs
+ *   List of mbufs.
+ * @param num_mbufs
+ *   Number of mbufs.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ *   - ENOTSUP if mbufs are chained (multiple segments)
+ */
+__rte_experimental
+int rte_gpu_comm_populate_list_pkts(struct rte_gpu_comm_list *comm_list_item,
+		struct rte_mbuf **mbufs, uint32_t num_mbufs);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Reset a communication list item to the original state.
+ * The status flag set to FREE and mbufs are returned to the pool.
+ *
+ * @param comm_list_item
+ *   Communication list item to reset.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_cleanup_list(struct rte_gpu_comm_list *comm_list_item);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 2fc039373a..45a35fa6e4 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -6,9 +6,13 @@ EXPERIMENTAL {
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
+	rte_gpu_comm_cleanup_list;
 	rte_gpu_comm_create_flag;
+	rte_gpu_comm_create_list;
 	rte_gpu_comm_destroy_flag;
+	rte_gpu_comm_destroy_list;
 	rte_gpu_comm_get_flag_value;
+	rte_gpu_comm_populate_list_pkts;
 	rte_gpu_comm_set_flag;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v4 9/9] doc: add CUDA example in GPU guide
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
                     ` (7 preceding siblings ...)
  2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 8/9] gpudev: add communication list eagostini
@ 2021-11-03 19:15   ` eagostini
  8 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-03 19:15 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 doc/guides/prog_guide/gpudev.rst | 122 +++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index cbaec5a1e4..1baf0c6772 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -102,3 +102,125 @@ the list of mbuf payload addresses where received packet have been stored.
 The ``rte_gpu_comm_*()`` functions are responsible to create a list of packets
 that can be populated with receive mbuf payload addresses
 and communicated to the task running on the GPU.
+
+
+CUDA Example
+------------
+
+In the example below, there is a pseudo-code to give an example
+about how to use functions in this library in case of a CUDA application.
+
+.. code-block:: c
+
+   //////////////////////////////////////////////////////////////////////////
+   ///// gpudev library + CUDA functions
+   //////////////////////////////////////////////////////////////////////////
+   #define GPU_PAGE_SHIFT 16
+   #define GPU_PAGE_SIZE (1UL << GPU_PAGE_SHIFT)
+
+   int main() {
+       struct rte_gpu_flag quit_flag;
+       struct rte_gpu_comm_list *comm_list;
+       int nb_rx = 0;
+       int comm_list_entry = 0;
+       struct rte_mbuf * rx_mbufs[max_rx_mbufs];
+       cudaStream_t cstream;
+       struct rte_mempool *mpool_payload, *mpool_header;
+       struct rte_pktmbuf_extmem ext_mem;
+       int16_t dev_id;
+       int16_t port_id = 0;
+
+       /** Initialize CUDA objects (cstream, context, etc..). */
+       /** Use gpudev library to register a new CUDA context if any */
+       /** Let's assume the application wants to use the default context of the GPU device 0 */
+
+       dev_id = 0;
+
+       /**
+        * Create an external memory mempool using memory allocated on the GPU.
+        */
+       ext_mem.elt_size = mbufs_headroom_size;
+                   ext_mem.buf_len = RTE_ALIGN_CEIL(mbufs_num * ext_mem.elt_size, GPU_PAGE_SIZE);
+       ext_mem.buf_iova = RTE_BAD_IOVA;
+       ext_mem.buf_ptr = rte_gpu_malloc(dev_id, ext_mem.buf_len, 0);
+       rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
+       rte_dev_dma_map(rte_eth_devices[port_id].device, ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
+       mpool_payload = rte_pktmbuf_pool_create_extbuf("gpu_mempool", mbufs_num,
+                                                       0, 0, ext_mem.elt_size,
+                                                       rte_socket_id(), &ext_mem, 1);
+
+       /**
+        * Create CPU - device communication flag. With this flag, the CPU can tell to the CUDA kernel
+        * to exit from the main loop.
+        */
+       rte_gpu_comm_create_flag(dev_id, &quit_flag, RTE_GPU_COMM_FLAG_CPU);
+       rte_gpu_comm_set_flag(&quit_flag , 0);
+
+       /**
+        * Create CPU - device communication list. Each entry of this list will be populated by the CPU
+        * with a new set of received mbufs that the CUDA kernel has to process.
+        */
+       comm_list = rte_gpu_comm_create_list(dev_id, num_entries);
+
+       /** A very simple CUDA kernel with just 1 CUDA block and RTE_GPU_COMM_LIST_PKTS_MAX CUDA threads. */
+       cuda_kernel_packet_processing<<<1, RTE_GPU_COMM_LIST_PKTS_MAX, 0, cstream>>>(quit_flag->ptr, comm_list, num_entries, ...);
+
+       /**
+        * For simplicity, the CPU here receives only 2 bursts of mbufs.
+        * In a real application, network activity and device processing should overlap.
+        */
+       nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
+       rte_gpu_comm_populate_list_pkts(comm_list[0], rx_mbufs, nb_rx);
+       nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
+       rte_gpu_comm_populate_list_pkts(comm_list[1], rx_mbufs, nb_rx);
+
+       /**
+        * CPU waits for the completion of the packets' processing on the CUDA kernel
+        * and then it does a cleanup of the received mbufs.
+        */
+       while(rte_gpu_comm_cleanup_list(comm_list[0]));
+       while(rte_gpu_comm_cleanup_list(comm_list[1]));
+
+       /** CPU notifies the CUDA kernel that it has to terminate */
+       rte_gpu_comm_set_flag(&quit_flag, 1);
+
+       /** gpudev objects cleanup/destruction */
+       /** CUDA cleanup */
+
+       rte_gpu_free(dev_id, ext_mem.buf_len);
+
+       /** DPDK cleanup */
+
+       return 0;
+   }
+
+   //////////////////////////////////////////////////////////////////////////
+   ///// CUDA kernel
+   //////////////////////////////////////////////////////////////////////////
+
+   void cuda_kernel(uint32_t * quit_flag_ptr, struct rte_gpu_comm_list *comm_list, int comm_list_entries) {
+      int comm_list_index = 0;
+      struct rte_gpu_comm_pkt *pkt_list = NULL;
+
+      /** Do some pre-processing operations. */
+
+      /** GPU kernel keeps checking this flag to know if it has to quit or wait for more packets. */
+      while(*quit_flag_ptr == 0)
+      {
+         if(comm_list[comm_list_index]->status != RTE_GPU_COMM_LIST_READY)
+         continue;
+
+         if(threadIdx.x < comm_list[comm_list_index]->num_pkts)
+         {
+            /** Each CUDA thread processes a different packet. */
+            packet_processing(comm_list[comm_list_index]->addr, comm_list[comm_list_index]->size, ..);
+         }
+         __threadfence();
+         __syncthreads();
+
+         /** Wait for new packets on the next communication list entry. */
+         comm_list_index = (comm_list_index+1) % comm_list_entries;
+      }
+
+      /** Do some post-processing operations. */
+   }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API
  2021-10-29 19:38     ` Mattias Rönnblom
@ 2021-11-08 15:16       ` Elena Agostini
  0 siblings, 0 replies; 128+ messages in thread
From: Elena Agostini @ 2021-11-08 15:16 UTC (permalink / raw)
  To: Mattias Rönnblom, dev; +Cc: NBU-Contact-Thomas Monjalon

> From: Mattias Rönnblom <hofors@lysator.liu.se>
> Date: Friday, 29 October 2021 at 21:38
> To: Elena Agostini <eagostini@nvidia.com>, dev@dpdk.org <dev@dpdk.org>
> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API>
>
> On 2021-10-09 03:53, eagostini@nvidia.com wrote:
> > From: Elena Agostini <eagostini@nvidia.com>
> >
> > In heterogeneous computing system, processing is not only in the CPU.
> > Some tasks can be delegated to devices working in parallel.
> > Such workload distribution can be achieved by sharing some memory.
> >
> > As a first step, the features are focused on memory management.
> > A function allows to allocate memory inside the device,
> > or in the main (CPU) memory while making it visible for the device.
> > This memory may be used to save packets or for synchronization data.
> >
> > The next step should focus on GPU processing task control.
> >
> > Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
> >   app/test-gpudev/main.c                 | 118 +++++++++++++++++++++++++
> >   doc/guides/gpus/features/default.ini   |   3 +
> >   doc/guides/prog_guide/gpudev.rst       |  19 ++++
> >   doc/guides/rel_notes/release_21_11.rst |   1 +
> >   lib/gpudev/gpudev.c                    | 101 +++++++++++++++++++++
> >   lib/gpudev/gpudev_driver.h             |  12 +++
> >   lib/gpudev/rte_gpudev.h                |  95 ++++++++++++++++++++
> >   lib/gpudev/version.map                 |   4 +
> >   8 files changed, 353 insertions(+)
> >
> > diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
> > index 6a73a54e84..98c02a3ee0 100644
> > --- a/app/test-gpudev/main.c
> > +++ b/app/test-gpudev/main.c
> > @@ -62,6 +62,110 @@ args_parse(int argc, char **argv)
> >       }
> >   }
> >
> > +static int
> > +alloc_gpu_memory(uint16_t gpu_id)
> > +{
> > +     void * ptr_1 = NULL;>

> Delete space between '*' and 'p'.>

Thanks Mattias, I addressed all of your comments and I re-run checkpatch script
In all the gpudev related files.

> > +     void * ptr_2 = NULL;
> > +     size_t buf_bytes = 1024;
> > +     int ret = 0;>

> This initialization is redundant.>

> > +
> > +     printf("\n=======> TEST: Allocate GPU memory\n");
> > +
> > +     /* Alloc memory on GPU 0 */
> > +     ptr_1 = rte_gpu_malloc(gpu_id, buf_bytes);
> > +     if(ptr_1 == NULL)
> > +     {>

> Misplaced braces.>

> "if (" rather than "if(".>

> > +             fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
> > +             return -1;
> > +     }
> > +     printf("GPU memory allocated at 0x%p %zdB\n", ptr_1, buf_bytes);
> > +
> > +     ptr_2 = rte_gpu_malloc(gpu_id, buf_bytes);
> > +     if(ptr_2 == NULL)
> > +     {>

> Again, and throughout this file.>

> > +             fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
> > +             return -1;
> > +     }
> > +     printf("GPU memory allocated at 0x%p %zdB\n", ptr_2, buf_bytes);
> > +
> > +     ret = rte_gpu_free(gpu_id, (uint8_t*)(ptr_1)+0x700);
> > +     if(ret < 0)
> > +     {
> > +             printf("GPU memory 0x%p + 0x700 NOT freed because of memory address not recognized by driver\n", ptr_1);
> > +     }
> > +     else
> > +     {
> > +             fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr_1);
> > +             return -1;
> > +     }
> > +
> > +     ret = rte_gpu_free(gpu_id, ptr_2);
> > +     if(ret < 0)
> > +     {
> > +             fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
> > +             return -1;
> > +     }
> > +     printf("GPU memory 0x%p freed\n", ptr_2);
> > +
> > +     ret = rte_gpu_free(gpu_id, ptr_1);
> > +     if(ret < 0)
> > +     {
> > +             fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
> > +             return -1;
> > +     }
> > +     printf("GPU memory 0x%p freed\n", ptr_1);
> > +
> > +     return 0;
> > +}
> > +
> > +static int
> > +register_cpu_memory(uint16_t gpu_id)
> > +{
> > +     void * ptr = NULL;
> > +     size_t buf_bytes = 1024;
> > +     int ret = 0;
> > +
> > +     printf("\n=======> TEST: Register CPU memory\n");
> > +
> > +     /* Alloc memory on CPU visible from GPU 0 */
> > +     ptr = rte_zmalloc(NULL, buf_bytes, 0);
> > +     if (ptr == NULL) {
> > +             fprintf(stderr, "Failed to allocate CPU memory.\n");
> > +             return -1;
> > +     }
> > +
> > +     ret = rte_gpu_register(gpu_id, buf_bytes, ptr);
> > +     if(ret < 0)
> > +     {
> > +             fprintf(stderr, "rte_gpu_register CPU memory returned error %d\n", ret);
> > +             return -1;
> > +     }
> > +     printf("CPU memory registered at 0x%p %zdB\n", ptr, buf_bytes);
> > +
> > +     ret = rte_gpu_unregister(gpu_id, (uint8_t*)(ptr)+0x700);
> > +     if(ret < 0)
> > +     {
> > +             printf("CPU memory 0x%p + 0x700 NOT unregistered because of memory address not recognized by driver\n", ptr);
> > +     }
> > +     else
> > +     {
> > +             fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr);
> > +             return -1;
> > +     }
> > +     printf("CPU memory 0x%p unregistered\n", ptr);
> > +
> > +     ret = rte_gpu_unregister(gpu_id, ptr);
> > +     if(ret < 0)
> > +     {
> > +             fprintf(stderr, "rte_gpu_unregister returned error %d\n", ret);
> > +             return -1;
> > +     }
> > +     printf("CPU memory 0x%p unregistered\n", ptr);
> > +
> > +     return 0;
> > +}
> > +
> >   int
> >   main(int argc, char **argv)
> >   {
> > @@ -99,6 +203,20 @@ main(int argc, char **argv)
> >       }
> >       printf("\n\n");
> >
> > +     if(nb_gpus == 0 > +     {
> > +             fprintf(stderr, "Need at least one GPU on the system to run the example\n");
> > +             return EXIT_FAILURE;
> > +     }
> > +
> > +     gpu_id = 0;
> > +
> > +     /**
> > +      * Memory tests
> > +      */
> > +     alloc_gpu_memory(gpu_id);
> > +     register_cpu_memory(gpu_id);
> > +
> >       /* clean up the EAL */
> >       rte_eal_cleanup();
> >       printf("Bye...\n");
> > diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
> > index ec7a545eb7..87e9966424 100644
> > --- a/doc/guides/gpus/features/default.ini
> > +++ b/doc/guides/gpus/features/default.ini
> > @@ -8,3 +8,6 @@
> >   ;
> >   [Features]
> >   Get device info                =
> > +Share CPU memory with device   =
> > +Allocate device memory         =
> > +Free memory                    =
> > diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
> > index 7694639489..9aca69038c 100644
> > --- a/doc/guides/prog_guide/gpudev.rst
> > +++ b/doc/guides/prog_guide/gpudev.rst
> > @@ -30,6 +30,8 @@ Features
> >   This library provides a number of features:
> >
> >   - Interoperability with device-specific library through generic handlers.
> > +- Allocate and free memory on the device.
> > +- Register CPU memory to make it visible from the device.
> >
> >
> >   API Overview
> > @@ -46,3 +48,20 @@ that will be registered internally by the driver as an additional device (child)
> >   connected to a physical device (parent).
> >   Each device (parent or child) is represented through a ID
> >   required to indicate which device a given operation should be executed on.
> > +
> > +Memory Allocation
> > +~~~~~~~~~~~~~~~~~
> > +
> > +gpudev can allocate on an input given GPU device a memory area
> > +returning the pointer to that memory.
> > +Later, it's also possible to free that memory with gpudev.
> > +GPU memory allocated outside of the gpudev library
> > +(e.g. with GPU-specific library) cannot be freed by the gpudev library.
> > +
> > +Memory Registration
> > +~~~~~~~~~~~~~~~~~~~
> > +
> > +gpudev can register a CPU memory area to make it visible from a GPU device.
> > +Later, it's also possible to unregister that memory with gpudev.
> > +CPU memory registered outside of the gpudev library
> > +(e.g. with GPU specific library) cannot be unregistered by the gpudev library.
> > diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
> > index 4986a35b50..c4ac5e3053 100644
> > --- a/doc/guides/rel_notes/release_21_11.rst
> > +++ b/doc/guides/rel_notes/release_21_11.rst
> > @@ -65,6 +65,7 @@ New Features
> >   * **Introduced GPU device class with first features:**
> >
> >     * Device information
> > +  * Memory management
> >
> >   * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
> >
> > diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
> > index f0690cf730..1d8318f769 100644
> > --- a/lib/gpudev/gpudev.c
> > +++ b/lib/gpudev/gpudev.c
> > @@ -6,6 +6,7 @@
> >   #include <rte_tailq.h>
> >   #include <rte_string_fns.h>
> >   #include <rte_memzone.h>
> > +#include <rte_malloc.h>
> >   #include <rte_errno.h>
> >   #include <rte_log.h>
> >
> > @@ -523,3 +524,103 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
> >       }
> >       return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
> >   }
> > +
> > +void *
> > +rte_gpu_malloc(int16_t dev_id, size_t size)
> > +{
> > +     struct rte_gpu *dev;
> > +     void *ptr;
> > +     int ret;
> > +
> > +     dev = gpu_get_by_id(dev_id);
> > +     if (dev == NULL) {
> > +             GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
> > +             rte_errno = ENODEV;
> > +             return NULL;
> > +     }
> > +
> > +     if (dev->ops.mem_alloc == NULL) {
> > +             GPU_LOG(ERR, "mem allocation not supported");
> > +             rte_errno = ENOTSUP;
> > +             return NULL;
> > +     }
> > +
> > +     if (size == 0) /* dry-run */
> > +             return NULL;
> > +
> > +     ret = dev->ops.mem_alloc(dev, size, &ptr);
> > +
> > +     switch (ret) {
> > +             case 0:
> > +                     return ptr;
> > +             case -ENOMEM:
> > +             case -E2BIG:
> > +                     rte_errno = -ret;
> > +                     return NULL;
> > +             default:
> > +                     rte_errno = -EPERM;
> > +                     return NULL;
> > +     }
> > +}
> > +
> > +int
> > +rte_gpu_register(int16_t dev_id, size_t size, void * ptr)
> > +{
> > +     struct rte_gpu *dev;
> > +
> > +     dev = gpu_get_by_id(dev_id);
> > +     if (dev == NULL) {
> > +             GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
> > +             rte_errno = ENODEV;
> > +             return -rte_errno;
> > +     }
> > +
> > +     if (dev->ops.mem_register == NULL) {
> > +             GPU_LOG(ERR, "mem registration not supported");
> > +             rte_errno = ENOTSUP;
> > +             return -rte_errno;
> > +     }
> > +
> > +     if (size == 0 || ptr == NULL) /* dry-run */
> > +             return -EINVAL;
> > +
> > +     return GPU_DRV_RET(dev->ops.mem_register(dev, size, ptr));
> > +}
> > +
> > +int
> > +rte_gpu_unregister(int16_t dev_id, void * ptr)
> > +{
> > +     struct rte_gpu *dev;
> > +
> > +     dev = gpu_get_by_id(dev_id);
> > +     if (dev == NULL) {
> > +             GPU_LOG(ERR, "unregister mem for invalid device ID %d", dev_id);
> > +             rte_errno = ENODEV;
> > +             return -rte_errno;
> > +     }
> > +
> > +     if (dev->ops.mem_unregister == NULL) {
> > +             rte_errno = ENOTSUP;
> > +             return -rte_errno;
> > +     }
> > +     return GPU_DRV_RET(dev->ops.mem_unregister(dev, ptr));
> > +}
> > +
> > +int
> > +rte_gpu_free(int16_t dev_id, void *ptr)
> > +{
> > +     struct rte_gpu *dev;
> > +
> > +     dev = gpu_get_by_id(dev_id);
> > +     if (dev == NULL) {
> > +             GPU_LOG(ERR, "free mem for invalid device ID %d", dev_id);
> > +             rte_errno = ENODEV;
> > +             return -rte_errno;
> > +     }
> > +
> > +     if (dev->ops.mem_free == NULL) {
> > +             rte_errno = ENOTSUP;
> > +             return -rte_errno;
> > +     }
> > +     return GPU_DRV_RET(dev->ops.mem_free(dev, ptr));
> > +}
> > diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
> > index 9459c7e30f..11015944a6 100644
> > --- a/lib/gpudev/gpudev_driver.h
> > +++ b/lib/gpudev/gpudev_driver.h
> > @@ -27,12 +27,24 @@ enum rte_gpu_state {
> >   struct rte_gpu;
> >   typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
> >   typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
> > +typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
> > +typedef int (rte_gpu_free_t)(struct rte_gpu *dev, void *ptr);
> > +typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
> > +typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
> >
> >   struct rte_gpu_ops {
> >       /* Get device info. If NULL, info is just copied. */
> >       rte_gpu_info_get_t *dev_info_get;
> >       /* Close device or child context. */
> >       rte_gpu_close_t *dev_close;
> > +     /* Allocate memory in device. */
> > +     rte_gpu_mem_alloc_t *mem_alloc;
> > +     /* Register CPU memory in device. */
> > +     rte_gpu_mem_register_t *mem_register;
> > +     /* Free memory allocated or registered in device. */
> > +     rte_gpu_free_t *mem_free;
> > +     /* Unregister CPU memory in device. */
> > +     rte_gpu_mem_unregister_t *mem_unregister;
> >   };
> >
> >   struct rte_gpu_mpshared {
> > diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
> > index df75dbdbab..3c276581c0 100644
> > --- a/lib/gpudev/rte_gpudev.h
> > +++ b/lib/gpudev/rte_gpudev.h
> > @@ -9,6 +9,7 @@
> >   #include <stdint.h>
> >   #include <stdbool.h>
> >
> > +#include <rte_bitops.h>
> >   #include <rte_compat.h>
> >
> >   /**
> > @@ -292,6 +293,100 @@ int rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
> >   __rte_experimental
> >   int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Allocate a chunk of memory usable by the device.
> > + *
> > + * @param dev_id
> > + *   Device ID requiring allocated memory.
> > + * @param size
> > + *   Number of bytes to allocate.
> > + *   Requesting 0 will do nothing.
> > + *
> > + * @return
> > + *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
> > + *   - ENODEV if invalid dev_id
> > + *   - EINVAL if reserved flags
> > + *   - ENOTSUP if operation not supported by the driver
> > + *   - E2BIG if size is higher than limit
> > + *   - ENOMEM if out of space
> > + *   - EPERM if driver error
> > + */
> > +__rte_experimental
> > +void *rte_gpu_malloc(int16_t dev_id, size_t size)
> > +__rte_alloc_size(2);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Deallocate a chunk of memory allocated with rte_gpu_malloc().
> > + *
> > + * @param dev_id
> > + *   Reference device ID.
> > + * @param ptr
> > + *   Pointer to the memory area to be deallocated.
> > + *   NULL is a no-op accepted value.
> > + *
> > + * @return
> > + *   0 on success, -rte_errno otherwise:>

> I don't think you are supposed to set rte_errno if it's not needed,
> which is not the case here (since you return the error code).>

> > + *   - ENODEV if invalid dev_id
> > + *   - ENOTSUP if operation not supported by the driver
> > + *   - EPERM if driver error
> > + */
> > +__rte_experimental
> > +int rte_gpu_free(int16_t dev_id, void *ptr);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Register a chunk of memory on the CPU usable by the device.
> > + *
> > + * @param dev_id
> > + *   Device ID requiring allocated memory.
> > + * @param size
> > + *   Number of bytes to allocate.
> > + *   Requesting 0 will do nothing.
> > + * @param ptr
> > + *   Pointer to the memory area to be registered.
> > + *   NULL is a no-op accepted value.
> > +
> > + * @return
> > + *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
> > + *   - ENODEV if invalid dev_id
> > + *   - EINVAL if reserved flags
> > + *   - ENOTSUP if operation not supported by the driver
> > + *   - E2BIG if size is higher than limit
> > + *   - ENOMEM if out of space
> > + *   - EPERM if driver error
> > + */
> > +__rte_experimental
> > +int rte_gpu_register(int16_t dev_id, size_t size, void * ptr);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Deregister a chunk of memory previusly registered with rte_gpu_mem_register()
> > + *
> > + * @param dev_id
> > + *   Reference device ID.
> > + * @param ptr
> > + *   Pointer to the memory area to be unregistered.
> > + *   NULL is a no-op accepted value.
> > + *
> > + * @return
> > + *   0 on success, -rte_errno otherwise:
> > + *   - ENODEV if invalid dev_id
> > + *   - ENOTSUP if operation not supported by the driver
> > + *   - EPERM if driver error
> > + */
> > +__rte_experimental
> > +int rte_gpu_unregister(int16_t dev_id, void *ptr);
> > +
> >   #ifdef __cplusplus
> >   }
> >   #endif
> > diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
> > index 58dc632393..d4a65ebd52 100644
> > --- a/lib/gpudev/version.map
> > +++ b/lib/gpudev/version.map
> > @@ -8,9 +8,13 @@ EXPERIMENTAL {
> >       rte_gpu_close;
> >       rte_gpu_count_avail;
> >       rte_gpu_find_next;
> > +     rte_gpu_free;
> >       rte_gpu_info_get;
> >       rte_gpu_init;
> >       rte_gpu_is_valid;
> > +     rte_gpu_malloc;
> > +     rte_gpu_register;
> > +     rte_gpu_unregister;
> >   };
> >
> >   INTERNAL {
> >

^ permalink raw reply	[flat|nested] 128+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/9] GPU library
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
@ 2021-11-08 16:25   ` Thomas Monjalon
  2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 1/9] gpudev: introduce GPU device class library eagostini
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: Thomas Monjalon @ 2021-11-08 16:25 UTC (permalink / raw)
  To: Elena Agostini; +Cc: dev

08/11/2021 19:57, eagostini@nvidia.com:
> Elena Agostini (6):
>   gpudev: introduce GPU device class library
>   gpudev: add memory API
>   gpudev: add memory barrier
>   gpudev: add communication flag
>   gpudev: add communication list
>   doc: add CUDA example in GPU guide
> 
> Thomas Monjalon (3):
>   gpudev: add event notification
>   gpudev: add child device representing a device context
>   gpudev: support multi-process

Applied with last details fixed.

For reference, the techboard approved gpudev integration:
http://inbox.dpdk.org/dev/DM6PR11MB44917C0FF9926C1BADBCA7149A8B9@DM6PR11MB4491.namprd11.prod.outlook.com/



^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 0/9] GPU library
  2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
                   ` (9 preceding siblings ...)
  2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
@ 2021-11-08 18:57 ` eagostini
  2021-11-08 16:25   ` Thomas Monjalon
                     ` (9 more replies)
  10 siblings, 10 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:57 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The goal of this new library is to enhance the collaboration between
DPDK, that's primarily a CPU framework, and GPU devices.

When mixing network activity with task processing on a non-CPU device,
there may be the need to put in communication the CPU with the device
in order to manage the memory, synchronize operations, exchange info, etc..

This library provides a number of new features:
- Interoperability with GPU-specific library with generic handlers
- Possibility to allocate and free memory on the GPU
- Possibility to allocate and free memory on the CPU but visible from the GPU
- Communication functions to enhance the dialog between the CPU and the GPU

The infrastructure is prepared to welcome drivers in drivers/gpu/
as the CUDA one:
http://patches.dpdk.org/project/dpdk/cover/20211104020128.13165-1-eagostini@nvidia.com

Changelog:
- Patches updated to latest DPDK commit
- Communication list item has an array of mbufs instead of opaque
  objects
- Communication list free doesn't release mbufs anymore
- Fixed styling reported by checkpatch

Elena Agostini (6):
  gpudev: introduce GPU device class library
  gpudev: add memory API
  gpudev: add memory barrier
  gpudev: add communication flag
  gpudev: add communication list
  doc: add CUDA example in GPU guide

Thomas Monjalon (3):
  gpudev: add event notification
  gpudev: add child device representing a device context
  gpudev: support multi-process

 .gitignore                             |   1 +
 MAINTAINERS                            |   6 +
 app/meson.build                        |   1 +
 app/test-gpudev/main.c                 | 367 ++++++++++
 app/test-gpudev/meson.build            |   5 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/conf.py                     |   8 +
 doc/guides/gpus/features/default.ini   |  13 +
 doc/guides/gpus/index.rst              |  11 +
 doc/guides/gpus/overview.rst           |  10 +
 doc/guides/index.rst                   |   1 +
 doc/guides/prog_guide/gpudev.rst       | 226 +++++++
 doc/guides/prog_guide/index.rst        |   1 +
 doc/guides/rel_notes/release_21_11.rst |   6 +
 drivers/gpu/meson.build                |   4 +
 drivers/meson.build                    |   1 +
 lib/gpudev/gpudev.c                    | 901 +++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             | 102 +++
 lib/gpudev/meson.build                 |  12 +
 lib/gpudev/rte_gpudev.h                | 649 ++++++++++++++++++
 lib/gpudev/version.map                 |  38 ++
 lib/meson.build                        |   1 +
 22 files changed, 2365 insertions(+)
 create mode 100644 app/test-gpudev/main.c
 create mode 100644 app/test-gpudev/meson.build
 create mode 100644 doc/guides/gpus/features/default.ini
 create mode 100644 doc/guides/gpus/index.rst
 create mode 100644 doc/guides/gpus/overview.rst
 create mode 100644 doc/guides/prog_guide/gpudev.rst
 create mode 100644 drivers/gpu/meson.build
 create mode 100644 lib/gpudev/gpudev.c
 create mode 100644 lib/gpudev/gpudev_driver.h
 create mode 100644 lib/gpudev/meson.build
 create mode 100644 lib/gpudev/rte_gpudev.h
 create mode 100644 lib/gpudev/version.map

-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 1/9] gpudev: introduce GPU device class library
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
  2021-11-08 16:25   ` Thomas Monjalon
@ 2021-11-08 18:57   ` eagostini
  2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 2/9] gpudev: add event notification eagostini
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:57 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini, Thomas Monjalon

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The new library gpudev is for dealing with GPGPU computing devices
from a DPDK application running on the CPU.

The infrastructure is prepared to welcome drivers in drivers/gpu/.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 .gitignore                             |   1 +
 MAINTAINERS                            |   6 +
 app/meson.build                        |   1 +
 app/test-gpudev/main.c                 | 107 +++++++++++
 app/test-gpudev/meson.build            |   5 +
 doc/api/doxy-api.conf.in               |   1 +
 doc/guides/conf.py                     |   8 +
 doc/guides/gpus/features/default.ini   |  10 +
 doc/guides/gpus/index.rst              |  11 ++
 doc/guides/gpus/overview.rst           |  10 +
 doc/guides/index.rst                   |   1 +
 doc/guides/prog_guide/gpudev.rst       |  36 ++++
 doc/guides/prog_guide/index.rst        |   1 +
 doc/guides/rel_notes/release_21_11.rst |   4 +
 drivers/gpu/meson.build                |   4 +
 drivers/meson.build                    |   1 +
 lib/gpudev/gpudev.c                    | 249 +++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             |  67 +++++++
 lib/gpudev/meson.build                 |  10 +
 lib/gpudev/rte_gpudev.h                | 168 +++++++++++++++++
 lib/gpudev/version.map                 |  20 ++
 lib/meson.build                        |   1 +
 22 files changed, 722 insertions(+)
 create mode 100644 app/test-gpudev/main.c
 create mode 100644 app/test-gpudev/meson.build
 create mode 100644 doc/guides/gpus/features/default.ini
 create mode 100644 doc/guides/gpus/index.rst
 create mode 100644 doc/guides/gpus/overview.rst
 create mode 100644 doc/guides/prog_guide/gpudev.rst
 create mode 100644 drivers/gpu/meson.build
 create mode 100644 lib/gpudev/gpudev.c
 create mode 100644 lib/gpudev/gpudev_driver.h
 create mode 100644 lib/gpudev/meson.build
 create mode 100644 lib/gpudev/rte_gpudev.h
 create mode 100644 lib/gpudev/version.map

diff --git a/.gitignore b/.gitignore
index 7ec8688342..b98a43a601 100644
--- a/.gitignore
+++ b/.gitignore
@@ -15,6 +15,7 @@ doc/guides/compressdevs/overview_feature_table.txt
 doc/guides/regexdevs/overview_feature_table.txt
 doc/guides/vdpadevs/overview_feature_table.txt
 doc/guides/bbdevs/overview_feature_table.txt
+doc/guides/gpus/overview_feature_table.txt
 
 # ignore generated ctags/cscope files
 cscope.out.po
diff --git a/MAINTAINERS b/MAINTAINERS
index 3459187e26..a2e67fb1e6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -467,6 +467,12 @@ M: Bruce Richardson <bruce.richardson@intel.com>
 F: examples/dma/
 F: doc/guides/sample_app_ug/dma.rst
 
+General-Purpose Graphics Processing Unit (GPU) API - EXPERIMENTAL
+M: Elena Agostini <eagostini@nvidia.com>
+F: lib/gpudev/
+F: doc/guides/prog_guide/gpudev.rst
+F: doc/guides/gpus/features/default.ini
+
 Eventdev API
 M: Jerin Jacob <jerinj@marvell.com>
 T: git://dpdk.org/next/dpdk-next-eventdev
diff --git a/app/meson.build b/app/meson.build
index 986c1a4ad4..310e83076f 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -13,6 +13,7 @@ apps = [
         'test-eventdev',
         'test-fib',
         'test-flow-perf',
+        'test-gpudev',
         'test-pipeline',
         'test-pmd',
         'test-regex',
diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
new file mode 100644
index 0000000000..438cfdac54
--- /dev/null
+++ b/app/test-gpudev/main.c
@@ -0,0 +1,107 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <errno.h>
+#include <getopt.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_mempool.h>
+#include <rte_mbuf.h>
+
+#include <rte_gpudev.h>
+
+enum app_args {
+	ARG_HELP,
+	ARG_MEMPOOL
+};
+
+static void
+usage(const char *prog_name)
+{
+	printf("%s [EAL options] --\n",
+		prog_name);
+}
+
+static void
+args_parse(int argc, char **argv)
+{
+	char **argvopt;
+	int opt;
+	int opt_idx;
+
+	static struct option lgopts[] = {
+		{ "help", 0, 0, ARG_HELP},
+		/* End of options */
+		{ 0, 0, 0, 0 }
+	};
+
+	argvopt = argv;
+	while ((opt = getopt_long(argc, argvopt, "",
+				lgopts, &opt_idx)) != EOF) {
+		switch (opt) {
+		case ARG_HELP:
+			usage(argv[0]);
+			break;
+		default:
+			usage(argv[0]);
+			rte_exit(EXIT_FAILURE, "Invalid option: %s\n", argv[optind]);
+			break;
+		}
+	}
+}
+
+int
+main(int argc, char **argv)
+{
+	int ret;
+	int nb_gpus = 0;
+	int16_t gpu_id = 0;
+	struct rte_gpu_info ginfo;
+
+	/* Init EAL. */
+	ret = rte_eal_init(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed\n");
+	argc -= ret;
+	argv += ret;
+	if (argc > 1)
+		args_parse(argc, argv);
+	argc -= ret;
+	argv += ret;
+
+	nb_gpus = rte_gpu_count_avail();
+	printf("\n\nDPDK found %d GPUs:\n", nb_gpus);
+	RTE_GPU_FOREACH(gpu_id)
+	{
+		if (rte_gpu_info_get(gpu_id, &ginfo))
+			rte_exit(EXIT_FAILURE, "rte_gpu_info_get error - bye\n");
+
+		printf("\tGPU ID %d\n\t\tparent ID %d GPU Bus ID %s NUMA node %d Tot memory %.02f MB, Tot processors %d\n",
+				ginfo.dev_id,
+				ginfo.parent,
+				ginfo.name,
+				ginfo.numa_node,
+				(((float)ginfo.total_memory)/(float)1024)/(float)1024,
+				ginfo.processor_count
+			);
+	}
+	printf("\n\n");
+
+	/* clean up the EAL */
+	rte_eal_cleanup();
+	printf("Bye...\n");
+
+	return EXIT_SUCCESS;
+}
diff --git a/app/test-gpudev/meson.build b/app/test-gpudev/meson.build
new file mode 100644
index 0000000000..17bdef3646
--- /dev/null
+++ b/app/test-gpudev/meson.build
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+sources = files('main.c')
+deps = ['gpudev', 'ethdev']
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 096ebbaf0d..db2ca9b6ed 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -41,6 +41,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/eventdev \
                           @TOPDIR@/lib/fib \
                           @TOPDIR@/lib/flow_classify \
+                          @TOPDIR@/lib/gpudev \
                           @TOPDIR@/lib/graph \
                           @TOPDIR@/lib/gro \
                           @TOPDIR@/lib/gso \
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0494b0efe7..e6ce929bc8 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -152,6 +152,9 @@ def generate_overview_table(output_filename, table_id, section, table_name, titl
         name = ini_filename[:-4]
         name = name.replace('_vf', 'vf')
         pmd_names.append(name)
+    if not pmd_names:
+        # Add an empty column if table is empty (required by RST syntax)
+        pmd_names.append(' ')
 
     # Pad the table header names.
     max_header_len = len(max(pmd_names, key=len))
@@ -393,6 +396,11 @@ def setup(app):
                             'Features',
                             'Features availability in bbdev drivers',
                             'Feature')
+    table_file = dirname(__file__) + '/gpus/overview_feature_table.txt'
+    generate_overview_table(table_file, 1,
+                            'Features',
+                            'Features availability in GPU drivers',
+                            'Feature')
 
     if LooseVersion(sphinx_version) < LooseVersion('1.3.1'):
         print('Upgrade sphinx to version >= 1.3.1 for '
diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
new file mode 100644
index 0000000000..ec7a545eb7
--- /dev/null
+++ b/doc/guides/gpus/features/default.ini
@@ -0,0 +1,10 @@
+;
+; Features of GPU drivers.
+;
+; This file defines the features that are valid for inclusion in
+; the other driver files and also the order that they appear in
+; the features table in the documentation. The feature description
+; string should not exceed feature_str_len defined in conf.py.
+;
+[Features]
+Get device info                =
diff --git a/doc/guides/gpus/index.rst b/doc/guides/gpus/index.rst
new file mode 100644
index 0000000000..1878423239
--- /dev/null
+++ b/doc/guides/gpus/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+General-Purpose Graphics Processing Unit Drivers
+================================================
+
+.. toctree::
+   :maxdepth: 2
+   :numbered:
+
+   overview
diff --git a/doc/guides/gpus/overview.rst b/doc/guides/gpus/overview.rst
new file mode 100644
index 0000000000..4830348818
--- /dev/null
+++ b/doc/guides/gpus/overview.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+Overview of GPU Drivers
+=======================
+
+General-Purpose computing on Graphics Processing Unit (GPGPU)
+is the use of GPU to perform parallel computation.
+
+.. include:: overview_feature_table.txt
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 919825992e..5eb5bd9c9a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -22,6 +22,7 @@ DPDK documentation
    vdpadevs/index
    regexdevs/index
    dmadevs/index
+   gpus/index
    eventdevs/index
    rawdevs/index
    mempool/index
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
new file mode 100644
index 0000000000..6ea7239159
--- /dev/null
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+General-Purpose Graphics Processing Unit Library
+================================================
+
+When mixing networking activity with task processing on a GPU device,
+there may be the need to put in communication the CPU with the device
+in order to manage the memory, synchronize operations, exchange info, etc..
+
+By means of the generic GPU interface provided by this library,
+it is possible to allocate a chunk of GPU memory and use it
+to create a DPDK mempool with external mbufs having the payload
+on the GPU memory, enabling any network interface card
+(which support this feature like Mellanox NIC)
+to directly transmit and receive packets using GPU memory.
+
+Additionally, this library provides a number of functions
+to enhance the dialog between CPU and GPU.
+
+Out of scope of this library is to provide a wrapper for GPU specific libraries
+(e.g. CUDA Toolkit or OpenCL), thus it is not possible to launch workload
+on the device or create GPU specific objects
+(e.g. CUDA Driver context or CUDA Streams in case of NVIDIA GPUs).
+
+
+Features
+--------
+
+This library provides a number of features:
+
+- Interoperability with device-specific library through generic handlers.
+
+
+API Overview
+------------
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 20e5155cf4..7090b5589a 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -28,6 +28,7 @@ Programmer's Guide
     compressdev
     regexdev
     dmadev
+    gpudev
     rte_security
     rawdev
     link_bonding_poll_mode_drv_lib
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 8da19c613a..9cf59e73bb 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -101,6 +101,10 @@ New Features
   Added ``rte_eth_macaddrs_get`` to allow user to retrieve all Ethernet
   addresses assigned to given ethernet port.
 
+* **Introduced GPU device class with first features:**
+
+  * Device information
+
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
   Added macros ETH_RSS_IPV4_CHKSUM and ETH_RSS_L4_CHKSUM, now IPv4 and
diff --git a/drivers/gpu/meson.build b/drivers/gpu/meson.build
new file mode 100644
index 0000000000..e51ad3381b
--- /dev/null
+++ b/drivers/gpu/meson.build
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+drivers = []
diff --git a/drivers/meson.build b/drivers/meson.build
index 34c0276487..d5f4e1c1f2 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -19,6 +19,7 @@ subdirs = [
         'vdpa',           # depends on common, bus and mempool.
         'event',          # depends on common, bus, mempool and net.
         'baseband',       # depends on common and bus.
+        'gpu',            # depends on common and bus.
 ]
 
 if meson.is_cross_build()
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
new file mode 100644
index 0000000000..aeb021f2cc
--- /dev/null
+++ b/lib/gpudev/gpudev.c
@@ -0,0 +1,249 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#include <rte_eal.h>
+#include <rte_string_fns.h>
+#include <rte_errno.h>
+#include <rte_log.h>
+
+#include "rte_gpudev.h"
+#include "gpudev_driver.h"
+
+/* Logging */
+RTE_LOG_REGISTER_DEFAULT(gpu_logtype, NOTICE);
+#define GPU_LOG(level, ...) \
+	rte_log(RTE_LOG_ ## level, gpu_logtype, RTE_FMT("gpu: " \
+		RTE_FMT_HEAD(__VA_ARGS__, ) "\n", RTE_FMT_TAIL(__VA_ARGS__, )))
+
+/* Set any driver error as EPERM */
+#define GPU_DRV_RET(function) \
+	((function != 0) ? -(rte_errno = EPERM) : (rte_errno = 0))
+
+/* Array of devices */
+static struct rte_gpu *gpus;
+/* Number of currently valid devices */
+static int16_t gpu_max;
+/* Number of currently valid devices */
+static int16_t gpu_count;
+
+int
+rte_gpu_init(size_t dev_max)
+{
+	if (dev_max == 0 || dev_max > INT16_MAX) {
+		GPU_LOG(ERR, "invalid array size");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	/* No lock, it must be called before or during first probing. */
+	if (gpus != NULL) {
+		GPU_LOG(ERR, "already initialized");
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
+
+	gpus = calloc(dev_max, sizeof(struct rte_gpu));
+	if (gpus == NULL) {
+		GPU_LOG(ERR, "cannot initialize library");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	gpu_max = dev_max;
+	return 0;
+}
+
+uint16_t
+rte_gpu_count_avail(void)
+{
+	return gpu_count;
+}
+
+bool
+rte_gpu_is_valid(int16_t dev_id)
+{
+	if (dev_id >= 0 && dev_id < gpu_max &&
+		gpus[dev_id].state == RTE_GPU_STATE_INITIALIZED)
+		return true;
+	return false;
+}
+
+int16_t
+rte_gpu_find_next(int16_t dev_id)
+{
+	if (dev_id < 0)
+		dev_id = 0;
+	while (dev_id < gpu_max &&
+			gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+		dev_id++;
+
+	if (dev_id >= gpu_max)
+		return RTE_GPU_ID_NONE;
+	return dev_id;
+}
+
+static int16_t
+gpu_find_free_id(void)
+{
+	int16_t dev_id;
+
+	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
+		if (gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+			return dev_id;
+	}
+	return RTE_GPU_ID_NONE;
+}
+
+static struct rte_gpu *
+gpu_get_by_id(int16_t dev_id)
+{
+	if (!rte_gpu_is_valid(dev_id))
+		return NULL;
+	return &gpus[dev_id];
+}
+
+struct rte_gpu *
+rte_gpu_get_by_name(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+
+	if (name == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	RTE_GPU_FOREACH(dev_id) {
+		dev = &gpus[dev_id];
+		if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+			return dev;
+	}
+	return NULL;
+}
+
+struct rte_gpu *
+rte_gpu_allocate(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		GPU_LOG(ERR, "only primary process can allocate device");
+		rte_errno = EPERM;
+		return NULL;
+	}
+	if (name == NULL) {
+		GPU_LOG(ERR, "allocate device without a name");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* implicit initialization of library before adding first device */
+	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
+		return NULL;
+
+	if (rte_gpu_get_by_name(name) != NULL) {
+		GPU_LOG(ERR, "device with name %s already exists", name);
+		rte_errno = EEXIST;
+		return NULL;
+	}
+	dev_id = gpu_find_free_id();
+	if (dev_id == RTE_GPU_ID_NONE) {
+		GPU_LOG(ERR, "reached maximum number of devices");
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	dev = &gpus[dev_id];
+	memset(dev, 0, sizeof(*dev));
+
+	if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+		GPU_LOG(ERR, "device name too long: %s", name);
+		rte_errno = ENAMETOOLONG;
+		return NULL;
+	}
+	dev->info.name = dev->name;
+	dev->info.dev_id = dev_id;
+	dev->info.numa_node = -1;
+
+	gpu_count++;
+	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
+			name, dev_id, gpu_count);
+	return dev;
+}
+
+void
+rte_gpu_complete_new(struct rte_gpu *dev)
+{
+	if (dev == NULL)
+		return;
+
+	dev->state = RTE_GPU_STATE_INITIALIZED;
+}
+
+int
+rte_gpu_release(struct rte_gpu *dev)
+{
+	if (dev == NULL) {
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	GPU_LOG(DEBUG, "free device %s (id %d)",
+			dev->info.name, dev->info.dev_id);
+	dev->state = RTE_GPU_STATE_UNUSED;
+	gpu_count--;
+
+	return 0;
+}
+
+int
+rte_gpu_close(int16_t dev_id)
+{
+	int firsterr, binerr;
+	int *lasterr = &firsterr;
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "close invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.dev_close != NULL) {
+		*lasterr = GPU_DRV_RET(dev->ops.dev_close(dev));
+		if (*lasterr != 0)
+			lasterr = &binerr;
+	}
+
+	*lasterr = rte_gpu_release(dev);
+
+	rte_errno = -firsterr;
+	return firsterr;
+}
+
+int
+rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "query invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (info == NULL) {
+		GPU_LOG(ERR, "query without storage");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev->ops.dev_info_get == NULL) {
+		*info = dev->info;
+		return 0;
+	}
+	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
new file mode 100644
index 0000000000..9e096e3b64
--- /dev/null
+++ b/lib/gpudev/gpudev_driver.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+/*
+ * This header file must be included only by drivers.
+ * It is considered internal, i.e. hidden for the application.
+ * The prefix rte_ is used to avoid namespace clash in drivers.
+ */
+
+#ifndef RTE_GPUDEV_DRIVER_H
+#define RTE_GPUDEV_DRIVER_H
+
+#include <stdint.h>
+
+#include <rte_dev.h>
+
+#include "rte_gpudev.h"
+
+/* Flags indicate current state of device. */
+enum rte_gpu_state {
+	RTE_GPU_STATE_UNUSED,        /* not initialized */
+	RTE_GPU_STATE_INITIALIZED,   /* initialized */
+};
+
+struct rte_gpu;
+typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
+typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
+
+struct rte_gpu_ops {
+	/* Get device info. If NULL, info is just copied. */
+	rte_gpu_info_get_t *dev_info_get;
+	/* Close device. */
+	rte_gpu_close_t *dev_close;
+};
+
+struct rte_gpu {
+	/* Backing device. */
+	struct rte_device *device;
+	/* Unique identifier name. */
+	char name[RTE_DEV_NAME_MAX_LEN]; /* Updated by this library. */
+	/* Device info structure. */
+	struct rte_gpu_info info;
+	/* Driver functions. */
+	struct rte_gpu_ops ops;
+	/* Current state (used or not) in the running process. */
+	enum rte_gpu_state state; /* Updated by this library. */
+	/* Driver-specific private data for the running process. */
+	void *process_private;
+} __rte_cache_aligned;
+
+__rte_internal
+struct rte_gpu *rte_gpu_get_by_name(const char *name);
+
+/* First step of initialization */
+__rte_internal
+struct rte_gpu *rte_gpu_allocate(const char *name);
+
+/* Last step of initialization. */
+__rte_internal
+void rte_gpu_complete_new(struct rte_gpu *dev);
+
+/* Last step of removal. */
+__rte_internal
+int rte_gpu_release(struct rte_gpu *dev);
+
+#endif /* RTE_GPUDEV_DRIVER_H */
diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build
new file mode 100644
index 0000000000..608154817b
--- /dev/null
+++ b/lib/gpudev/meson.build
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2021 NVIDIA Corporation & Affiliates
+
+headers = files(
+        'rte_gpudev.h',
+)
+
+sources = files(
+        'gpudev.c',
+)
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
new file mode 100644
index 0000000000..eb7cfa8c59
--- /dev/null
+++ b/lib/gpudev/rte_gpudev.h
@@ -0,0 +1,168 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 NVIDIA Corporation & Affiliates
+ */
+
+#ifndef RTE_GPUDEV_H
+#define RTE_GPUDEV_H
+
+#include <stddef.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+#include <rte_compat.h>
+
+/**
+ * @file
+ * Generic library to interact with GPU computing device.
+ *
+ * The API is not thread-safe.
+ * Device management must be done by a single thread.
+ *
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Maximum number of devices if rte_gpu_init() is not called. */
+#define RTE_GPU_DEFAULT_MAX 32
+
+/** Empty device ID. */
+#define RTE_GPU_ID_NONE -1
+
+/** Store device info. */
+struct rte_gpu_info {
+	/** Unique identifier name. */
+	const char *name;
+	/** Device ID. */
+	int16_t dev_id;
+	/** Total processors available on device. */
+	uint32_t processor_count;
+	/** Total memory available on device. */
+	size_t total_memory;
+	/* Local NUMA memory ID. -1 if unknown. */
+	int16_t numa_node;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the device array before probing devices.
+ * If not called, the maximum of probed devices is RTE_GPU_DEFAULT_MAX.
+ *
+ * @param dev_max
+ *   Maximum number of devices.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENOMEM if out of memory
+ *   - EINVAL if 0 size
+ *   - EBUSY if already initialized
+ */
+__rte_experimental
+int rte_gpu_init(size_t dev_max);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return the number of GPU detected and associated to DPDK.
+ *
+ * @return
+ *   The number of available computing devices.
+ */
+__rte_experimental
+uint16_t rte_gpu_count_avail(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check if the device is valid and initialized in DPDK.
+ *
+ * @param dev_id
+ *   The input device ID.
+ *
+ * @return
+ *   - True if dev_id is a valid and initialized computing device.
+ *   - False otherwise.
+ */
+__rte_experimental
+bool rte_gpu_is_valid(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the ID of the next valid GPU initialized in DPDK.
+ *
+ * @param dev_id
+ *   The initial device ID to start the research.
+ *
+ * @return
+ *   Next device ID corresponding to a valid and initialized computing device,
+ *   RTE_GPU_ID_NONE if there is none.
+ */
+__rte_experimental
+int16_t rte_gpu_find_next(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid GPU devices.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ */
+#define RTE_GPU_FOREACH(dev_id) \
+	for (dev_id = rte_gpu_find_next(0); \
+	     dev_id > 0; \
+	     dev_id = rte_gpu_find_next(dev_id + 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Close device.
+ * All resources are released.
+ *
+ * @param dev_id
+ *   Device ID to close.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_close(int16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Return device specific info.
+ *
+ * @param dev_id
+ *   Device ID to get info.
+ * @param info
+ *   Memory structure to fill with the info.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL info
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_GPUDEV_H */
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
new file mode 100644
index 0000000000..6ac6b327e2
--- /dev/null
+++ b/lib/gpudev/version.map
@@ -0,0 +1,20 @@
+EXPERIMENTAL {
+	global:
+
+	# added in 21.11
+	rte_gpu_close;
+	rte_gpu_count_avail;
+	rte_gpu_find_next;
+	rte_gpu_info_get;
+	rte_gpu_init;
+	rte_gpu_is_valid;
+};
+
+INTERNAL {
+	global:
+
+	rte_gpu_allocate;
+	rte_gpu_complete_new;
+	rte_gpu_get_by_name;
+	rte_gpu_release;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 499d26060f..8537a5ab80 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -34,6 +34,7 @@ libraries = [
         'distributor',
         'efd',
         'eventdev',
+        'gpudev',
         'gro',
         'gso',
         'ip_frag',
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 2/9] gpudev: add event notification
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
  2021-11-08 16:25   ` Thomas Monjalon
  2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 1/9] gpudev: introduce GPU device class library eagostini
@ 2021-11-08 18:57   ` eagostini
  2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 3/9] gpudev: add child device representing a device context eagostini
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:57 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

Callback functions may be registered for a device event.
Callback management is per-process and not thread-safe.

The events RTE_GPU_EVENT_NEW and RTE_GPU_EVENT_DEL
are notified respectively after creation and before removal
of a device, as part of the library functions.
Some future events may be emitted from drivers.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/gpudev/gpudev.c        | 148 +++++++++++++++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h |   7 ++
 lib/gpudev/rte_gpudev.h    |  70 ++++++++++++++++++
 lib/gpudev/version.map     |   3 +
 4 files changed, 228 insertions(+)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index aeb021f2cc..07572ae040 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -3,6 +3,7 @@
  */
 
 #include <rte_eal.h>
+#include <rte_tailq.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_log.h>
@@ -27,6 +28,16 @@ static int16_t gpu_max;
 /* Number of currently valid devices */
 static int16_t gpu_count;
 
+/* Event callback object */
+struct rte_gpu_callback {
+	TAILQ_ENTRY(rte_gpu_callback) next;
+	rte_gpu_callback_t *function;
+	void *user_data;
+	enum rte_gpu_event event;
+};
+static rte_rwlock_t gpu_callback_lock = RTE_RWLOCK_INITIALIZER;
+static void gpu_free_callbacks(struct rte_gpu *dev);
+
 int
 rte_gpu_init(size_t dev_max)
 {
@@ -166,6 +177,7 @@ rte_gpu_allocate(const char *name)
 	dev->info.name = dev->name;
 	dev->info.dev_id = dev_id;
 	dev->info.numa_node = -1;
+	TAILQ_INIT(&dev->callbacks);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -180,6 +192,8 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 		return;
 
 	dev->state = RTE_GPU_STATE_INITIALIZED;
+	dev->state = RTE_GPU_STATE_INITIALIZED;
+	rte_gpu_notify(dev, RTE_GPU_EVENT_NEW);
 }
 
 int
@@ -192,6 +206,9 @@ rte_gpu_release(struct rte_gpu *dev)
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
 			dev->info.name, dev->info.dev_id);
+	rte_gpu_notify(dev, RTE_GPU_EVENT_DEL);
+
+	gpu_free_callbacks(dev);
 	dev->state = RTE_GPU_STATE_UNUSED;
 	gpu_count--;
 
@@ -224,6 +241,137 @@ rte_gpu_close(int16_t dev_id)
 	return firsterr;
 }
 
+int
+rte_gpu_callback_register(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data)
+{
+	int16_t next_dev, last_dev;
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback;
+
+	if (!rte_gpu_is_valid(dev_id) && dev_id != RTE_GPU_ID_ANY) {
+		GPU_LOG(ERR, "register callback of invalid ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (function == NULL) {
+		GPU_LOG(ERR, "cannot register callback without function");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev_id == RTE_GPU_ID_ANY) {
+		next_dev = 0;
+		last_dev = gpu_max - 1;
+	} else {
+		next_dev = last_dev = dev_id;
+	}
+
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	do {
+		callbacks = &gpus[next_dev].callbacks;
+
+		/* check if not already registered */
+		TAILQ_FOREACH(callback, callbacks, next) {
+			if (callback->event == event &&
+					callback->function == function &&
+					callback->user_data == user_data) {
+				GPU_LOG(INFO, "callback already registered");
+				return 0;
+			}
+		}
+
+		callback = malloc(sizeof(*callback));
+		if (callback == NULL) {
+			GPU_LOG(ERR, "cannot allocate callback");
+			return -ENOMEM;
+		}
+		callback->function = function;
+		callback->user_data = user_data;
+		callback->event = event;
+		TAILQ_INSERT_TAIL(callbacks, callback, next);
+
+	} while (++next_dev <= last_dev);
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+
+	return 0;
+}
+
+int
+rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data)
+{
+	int16_t next_dev, last_dev;
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback, *nextcb;
+
+	if (!rte_gpu_is_valid(dev_id) && dev_id != RTE_GPU_ID_ANY) {
+		GPU_LOG(ERR, "unregister callback of invalid ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+	if (function == NULL) {
+		GPU_LOG(ERR, "cannot unregister callback without function");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (dev_id == RTE_GPU_ID_ANY) {
+		next_dev = 0;
+		last_dev = gpu_max - 1;
+	} else {
+		next_dev = last_dev = dev_id;
+	}
+
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	do {
+		callbacks = &gpus[next_dev].callbacks;
+		RTE_TAILQ_FOREACH_SAFE(callback, callbacks, next, nextcb) {
+			if (callback->event != event ||
+					callback->function != function ||
+					(callback->user_data != user_data &&
+					user_data != (void *)-1))
+				continue;
+			TAILQ_REMOVE(callbacks, callback, next);
+			free(callback);
+		}
+	} while (++next_dev <= last_dev);
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+
+	return 0;
+}
+
+static void
+gpu_free_callbacks(struct rte_gpu *dev)
+{
+	struct rte_gpu_callback_list *callbacks;
+	struct rte_gpu_callback *callback, *nextcb;
+
+	callbacks = &dev->callbacks;
+	rte_rwlock_write_lock(&gpu_callback_lock);
+	RTE_TAILQ_FOREACH_SAFE(callback, callbacks, next, nextcb) {
+		TAILQ_REMOVE(callbacks, callback, next);
+		free(callback);
+	}
+	rte_rwlock_write_unlock(&gpu_callback_lock);
+}
+
+void
+rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event event)
+{
+	int16_t dev_id;
+	struct rte_gpu_callback *callback;
+
+	dev_id = dev->info.dev_id;
+	rte_rwlock_read_lock(&gpu_callback_lock);
+	TAILQ_FOREACH(callback, &dev->callbacks, next) {
+		if (callback->event != event || callback->function == NULL)
+			continue;
+		callback->function(dev_id, event, callback->user_data);
+	}
+	rte_rwlock_read_unlock(&gpu_callback_lock);
+}
+
 int
 rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 {
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 9e096e3b64..2a7089aa52 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -12,6 +12,7 @@
 #define RTE_GPUDEV_DRIVER_H
 
 #include <stdint.h>
+#include <sys/queue.h>
 
 #include <rte_dev.h>
 
@@ -43,6 +44,8 @@ struct rte_gpu {
 	struct rte_gpu_info info;
 	/* Driver functions. */
 	struct rte_gpu_ops ops;
+	/* Event callback list. */
+	TAILQ_HEAD(rte_gpu_callback_list, rte_gpu_callback) callbacks;
 	/* Current state (used or not) in the running process. */
 	enum rte_gpu_state state; /* Updated by this library. */
 	/* Driver-specific private data for the running process. */
@@ -64,4 +67,8 @@ void rte_gpu_complete_new(struct rte_gpu *dev);
 __rte_internal
 int rte_gpu_release(struct rte_gpu *dev);
 
+/* Call registered callbacks. No multi-process event. */
+__rte_internal
+void rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event);
+
 #endif /* RTE_GPUDEV_DRIVER_H */
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index eb7cfa8c59..e1702fbfe4 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -31,6 +31,11 @@ extern "C" {
 
 /** Empty device ID. */
 #define RTE_GPU_ID_NONE -1
+/** Catch-all device ID. */
+#define RTE_GPU_ID_ANY INT16_MIN
+
+/** Catch-all callback data. */
+#define RTE_GPU_CALLBACK_ANY_DATA ((void *)-1)
 
 /** Store device info. */
 struct rte_gpu_info {
@@ -46,6 +51,18 @@ struct rte_gpu_info {
 	int16_t numa_node;
 };
 
+/** Flags passed in notification callback. */
+enum rte_gpu_event {
+	/** Device is just initialized. */
+	RTE_GPU_EVENT_NEW,
+	/** Device is going to be released. */
+	RTE_GPU_EVENT_DEL,
+};
+
+/** Prototype of event callback function. */
+typedef void (rte_gpu_callback_t)(int16_t dev_id,
+		enum rte_gpu_event event, void *user_data);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -141,6 +158,59 @@ int16_t rte_gpu_find_next(int16_t dev_id);
 __rte_experimental
 int rte_gpu_close(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register a function as event callback.
+ * A function may be registered multiple times for different events.
+ *
+ * @param dev_id
+ *   Device ID to get notified about.
+ *   RTE_GPU_ID_ANY means all devices.
+ * @param event
+ *   Device event to be registered for.
+ * @param function
+ *   Callback function to be called on event.
+ * @param user_data
+ *   Optional parameter passed in the callback.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL function
+ *   - ENOMEM if out of memory
+ */
+__rte_experimental
+int rte_gpu_callback_register(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Unregister for an event.
+ *
+ * @param dev_id
+ *   Device ID to be silenced.
+ *   RTE_GPU_ID_ANY means all devices.
+ * @param event
+ *   Registered event.
+ * @param function
+ *   Registered function.
+ * @param user_data
+ *   Optional parameter as registered.
+ *   RTE_GPU_CALLBACK_ANY_DATA is a catch-all.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL function
+ */
+__rte_experimental
+int rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
+		rte_gpu_callback_t *function, void *user_data);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 6ac6b327e2..b3b6b76c1c 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -2,6 +2,8 @@ EXPERIMENTAL {
 	global:
 
 	# added in 21.11
+	rte_gpu_callback_register;
+	rte_gpu_callback_unregister;
 	rte_gpu_close;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
@@ -16,5 +18,6 @@ INTERNAL {
 	rte_gpu_allocate;
 	rte_gpu_complete_new;
 	rte_gpu_get_by_name;
+	rte_gpu_notify;
 	rte_gpu_release;
 };
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 3/9] gpudev: add child device representing a device context
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
                     ` (2 preceding siblings ...)
  2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 2/9] gpudev: add event notification eagostini
@ 2021-11-08 18:57   ` eagostini
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 4/9] gpudev: support multi-process eagostini
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:57 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

The computing device may operate in some isolated contexts.
Memory and processing are isolated in a silo represented by
a child device.
The context is provided as an opaque by the caller of
rte_gpu_add_child().

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 doc/guides/prog_guide/gpudev.rst | 12 ++++++
 lib/gpudev/gpudev.c              | 45 +++++++++++++++++++-
 lib/gpudev/gpudev_driver.h       |  2 +-
 lib/gpudev/rte_gpudev.h          | 71 +++++++++++++++++++++++++++++---
 lib/gpudev/version.map           |  1 +
 5 files changed, 123 insertions(+), 8 deletions(-)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 6ea7239159..7694639489 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -34,3 +34,15 @@ This library provides a number of features:
 
 API Overview
 ------------
+
+Child Device
+~~~~~~~~~~~~
+
+By default, DPDK PCIe module detects and registers physical GPU devices
+in the system.
+With the gpudev library is also possible to add additional non-physical devices
+through an ``uint64_t`` generic handler (e.g. CUDA Driver context)
+that will be registered internally by the driver as an additional device (child)
+connected to a physical device (parent).
+Each device (parent or child) is represented through a ID
+required to indicate which device a given operation should be executed on.
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 07572ae040..aaf41e6071 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -80,13 +80,22 @@ rte_gpu_is_valid(int16_t dev_id)
 	return false;
 }
 
+static bool
+gpu_match_parent(int16_t dev_id, int16_t parent)
+{
+	if (parent == RTE_GPU_ID_ANY)
+		return true;
+	return gpus[dev_id].info.parent == parent;
+}
+
 int16_t
-rte_gpu_find_next(int16_t dev_id)
+rte_gpu_find_next(int16_t dev_id, int16_t parent)
 {
 	if (dev_id < 0)
 		dev_id = 0;
 	while (dev_id < gpu_max &&
-			gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+			(gpus[dev_id].state == RTE_GPU_STATE_UNUSED ||
+			!gpu_match_parent(dev_id, parent)))
 		dev_id++;
 
 	if (dev_id >= gpu_max)
@@ -177,6 +186,7 @@ rte_gpu_allocate(const char *name)
 	dev->info.name = dev->name;
 	dev->info.dev_id = dev_id;
 	dev->info.numa_node = -1;
+	dev->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
 
 	gpu_count++;
@@ -185,6 +195,28 @@ rte_gpu_allocate(const char *name)
 	return dev;
 }
 
+int16_t
+rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
+{
+	struct rte_gpu *dev;
+
+	if (!rte_gpu_is_valid(parent)) {
+		GPU_LOG(ERR, "add child to invalid parent ID %d", parent);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	dev = rte_gpu_allocate(name);
+	if (dev == NULL)
+		return -rte_errno;
+
+	dev->info.parent = parent;
+	dev->info.context = child_context;
+
+	rte_gpu_complete_new(dev);
+	return dev->info.dev_id;
+}
+
 void
 rte_gpu_complete_new(struct rte_gpu *dev)
 {
@@ -199,10 +231,19 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 int
 rte_gpu_release(struct rte_gpu *dev)
 {
+	int16_t dev_id, child;
+
 	if (dev == NULL) {
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
+	dev_id = dev->info.dev_id;
+	RTE_GPU_FOREACH_CHILD(child, dev_id) {
+		GPU_LOG(ERR, "cannot release device %d with child %d",
+				dev_id, child);
+		rte_errno = EBUSY;
+		return -rte_errno;
+	}
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
 			dev->info.name, dev->info.dev_id);
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 2a7089aa52..4d0077161c 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -31,7 +31,7 @@ typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info)
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
 	rte_gpu_info_get_t *dev_info_get;
-	/* Close device. */
+	/* Close device or child context. */
 	rte_gpu_close_t *dev_close;
 };
 
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index e1702fbfe4..df75dbdbab 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -41,8 +41,12 @@ extern "C" {
 struct rte_gpu_info {
 	/** Unique identifier name. */
 	const char *name;
+	/** Opaque handler of the device context. */
+	uint64_t context;
 	/** Device ID. */
 	int16_t dev_id;
+	/** ID of the parent device, RTE_GPU_ID_NONE if no parent */
+	int16_t parent;
 	/** Total processors available on device. */
 	uint32_t processor_count;
 	/** Total memory available on device. */
@@ -110,6 +114,33 @@ uint16_t rte_gpu_count_avail(void);
 __rte_experimental
 bool rte_gpu_is_valid(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a virtual device representing a context in the parent device.
+ *
+ * @param name
+ *   Unique string to identify the device.
+ * @param parent
+ *   Device ID of the parent.
+ * @param child_context
+ *   Opaque context handler.
+ *
+ * @return
+ *   Device ID of the new created child, -rte_errno otherwise:
+ *   - EINVAL if empty name
+ *   - ENAMETOOLONG if long name
+ *   - EEXIST if existing device name
+ *   - ENODEV if invalid parent
+ *   - EPERM if secondary process
+ *   - ENOENT if too many devices
+ *   - ENOMEM if out of space
+ */
+__rte_experimental
+int16_t rte_gpu_add_child(const char *name,
+		int16_t parent, uint64_t child_context);
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -118,13 +149,17 @@ bool rte_gpu_is_valid(int16_t dev_id);
  *
  * @param dev_id
  *   The initial device ID to start the research.
+ * @param parent
+ *   The device ID of the parent.
+ *   RTE_GPU_ID_NONE means no parent.
+ *   RTE_GPU_ID_ANY means no or any parent.
  *
  * @return
  *   Next device ID corresponding to a valid and initialized computing device,
  *   RTE_GPU_ID_NONE if there is none.
  */
 __rte_experimental
-int16_t rte_gpu_find_next(int16_t dev_id);
+int16_t rte_gpu_find_next(int16_t dev_id, int16_t parent);
 
 /**
  * @warning
@@ -136,15 +171,41 @@ int16_t rte_gpu_find_next(int16_t dev_id);
  *   The ID of the next possible valid device, usually 0 to iterate all.
  */
 #define RTE_GPU_FOREACH(dev_id) \
-	for (dev_id = rte_gpu_find_next(0); \
-	     dev_id > 0; \
-	     dev_id = rte_gpu_find_next(dev_id + 1))
+	RTE_GPU_FOREACH_CHILD(dev_id, RTE_GPU_ID_ANY)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid computing devices having no parent.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ */
+#define RTE_GPU_FOREACH_PARENT(dev_id) \
+	RTE_GPU_FOREACH_CHILD(dev_id, RTE_GPU_ID_NONE)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Macro to iterate over all valid children of a computing device parent.
+ *
+ * @param dev_id
+ *   The ID of the next possible valid device, usually 0 to iterate all.
+ * @param parent
+ *   The device ID of the parent.
+ */
+#define RTE_GPU_FOREACH_CHILD(dev_id, parent) \
+	for (dev_id = rte_gpu_find_next(0, parent); \
+	     dev_id >= 0; \
+	     dev_id = rte_gpu_find_next(dev_id + 1, parent))
 
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
  *
- * Close device.
+ * Close device or child context.
  * All resources are released.
  *
  * @param dev_id
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index b3b6b76c1c..4a934ed933 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -2,6 +2,7 @@ EXPERIMENTAL {
 	global:
 
 	# added in 21.11
+	rte_gpu_add_child;
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 4/9] gpudev: support multi-process
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
                     ` (3 preceding siblings ...)
  2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 3/9] gpudev: add child device representing a device context eagostini
@ 2021-11-08 18:58   ` eagostini
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 5/9] gpudev: add memory API eagostini
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:58 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

From: Thomas Monjalon <thomas@monjalon.net>

The device data shared between processes are moved in a struct
allocated in a shared memory (a new memzone for all GPUs).
The main struct rte_gpu references the shared memory
via the pointer mpshared.

The API function rte_gpu_attach() is added to attach a device
from the secondary process.
The function rte_gpu_allocate() can be used only by primary process.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 lib/gpudev/gpudev.c        | 127 +++++++++++++++++++++++++++++++------
 lib/gpudev/gpudev_driver.h |  25 ++++++--
 lib/gpudev/version.map     |   1 +
 3 files changed, 127 insertions(+), 26 deletions(-)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index aaf41e6071..17e371102a 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -5,6 +5,7 @@
 #include <rte_eal.h>
 #include <rte_tailq.h>
 #include <rte_string_fns.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_log.h>
 
@@ -28,6 +29,12 @@ static int16_t gpu_max;
 /* Number of currently valid devices */
 static int16_t gpu_count;
 
+/* Shared memory between processes. */
+static const char *GPU_MEMZONE = "rte_gpu_shared";
+static struct {
+	__extension__ struct rte_gpu_mpshared gpus[0];
+} *gpu_shared_mem;
+
 /* Event callback object */
 struct rte_gpu_callback {
 	TAILQ_ENTRY(rte_gpu_callback) next;
@@ -75,7 +82,7 @@ bool
 rte_gpu_is_valid(int16_t dev_id)
 {
 	if (dev_id >= 0 && dev_id < gpu_max &&
-		gpus[dev_id].state == RTE_GPU_STATE_INITIALIZED)
+		gpus[dev_id].process_state == RTE_GPU_STATE_INITIALIZED)
 		return true;
 	return false;
 }
@@ -85,7 +92,7 @@ gpu_match_parent(int16_t dev_id, int16_t parent)
 {
 	if (parent == RTE_GPU_ID_ANY)
 		return true;
-	return gpus[dev_id].info.parent == parent;
+	return gpus[dev_id].mpshared->info.parent == parent;
 }
 
 int16_t
@@ -94,7 +101,7 @@ rte_gpu_find_next(int16_t dev_id, int16_t parent)
 	if (dev_id < 0)
 		dev_id = 0;
 	while (dev_id < gpu_max &&
-			(gpus[dev_id].state == RTE_GPU_STATE_UNUSED ||
+			(gpus[dev_id].process_state == RTE_GPU_STATE_UNUSED ||
 			!gpu_match_parent(dev_id, parent)))
 		dev_id++;
 
@@ -109,7 +116,7 @@ gpu_find_free_id(void)
 	int16_t dev_id;
 
 	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
-		if (gpus[dev_id].state == RTE_GPU_STATE_UNUSED)
+		if (gpus[dev_id].process_state == RTE_GPU_STATE_UNUSED)
 			return dev_id;
 	}
 	return RTE_GPU_ID_NONE;
@@ -136,12 +143,35 @@ rte_gpu_get_by_name(const char *name)
 
 	RTE_GPU_FOREACH(dev_id) {
 		dev = &gpus[dev_id];
-		if (strncmp(name, dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+		if (strncmp(name, dev->mpshared->name, RTE_DEV_NAME_MAX_LEN) == 0)
 			return dev;
 	}
 	return NULL;
 }
 
+static int
+gpu_shared_mem_init(void)
+{
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		memzone = rte_memzone_reserve(GPU_MEMZONE,
+				sizeof(*gpu_shared_mem) +
+				sizeof(*gpu_shared_mem->gpus) * gpu_max,
+				SOCKET_ID_ANY, 0);
+	} else {
+		memzone = rte_memzone_lookup(GPU_MEMZONE);
+	}
+	if (memzone == NULL) {
+		GPU_LOG(ERR, "cannot initialize shared memory");
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	gpu_shared_mem = memzone->addr;
+	return 0;
+}
+
 struct rte_gpu *
 rte_gpu_allocate(const char *name)
 {
@@ -163,6 +193,10 @@ rte_gpu_allocate(const char *name)
 	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
 		return NULL;
 
+	/* initialize shared memory before adding first device */
+	if (gpu_shared_mem == NULL && gpu_shared_mem_init() < 0)
+		return NULL;
+
 	if (rte_gpu_get_by_name(name) != NULL) {
 		GPU_LOG(ERR, "device with name %s already exists", name);
 		rte_errno = EEXIST;
@@ -178,16 +212,20 @@ rte_gpu_allocate(const char *name)
 	dev = &gpus[dev_id];
 	memset(dev, 0, sizeof(*dev));
 
-	if (rte_strscpy(dev->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
+	dev->mpshared = &gpu_shared_mem->gpus[dev_id];
+	memset(dev->mpshared, 0, sizeof(*dev->mpshared));
+
+	if (rte_strscpy(dev->mpshared->name, name, RTE_DEV_NAME_MAX_LEN) < 0) {
 		GPU_LOG(ERR, "device name too long: %s", name);
 		rte_errno = ENAMETOOLONG;
 		return NULL;
 	}
-	dev->info.name = dev->name;
-	dev->info.dev_id = dev_id;
-	dev->info.numa_node = -1;
-	dev->info.parent = RTE_GPU_ID_NONE;
+	dev->mpshared->info.name = dev->mpshared->name;
+	dev->mpshared->info.dev_id = dev_id;
+	dev->mpshared->info.numa_node = -1;
+	dev->mpshared->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
+	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -195,6 +233,55 @@ rte_gpu_allocate(const char *name)
 	return dev;
 }
 
+struct rte_gpu *
+rte_gpu_attach(const char *name)
+{
+	int16_t dev_id;
+	struct rte_gpu *dev;
+	struct rte_gpu_mpshared *shared_dev;
+
+	if (rte_eal_process_type() != RTE_PROC_SECONDARY) {
+		GPU_LOG(ERR, "only secondary process can attach device");
+		rte_errno = EPERM;
+		return NULL;
+	}
+	if (name == NULL) {
+		GPU_LOG(ERR, "attach device without a name");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* implicit initialization of library before adding first device */
+	if (gpus == NULL && rte_gpu_init(RTE_GPU_DEFAULT_MAX) < 0)
+		return NULL;
+
+	/* initialize shared memory before adding first device */
+	if (gpu_shared_mem == NULL && gpu_shared_mem_init() < 0)
+		return NULL;
+
+	for (dev_id = 0; dev_id < gpu_max; dev_id++) {
+		shared_dev = &gpu_shared_mem->gpus[dev_id];
+		if (strncmp(name, shared_dev->name, RTE_DEV_NAME_MAX_LEN) == 0)
+			break;
+	}
+	if (dev_id >= gpu_max) {
+		GPU_LOG(ERR, "device with name %s not found", name);
+		rte_errno = ENOENT;
+		return NULL;
+	}
+	dev = &gpus[dev_id];
+	memset(dev, 0, sizeof(*dev));
+
+	TAILQ_INIT(&dev->callbacks);
+	dev->mpshared = shared_dev;
+	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+
+	gpu_count++;
+	GPU_LOG(DEBUG, "attached device %s (id %d) of total %d",
+			name, dev_id, gpu_count);
+	return dev;
+}
+
 int16_t
 rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
 {
@@ -210,11 +297,11 @@ rte_gpu_add_child(const char *name, int16_t parent, uint64_t child_context)
 	if (dev == NULL)
 		return -rte_errno;
 
-	dev->info.parent = parent;
-	dev->info.context = child_context;
+	dev->mpshared->info.parent = parent;
+	dev->mpshared->info.context = child_context;
 
 	rte_gpu_complete_new(dev);
-	return dev->info.dev_id;
+	return dev->mpshared->info.dev_id;
 }
 
 void
@@ -223,8 +310,7 @@ rte_gpu_complete_new(struct rte_gpu *dev)
 	if (dev == NULL)
 		return;
 
-	dev->state = RTE_GPU_STATE_INITIALIZED;
-	dev->state = RTE_GPU_STATE_INITIALIZED;
+	dev->process_state = RTE_GPU_STATE_INITIALIZED;
 	rte_gpu_notify(dev, RTE_GPU_EVENT_NEW);
 }
 
@@ -237,7 +323,7 @@ rte_gpu_release(struct rte_gpu *dev)
 		rte_errno = ENODEV;
 		return -rte_errno;
 	}
-	dev_id = dev->info.dev_id;
+	dev_id = dev->mpshared->info.dev_id;
 	RTE_GPU_FOREACH_CHILD(child, dev_id) {
 		GPU_LOG(ERR, "cannot release device %d with child %d",
 				dev_id, child);
@@ -246,11 +332,12 @@ rte_gpu_release(struct rte_gpu *dev)
 	}
 
 	GPU_LOG(DEBUG, "free device %s (id %d)",
-			dev->info.name, dev->info.dev_id);
+			dev->mpshared->info.name, dev->mpshared->info.dev_id);
 	rte_gpu_notify(dev, RTE_GPU_EVENT_DEL);
 
 	gpu_free_callbacks(dev);
-	dev->state = RTE_GPU_STATE_UNUSED;
+	dev->process_state = RTE_GPU_STATE_UNUSED;
+	__atomic_fetch_sub(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
 	gpu_count--;
 
 	return 0;
@@ -403,7 +490,7 @@ rte_gpu_notify(struct rte_gpu *dev, enum rte_gpu_event event)
 	int16_t dev_id;
 	struct rte_gpu_callback *callback;
 
-	dev_id = dev->info.dev_id;
+	dev_id = dev->mpshared->info.dev_id;
 	rte_rwlock_read_lock(&gpu_callback_lock);
 	TAILQ_FOREACH(callback, &dev->callbacks, next) {
 		if (callback->event != event || callback->function == NULL)
@@ -431,7 +518,7 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 	}
 
 	if (dev->ops.dev_info_get == NULL) {
-		*info = dev->info;
+		*info = dev->mpshared->info;
 		return 0;
 	}
 	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 4d0077161c..9459c7e30f 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -35,19 +35,28 @@ struct rte_gpu_ops {
 	rte_gpu_close_t *dev_close;
 };
 
-struct rte_gpu {
-	/* Backing device. */
-	struct rte_device *device;
+struct rte_gpu_mpshared {
 	/* Unique identifier name. */
 	char name[RTE_DEV_NAME_MAX_LEN]; /* Updated by this library. */
+	/* Driver-specific private data shared in multi-process. */
+	void *dev_private;
 	/* Device info structure. */
 	struct rte_gpu_info info;
+	/* Counter of processes using the device. */
+	uint16_t process_refcnt; /* Updated by this library. */
+};
+
+struct rte_gpu {
+	/* Backing device. */
+	struct rte_device *device;
+	/* Data shared between processes. */
+	struct rte_gpu_mpshared *mpshared;
 	/* Driver functions. */
 	struct rte_gpu_ops ops;
 	/* Event callback list. */
 	TAILQ_HEAD(rte_gpu_callback_list, rte_gpu_callback) callbacks;
 	/* Current state (used or not) in the running process. */
-	enum rte_gpu_state state; /* Updated by this library. */
+	enum rte_gpu_state process_state; /* Updated by this library. */
 	/* Driver-specific private data for the running process. */
 	void *process_private;
 } __rte_cache_aligned;
@@ -55,15 +64,19 @@ struct rte_gpu {
 __rte_internal
 struct rte_gpu *rte_gpu_get_by_name(const char *name);
 
-/* First step of initialization */
+/* First step of initialization in primary process. */
 __rte_internal
 struct rte_gpu *rte_gpu_allocate(const char *name);
 
+/* First step of initialization in secondary process. */
+__rte_internal
+struct rte_gpu *rte_gpu_attach(const char *name);
+
 /* Last step of initialization. */
 __rte_internal
 void rte_gpu_complete_new(struct rte_gpu *dev);
 
-/* Last step of removal. */
+/* Last step of removal (primary or secondary process). */
 __rte_internal
 int rte_gpu_release(struct rte_gpu *dev);
 
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 4a934ed933..58dc632393 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -17,6 +17,7 @@ INTERNAL {
 	global:
 
 	rte_gpu_allocate;
+	rte_gpu_attach;
 	rte_gpu_complete_new;
 	rte_gpu_get_by_name;
 	rte_gpu_notify;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 5/9] gpudev: add memory API
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
                     ` (4 preceding siblings ...)
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 4/9] gpudev: support multi-process eagostini
@ 2021-11-08 18:58   ` eagostini
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 6/9] gpudev: add memory barrier eagostini
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:58 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini, Thomas Monjalon

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
Such workload distribution can be achieved by sharing some memory.

As a first step, the features are focused on memory management.
A function allows to allocate memory inside the device,
or in the main (CPU) memory while making it visible for the device.
This memory may be used to save packets or for synchronization data.

The next step should focus on GPU processing task control.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 app/test-gpudev/main.c                 | 105 +++++++++++++++++++++++++
 doc/guides/gpus/features/default.ini   |   3 +
 doc/guides/prog_guide/gpudev.rst       |  19 +++++
 doc/guides/rel_notes/release_21_11.rst |   1 +
 lib/gpudev/gpudev.c                    | 101 ++++++++++++++++++++++++
 lib/gpudev/gpudev_driver.h             |  12 +++
 lib/gpudev/rte_gpudev.h                |  95 ++++++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 8 files changed, 340 insertions(+)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 438cfdac54..e3aca2225a 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -62,6 +62,98 @@ args_parse(int argc, char **argv)
 	}
 }
 
+static int
+alloc_gpu_memory(uint16_t gpu_id)
+{
+	void *ptr_1 = NULL;
+	void *ptr_2 = NULL;
+	size_t buf_bytes = 1024;
+	int ret;
+
+	printf("\n=======> TEST: Allocate GPU memory\n");
+
+	/* Alloc memory on GPU 0 */
+	ptr_1 = rte_gpu_malloc(gpu_id, buf_bytes);
+	if (ptr_1 == NULL) {
+		fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
+		return -1;
+	}
+	printf("GPU memory allocated at 0x%p %zdB\n", ptr_1, buf_bytes);
+
+	ptr_2 = rte_gpu_malloc(gpu_id, buf_bytes);
+	if (ptr_2 == NULL) {
+		fprintf(stderr, "rte_gpu_malloc GPU memory returned error\n");
+		return -1;
+	}
+	printf("GPU memory allocated at 0x%p %zdB\n", ptr_2, buf_bytes);
+
+	ret = rte_gpu_free(gpu_id, (uint8_t *)(ptr_1)+0x700);
+	if (ret < 0) {
+		printf("GPU memory 0x%p + 0x700 NOT freed because of memory address not recognized by driver\n", ptr_1);
+	} else {
+		fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr_1);
+		return -1;
+	}
+
+	ret = rte_gpu_free(gpu_id, ptr_2);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
+		return -1;
+	}
+	printf("GPU memory 0x%p freed\n", ptr_2);
+
+	ret = rte_gpu_free(gpu_id, ptr_1);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_free returned error %d\n", ret);
+		return -1;
+	}
+	printf("GPU memory 0x%p freed\n", ptr_1);
+
+	return 0;
+}
+
+static int
+register_cpu_memory(uint16_t gpu_id)
+{
+	void *ptr = NULL;
+	size_t buf_bytes = 1024;
+	int ret;
+
+	printf("\n=======> TEST: Register CPU memory\n");
+
+	/* Alloc memory on CPU visible from GPU 0 */
+	ptr = rte_zmalloc(NULL, buf_bytes, 0);
+	if (ptr == NULL) {
+		fprintf(stderr, "Failed to allocate CPU memory.\n");
+		return -1;
+	}
+
+	ret = rte_gpu_register(gpu_id, buf_bytes, ptr);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_register CPU memory returned error %d\n", ret);
+		return -1;
+	}
+	printf("CPU memory registered at 0x%p %zdB\n", ptr, buf_bytes);
+
+	ret = rte_gpu_unregister(gpu_id, (uint8_t *)(ptr)+0x700);
+	if (ret < 0) {
+		printf("CPU memory 0x%p + 0x700 NOT unregistered because of memory address not recognized by driver\n", ptr);
+	} else {
+		fprintf(stderr, "rte_gpu_free erroneusly freed GPU memory 0x%p + 0x700\n", ptr);
+		return -1;
+	}
+	printf("CPU memory 0x%p unregistered\n", ptr);
+
+	ret = rte_gpu_unregister(gpu_id, ptr);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_unregister returned error %d\n", ret);
+		return -1;
+	}
+	printf("CPU memory 0x%p unregistered\n", ptr);
+
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -99,6 +191,19 @@ main(int argc, char **argv)
 	}
 	printf("\n\n");
 
+	if (nb_gpus == 0) {
+		fprintf(stderr, "Need at least one GPU on the system to run the example\n");
+		return EXIT_FAILURE;
+	}
+
+	gpu_id = 0;
+
+	/**
+	 * Memory tests
+	 */
+	alloc_gpu_memory(gpu_id);
+	register_cpu_memory(gpu_id);
+
 	/* clean up the EAL */
 	rte_eal_cleanup();
 	printf("Bye...\n");
diff --git a/doc/guides/gpus/features/default.ini b/doc/guides/gpus/features/default.ini
index ec7a545eb7..87e9966424 100644
--- a/doc/guides/gpus/features/default.ini
+++ b/doc/guides/gpus/features/default.ini
@@ -8,3 +8,6 @@
 ;
 [Features]
 Get device info                =
+Share CPU memory with device   =
+Allocate device memory         =
+Free memory                    =
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 7694639489..9aca69038c 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -30,6 +30,8 @@ Features
 This library provides a number of features:
 
 - Interoperability with device-specific library through generic handlers.
+- Allocate and free memory on the device.
+- Register CPU memory to make it visible from the device.
 
 
 API Overview
@@ -46,3 +48,20 @@ that will be registered internally by the driver as an additional device (child)
 connected to a physical device (parent).
 Each device (parent or child) is represented through a ID
 required to indicate which device a given operation should be executed on.
+
+Memory Allocation
+~~~~~~~~~~~~~~~~~
+
+gpudev can allocate on an input given GPU device a memory area
+returning the pointer to that memory.
+Later, it's also possible to free that memory with gpudev.
+GPU memory allocated outside of the gpudev library
+(e.g. with GPU-specific library) cannot be freed by the gpudev library.
+
+Memory Registration
+~~~~~~~~~~~~~~~~~~~
+
+gpudev can register a CPU memory area to make it visible from a GPU device.
+Later, it's also possible to unregister that memory with gpudev.
+CPU memory registered outside of the gpudev library
+(e.g. with GPU specific library) cannot be unregistered by the gpudev library.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 9cf59e73bb..a4d07bda9b 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -104,6 +104,7 @@ New Features
 * **Introduced GPU device class with first features:**
 
   * Device information
+  * Memory management
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 17e371102a..d0826ec881 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -6,6 +6,7 @@
 #include <rte_tailq.h>
 #include <rte_string_fns.h>
 #include <rte_memzone.h>
+#include <rte_malloc.h>
 #include <rte_errno.h>
 #include <rte_log.h>
 
@@ -523,3 +524,103 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 	}
 	return GPU_DRV_RET(dev->ops.dev_info_get(dev, info));
 }
+
+void *
+rte_gpu_malloc(int16_t dev_id, size_t size)
+{
+	struct rte_gpu *dev;
+	void *ptr;
+	int ret;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return NULL;
+	}
+
+	if (dev->ops.mem_alloc == NULL) {
+		GPU_LOG(ERR, "mem allocation not supported");
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+
+	if (size == 0) /* dry-run */
+		return NULL;
+
+	ret = dev->ops.mem_alloc(dev, size, &ptr);
+
+	switch (ret) {
+		case 0:
+			return ptr;
+		case -ENOMEM:
+		case -E2BIG:
+			rte_errno = -ret;
+			return NULL;
+		default:
+			rte_errno = -EPERM;
+			return NULL;
+	}
+}
+
+int
+rte_gpu_register(int16_t dev_id, size_t size, void *ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "alloc mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_register == NULL) {
+		GPU_LOG(ERR, "mem registration not supported");
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+
+	if (size == 0 || ptr == NULL) /* dry-run */
+		return -EINVAL;
+
+	return GPU_DRV_RET(dev->ops.mem_register(dev, size, ptr));
+}
+
+int
+rte_gpu_unregister(int16_t dev_id, void *ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "unregister mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_unregister == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mem_unregister(dev, ptr));
+}
+
+int
+rte_gpu_free(int16_t dev_id, void *ptr)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "free mem for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mem_free == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mem_free(dev, ptr));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 9459c7e30f..11015944a6 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -27,12 +27,24 @@ enum rte_gpu_state {
 struct rte_gpu;
 typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
 typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
+typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
+typedef int (rte_gpu_free_t)(struct rte_gpu *dev, void *ptr);
+typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
+typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
 
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
 	rte_gpu_info_get_t *dev_info_get;
 	/* Close device or child context. */
 	rte_gpu_close_t *dev_close;
+	/* Allocate memory in device. */
+	rte_gpu_mem_alloc_t *mem_alloc;
+	/* Register CPU memory in device. */
+	rte_gpu_mem_register_t *mem_register;
+	/* Free memory allocated or registered in device. */
+	rte_gpu_free_t *mem_free;
+	/* Unregister CPU memory in device. */
+	rte_gpu_mem_unregister_t *mem_unregister;
 };
 
 struct rte_gpu_mpshared {
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index df75dbdbab..fee71d60e7 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -9,6 +9,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <rte_bitops.h>
 #include <rte_compat.h>
 
 /**
@@ -292,6 +293,100 @@ int rte_gpu_callback_unregister(int16_t dev_id, enum rte_gpu_event event,
 __rte_experimental
 int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate a chunk of memory usable by the device.
+ *
+ * @param dev_id
+ *   Device ID requiring allocated memory.
+ * @param size
+ *   Number of bytes to allocate.
+ *   Requesting 0 will do nothing.
+ *
+ * @return
+ *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if reserved flags
+ *   - ENOTSUP if operation not supported by the driver
+ *   - E2BIG if size is higher than limit
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+void *rte_gpu_malloc(int16_t dev_id, size_t size)
+__rte_alloc_size(2);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a chunk of memory allocated with rte_gpu_malloc().
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param ptr
+ *   Pointer to the memory area to be deallocated.
+ *   NULL is a no-op accepted value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_free(int16_t dev_id, void *ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register a chunk of memory on the CPU usable by the device.
+ *
+ * @param dev_id
+ *   Device ID requiring allocated memory.
+ * @param size
+ *   Number of bytes to allocate.
+ *   Requesting 0 will do nothing.
+ * @param ptr
+ *   Pointer to the memory area to be registered.
+ *   NULL is a no-op accepted value.
+
+ * @return
+ *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if reserved flags
+ *   - ENOTSUP if operation not supported by the driver
+ *   - E2BIG if size is higher than limit
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_register(int16_t dev_id, size_t size, void *ptr);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deregister a chunk of memory previusly registered with rte_gpu_mem_register()
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param ptr
+ *   Pointer to the memory area to be unregistered.
+ *   NULL is a no-op accepted value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_unregister(int16_t dev_id, void *ptr);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 58dc632393..d4a65ebd52 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -8,9 +8,13 @@ EXPERIMENTAL {
 	rte_gpu_close;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
+	rte_gpu_free;
 	rte_gpu_info_get;
 	rte_gpu_init;
 	rte_gpu_is_valid;
+	rte_gpu_malloc;
+	rte_gpu_register;
+	rte_gpu_unregister;
 };
 
 INTERNAL {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 6/9] gpudev: add memory barrier
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
                     ` (5 preceding siblings ...)
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 5/9] gpudev: add memory API eagostini
@ 2021-11-08 18:58   ` eagostini
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 7/9] gpudev: add communication flag eagostini
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:58 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Add a function for the application to ensure the coherency
of the writes executed by another device into the GPU memory.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 doc/guides/prog_guide/gpudev.rst |  8 ++++++++
 lib/gpudev/gpudev.c              | 19 +++++++++++++++++++
 lib/gpudev/gpudev_driver.h       |  3 +++
 lib/gpudev/rte_gpudev.h          | 18 ++++++++++++++++++
 lib/gpudev/version.map           |  1 +
 5 files changed, 49 insertions(+)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index 9aca69038c..eb5f0af817 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -65,3 +65,11 @@ gpudev can register a CPU memory area to make it visible from a GPU device.
 Later, it's also possible to unregister that memory with gpudev.
 CPU memory registered outside of the gpudev library
 (e.g. with GPU specific library) cannot be unregistered by the gpudev library.
+
+Memory Barrier
+~~~~~~~~~~~~~~
+
+Some GPU drivers may need, under certain conditions,
+to enforce the coherency of external devices writes (e.g. NIC receiving packets)
+into the GPU memory.
+gpudev abstracts and exposes this capability.
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index d0826ec881..49526b335f 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -624,3 +624,22 @@ rte_gpu_free(int16_t dev_id, void *ptr)
 	}
 	return GPU_DRV_RET(dev->ops.mem_free(dev, ptr));
 }
+
+int
+rte_gpu_mbw(int16_t dev_id)
+{
+	struct rte_gpu *dev;
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "memory barrier for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return -rte_errno;
+	}
+
+	if (dev->ops.mbw == NULL) {
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	return GPU_DRV_RET(dev->ops.mbw(dev));
+}
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 11015944a6..ab24de9e28 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -31,6 +31,7 @@ typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
 typedef int (rte_gpu_free_t)(struct rte_gpu *dev, void *ptr);
 typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
 typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
+typedef int (rte_gpu_mbw_t)(struct rte_gpu *dev);
 
 struct rte_gpu_ops {
 	/* Get device info. If NULL, info is just copied. */
@@ -45,6 +46,8 @@ struct rte_gpu_ops {
 	rte_gpu_free_t *mem_free;
 	/* Unregister CPU memory in device. */
 	rte_gpu_mem_unregister_t *mem_unregister;
+	/* Enforce GPU memory write barrier. */
+	rte_gpu_mbw_t *mbw;
 };
 
 struct rte_gpu_mpshared {
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index fee71d60e7..650ebfd700 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -387,6 +387,24 @@ int rte_gpu_register(int16_t dev_id, size_t size, void *ptr);
 __rte_experimental
 int rte_gpu_unregister(int16_t dev_id, void *ptr);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enforce a GPU memory write barrier.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_mbw(int16_t dev_id);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index d4a65ebd52..d72d470d8e 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -13,6 +13,7 @@ EXPERIMENTAL {
 	rte_gpu_init;
 	rte_gpu_is_valid;
 	rte_gpu_malloc;
+	rte_gpu_mbw;
 	rte_gpu_register;
 	rte_gpu_unregister;
 };
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 7/9] gpudev: add communication flag
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
                     ` (6 preceding siblings ...)
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 6/9] gpudev: add memory barrier eagostini
@ 2021-11-08 18:58   ` eagostini
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 8/9] gpudev: add communication list eagostini
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 9/9] doc: add CUDA example in GPU guide eagostini
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:58 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

The purpose of this flag is to allow the CPU and the GPU to
exchange ACKs. A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- Prepare some data
- Signal to the GPU the data is ready updating the communication flag

GPU:
- Do some pre-processing
- Wait for more data from the CPU polling on the communication flag
- Consume the data prepared by the CPU

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 app/test-gpudev/main.c                 |  60 ++++++++++++++
 doc/guides/prog_guide/gpudev.rst       |  13 +++
 doc/guides/rel_notes/release_21_11.rst |   1 +
 lib/gpudev/gpudev.c                    |  92 +++++++++++++++++++++
 lib/gpudev/rte_gpudev.h                | 108 +++++++++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 6 files changed, 278 insertions(+)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index e3aca2225a..516a01b927 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -154,6 +154,61 @@ register_cpu_memory(uint16_t gpu_id)
 	return 0;
 }
 
+static int
+create_update_comm_flag(uint16_t gpu_id)
+{
+	struct rte_gpu_comm_flag devflag;
+	int ret = 0;
+	uint32_t set_val;
+	uint32_t get_val;
+
+	printf("\n=======> TEST: Communication flag\n");
+
+	ret = rte_gpu_comm_create_flag(gpu_id, &devflag, RTE_GPU_COMM_FLAG_CPU);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_create_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	set_val = 25;
+	ret = rte_gpu_comm_set_flag(&devflag, set_val);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_set_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_get_flag_value(&devflag, &get_val);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_get_flag_value returned error %d\n", ret);
+		return -1;
+	}
+
+	printf("Communication flag value at 0x%p was set to %d and current value is %d\n", devflag.ptr, set_val, get_val);
+
+	set_val = 38;
+	ret = rte_gpu_comm_set_flag(&devflag, set_val);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_set_flag returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_get_flag_value(&devflag, &get_val);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_get_flag_value returned error %d\n", ret);
+		return -1;
+	}
+
+	printf("Communication flag value at 0x%p was set to %d and current value is %d\n", devflag.ptr, set_val, get_val);
+
+	ret = rte_gpu_comm_destroy_flag(&devflag);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_destroy_flags returned error %d\n", ret);
+		return -1;
+	}
+
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -204,6 +259,11 @@ main(int argc, char **argv)
 	alloc_gpu_memory(gpu_id);
 	register_cpu_memory(gpu_id);
 
+	/**
+	 * Communication items test
+	 */
+	create_update_comm_flag(gpu_id);
+
 	/* clean up the EAL */
 	rte_eal_cleanup();
 	printf("Bye...\n");
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index eb5f0af817..e0db627aed 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -32,6 +32,10 @@ This library provides a number of features:
 - Interoperability with device-specific library through generic handlers.
 - Allocate and free memory on the device.
 - Register CPU memory to make it visible from the device.
+- Communication between the CPU and the device.
+
+The whole CPU - GPU communication is implemented
+using CPU memory visible from the GPU.
 
 
 API Overview
@@ -73,3 +77,12 @@ Some GPU drivers may need, under certain conditions,
 to enforce the coherency of external devices writes (e.g. NIC receiving packets)
 into the GPU memory.
 gpudev abstracts and exposes this capability.
+
+Communication Flag
+~~~~~~~~~~~~~~~~~~
+
+Considering an application with some GPU task
+that's waiting to receive a signal from the CPU
+to move forward with the execution.
+The communication flag allocates a CPU memory GPU-visible ``uint32_t`` flag
+that can be used by the CPU to communicate with a GPU task.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index a4d07bda9b..78b29d9a25 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -105,6 +105,7 @@ New Features
 
   * Device information
   * Memory management
+  * Communication flag
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 49526b335f..f887f3dd93 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -643,3 +643,95 @@ rte_gpu_mbw(int16_t dev_id)
 	}
 	return GPU_DRV_RET(dev->ops.mbw(dev));
 }
+
+int
+rte_gpu_comm_create_flag(uint16_t dev_id, struct rte_gpu_comm_flag *devflag,
+		enum rte_gpu_comm_flag_type mtype)
+{
+	size_t flag_size;
+	int ret;
+
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	flag_size = sizeof(uint32_t);
+
+	devflag->ptr = rte_zmalloc(NULL, flag_size, 0);
+	if (devflag->ptr == NULL) {
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	ret = rte_gpu_register(dev_id, flag_size, devflag->ptr);
+	if (ret < 0) {
+		rte_errno = ENOMEM;
+		return -rte_errno;
+	}
+
+	devflag->mtype = mtype;
+	devflag->dev_id = dev_id;
+
+	return 0;
+}
+
+int
+rte_gpu_comm_destroy_flag(struct rte_gpu_comm_flag *devflag)
+{
+	int ret;
+
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	ret = rte_gpu_unregister(devflag->dev_id, devflag->ptr);
+	if (ret < 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	rte_free(devflag->ptr);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_set_flag(struct rte_gpu_comm_flag *devflag, uint32_t val)
+{
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (devflag->mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	RTE_GPU_VOLATILE(*devflag->ptr) = val;
+
+	return 0;
+}
+
+int
+rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag, uint32_t *val)
+{
+	if (devflag == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (devflag->mtype != RTE_GPU_COMM_FLAG_CPU) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	*val = RTE_GPU_VOLATILE(*devflag->ptr);
+
+	return 0;
+}
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index 650ebfd700..1466ac164b 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -38,6 +38,9 @@ extern "C" {
 /** Catch-all callback data. */
 #define RTE_GPU_CALLBACK_ANY_DATA ((void *)-1)
 
+/** Access variable as volatile. */
+#define RTE_GPU_VOLATILE(x) (*(volatile typeof(x) *)&(x))
+
 /** Store device info. */
 struct rte_gpu_info {
 	/** Unique identifier name. */
@@ -68,6 +71,22 @@ enum rte_gpu_event {
 typedef void (rte_gpu_callback_t)(int16_t dev_id,
 		enum rte_gpu_event event, void *user_data);
 
+/** Memory where communication flag is allocated. */
+enum rte_gpu_comm_flag_type {
+	/** Allocate flag on CPU memory visible from device. */
+	RTE_GPU_COMM_FLAG_CPU = 0,
+};
+
+/** Communication flag to coordinate CPU with the device. */
+struct rte_gpu_comm_flag {
+	/** Device that will use the device flag. */
+	uint16_t dev_id;
+	/** Pointer to flag memory area. */
+	uint32_t *ptr;
+	/** Type of memory used to allocate the flag. */
+	enum rte_gpu_comm_flag_type mtype;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -405,6 +424,95 @@ int rte_gpu_unregister(int16_t dev_id, void *ptr);
 __rte_experimental
 int rte_gpu_mbw(int16_t dev_id);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication flag that can be shared
+ * between CPU threads and device workload to exchange some status info
+ * (e.g. work is done, processing can start, etc..).
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param mtype
+ *   Type of memory to allocate the communication flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if invalid inputs
+ *   - ENOTSUP if operation not supported by the driver
+ *   - ENOMEM if out of space
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_comm_create_flag(uint16_t dev_id,
+		struct rte_gpu_comm_flag *devflag,
+		enum rte_gpu_comm_flag_type mtype);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Deallocate a communication flag.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - ENODEV if invalid dev_id
+ *   - EINVAL if NULL devflag
+ *   - ENOTSUP if operation not supported by the driver
+ *   - EPERM if driver error
+ */
+__rte_experimental
+int rte_gpu_comm_destroy_flag(struct rte_gpu_comm_flag *devflag);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set the value of a communication flag as the input value.
+ * Flag memory area is treated as volatile.
+ * The flag must have been allocated with RTE_GPU_COMM_FLAG_CPU.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param val
+ *   Value to set in the flag.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_set_flag(struct rte_gpu_comm_flag *devflag,
+		uint32_t val);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the value of the communication flag.
+ * Flag memory area is treated as volatile.
+ * The flag must have been allocated with RTE_GPU_COMM_FLAG_CPU.
+ *
+ * @param devflag
+ *   Pointer to the memory area of the devflag structure.
+ * @param val
+ *   Flag output value.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag,
+		uint32_t *val);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index d72d470d8e..2fc039373a 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -6,6 +6,10 @@ EXPERIMENTAL {
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
+	rte_gpu_comm_create_flag;
+	rte_gpu_comm_destroy_flag;
+	rte_gpu_comm_get_flag_value;
+	rte_gpu_comm_set_flag;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
 	rte_gpu_free;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 8/9] gpudev: add communication list
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
                     ` (7 preceding siblings ...)
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 7/9] gpudev: add communication flag eagostini
@ 2021-11-08 18:58   ` eagostini
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 9/9] doc: add CUDA example in GPU guide eagostini
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:58 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

In heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.
When mixing network activity with task processing there may be the need
to put in communication the CPU with the device in order to synchronize
operations.

An example could be a receive-and-process application
where CPU is responsible for receiving packets in multiple mbufs
and the GPU is responsible for processing the content of those packets.

The purpose of this list is to provide a buffer in CPU memory visible
from the GPU that can be treated as a circular buffer
to let the CPU provide fondamental info of received packets to the GPU.

A possible use-case is described below.

CPU:
- Trigger some task on the GPU
- in a loop:
    - receive a number of packets
    - provide packets info to the GPU

GPU:
- Do some pre-processing
- Wait to receive a new set of packet to be processed

Layout of a communication list would be:

     -------
    |   0    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   1    | => pkt_list
    | status |
    | #pkts  |
     -------
    |   2    | => pkt_list
    | status |
    | #pkts  |
     -------
    |  ....  | => pkt_list
     -------

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 app/test-gpudev/main.c                 |  95 ++++++++++++++
 doc/guides/prog_guide/gpudev.rst       |  16 +++
 doc/guides/rel_notes/release_21_11.rst |   2 +-
 lib/gpudev/gpudev.c                    | 164 +++++++++++++++++++++++++
 lib/gpudev/meson.build                 |   2 +
 lib/gpudev/rte_gpudev.h                | 129 +++++++++++++++++++
 lib/gpudev/version.map                 |   4 +
 7 files changed, 411 insertions(+), 1 deletion(-)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 516a01b927..111ed6d415 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -209,6 +209,100 @@ create_update_comm_flag(uint16_t gpu_id)
 	return 0;
 }
 
+static int
+simulate_gpu_task(struct rte_gpu_comm_list *comm_list_item, int num_pkts)
+{
+	int idx;
+
+	if (comm_list_item == NULL)
+		return -1;
+
+	for (idx = 0; idx < num_pkts; idx++) {
+		/**
+		 * consume(comm_list_item->pkt_list[idx].addr);
+		 */
+	}
+	comm_list_item->status = RTE_GPU_COMM_LIST_DONE;
+
+	return 0;
+}
+
+static int
+create_update_comm_list(uint16_t gpu_id)
+{
+	int ret = 0;
+	int i = 0;
+	struct rte_gpu_comm_list *comm_list;
+	uint32_t num_comm_items = 1024;
+	struct rte_mbuf *mbufs[10];
+
+	printf("\n=======> TEST: Communication list\n");
+
+	comm_list = rte_gpu_comm_create_list(gpu_id, num_comm_items);
+	if (comm_list == NULL) {
+		fprintf(stderr, "rte_gpu_comm_create_list returned error %d\n", ret);
+		return -1;
+	}
+
+	/**
+	 * Simulate DPDK receive functions like rte_eth_rx_burst()
+	 */
+	for (i = 0; i < 10; i++) {
+		mbufs[i] = rte_zmalloc(NULL, sizeof(struct rte_mbuf), 0);
+		if (mbufs[i] == NULL) {
+			fprintf(stderr, "Failed to allocate fake mbufs in CPU memory.\n");
+			return -1;
+		}
+
+		memset(mbufs[i], 0, sizeof(struct rte_mbuf));
+	}
+
+	/**
+	 * Populate just the first item of  the list
+	 */
+	ret = rte_gpu_comm_populate_list_pkts(&(comm_list[0]), mbufs, 10);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_populate_list_pkts returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_cleanup_list(&(comm_list[0]));
+	if (ret == 0) {
+		fprintf(stderr, "rte_gpu_comm_cleanup_list erroneusly cleaned the list even if packets have not beeing consumed yet\n");
+		return -1;
+	} else {
+		fprintf(stderr, "rte_gpu_comm_cleanup_list correctly didn't clean up the packets because they have not beeing consumed yet\n");
+	}
+
+	/**
+	 * Simulate a GPU tasks going through the packet list to consume
+	 * mbufs packets and release them
+	 */
+	simulate_gpu_task(&(comm_list[0]), 10);
+
+	/**
+	 * Packets have been consumed, now the communication item
+	 * and the related mbufs can be all released
+	 */
+	ret = rte_gpu_comm_cleanup_list(&(comm_list[0]));
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_cleanup_list returned error %d\n", ret);
+		return -1;
+	}
+
+	ret = rte_gpu_comm_destroy_list(comm_list, num_comm_items);
+	if (ret < 0) {
+		fprintf(stderr, "rte_gpu_comm_destroy_list returned error %d\n", ret);
+		return -1;
+	}
+
+	for (i = 0; i < 10; i++)
+		rte_free(mbufs[i]);
+
+	printf("\nCommunication list test passed!\n");
+	return 0;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -263,6 +357,7 @@ main(int argc, char **argv)
 	 * Communication items test
 	 */
 	create_update_comm_flag(gpu_id);
+	create_update_comm_list(gpu_id);
 
 	/* clean up the EAL */
 	rte_eal_cleanup();
diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index e0db627aed..cbaec5a1e4 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -86,3 +86,19 @@ that's waiting to receive a signal from the CPU
 to move forward with the execution.
 The communication flag allocates a CPU memory GPU-visible ``uint32_t`` flag
 that can be used by the CPU to communicate with a GPU task.
+
+Communication list
+~~~~~~~~~~~~~~~~~~
+
+By default, DPDK pulls free mbufs from a mempool to receive packets.
+Best practice, expecially in a multithreaded application,
+is to no make any assumption on which mbufs will be used
+to receive the next bursts of packets.
+Considering an application with a GPU memory mempool
+attached to a receive queue having some task waiting on the GPU
+to receive a new burst of packets to be processed,
+there is the need to communicate from the CPU
+the list of mbuf payload addresses where received packet have been stored.
+The ``rte_gpu_comm_*()`` functions are responsible to create a list of packets
+that can be populated with receive mbuf payload addresses
+and communicated to the task running on the GPU.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 78b29d9a25..23d8591f40 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -105,7 +105,7 @@ New Features
 
   * Device information
   * Memory management
-  * Communication flag
+  * Communication flag & list
 
 * **Added new RSS offload types for IPv4/L4 checksum in RSS flow.**
 
diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index f887f3dd93..88148eb704 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -735,3 +735,167 @@ rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag, uint32_t *val)
 
 	return 0;
 }
+
+struct rte_gpu_comm_list *
+rte_gpu_comm_create_list(uint16_t dev_id,
+		uint32_t num_comm_items)
+{
+	struct rte_gpu_comm_list *comm_list;
+	uint32_t idx_l;
+	int ret;
+	struct rte_gpu *dev;
+
+	if (num_comm_items == 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	dev = gpu_get_by_id(dev_id);
+	if (dev == NULL) {
+		GPU_LOG(ERR, "memory barrier for invalid device ID %d", dev_id);
+		rte_errno = ENODEV;
+		return NULL;
+	}
+
+	comm_list = rte_zmalloc(NULL, sizeof(struct rte_gpu_comm_list) * num_comm_items, 0);
+	if (comm_list == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	ret = rte_gpu_register(dev_id, sizeof(struct rte_gpu_comm_list) * num_comm_items, comm_list);
+	if (ret < 0) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	for (idx_l = 0; idx_l < num_comm_items; idx_l++) {
+		comm_list[idx_l].pkt_list = rte_zmalloc(NULL, sizeof(struct rte_gpu_comm_pkt) * RTE_GPU_COMM_LIST_PKTS_MAX, 0);
+		if (comm_list[idx_l].pkt_list == NULL) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+
+		ret = rte_gpu_register(dev_id, sizeof(struct rte_gpu_comm_pkt) * RTE_GPU_COMM_LIST_PKTS_MAX, comm_list[idx_l].pkt_list);
+		if (ret < 0) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+
+		RTE_GPU_VOLATILE(comm_list[idx_l].status) = RTE_GPU_COMM_LIST_FREE;
+		comm_list[idx_l].num_pkts = 0;
+		comm_list[idx_l].dev_id = dev_id;
+
+		comm_list[idx_l].mbufs = rte_zmalloc(NULL, sizeof(struct rte_mbuf *) * RTE_GPU_COMM_LIST_PKTS_MAX, 0);
+		if (comm_list[idx_l].mbufs == NULL) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+	}
+
+	return comm_list;
+}
+
+int
+rte_gpu_comm_destroy_list(struct rte_gpu_comm_list *comm_list,
+		uint32_t num_comm_items)
+{
+	uint32_t idx_l;
+	int ret;
+	uint16_t dev_id;
+
+	if (comm_list == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	dev_id = comm_list[0].dev_id;
+
+	for (idx_l = 0; idx_l < num_comm_items; idx_l++) {
+		ret = rte_gpu_unregister(dev_id, comm_list[idx_l].pkt_list);
+		if (ret < 0) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		rte_free(comm_list[idx_l].pkt_list);
+		rte_free(comm_list[idx_l].mbufs);
+	}
+
+	ret = rte_gpu_unregister(dev_id, comm_list);
+	if (ret < 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	rte_free(comm_list);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_populate_list_pkts(struct rte_gpu_comm_list *comm_list_item,
+		struct rte_mbuf **mbufs, uint32_t num_mbufs)
+{
+	uint32_t idx;
+
+	if (comm_list_item == NULL || comm_list_item->pkt_list == NULL ||
+			mbufs == NULL || num_mbufs > RTE_GPU_COMM_LIST_PKTS_MAX) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx = 0; idx < num_mbufs; idx++) {
+		/* support only unchained mbufs */
+		if (unlikely((mbufs[idx]->nb_segs > 1) ||
+				(mbufs[idx]->next != NULL) ||
+				(mbufs[idx]->data_len != mbufs[idx]->pkt_len))) {
+			rte_errno = ENOTSUP;
+			return -rte_errno;
+		}
+		comm_list_item->pkt_list[idx].addr =
+				rte_pktmbuf_mtod_offset(mbufs[idx], uintptr_t, 0);
+		comm_list_item->pkt_list[idx].size = mbufs[idx]->pkt_len;
+		comm_list_item->mbufs[idx] = mbufs[idx];
+	}
+
+	RTE_GPU_VOLATILE(comm_list_item->num_pkts) = num_mbufs;
+	rte_gpu_mbw(comm_list_item->dev_id);
+	RTE_GPU_VOLATILE(comm_list_item->status) = RTE_GPU_COMM_LIST_READY;
+	rte_gpu_mbw(comm_list_item->dev_id);
+
+	return 0;
+}
+
+int
+rte_gpu_comm_cleanup_list(struct rte_gpu_comm_list *comm_list_item)
+{
+	uint32_t idx = 0;
+
+	if (comm_list_item == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	if (RTE_GPU_VOLATILE(comm_list_item->status) ==
+			RTE_GPU_COMM_LIST_READY) {
+		GPU_LOG(ERR, "packet list is still in progress");
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+
+	for (idx = 0; idx < RTE_GPU_COMM_LIST_PKTS_MAX; idx++) {
+		if (comm_list_item->pkt_list[idx].addr == 0)
+			break;
+
+		comm_list_item->pkt_list[idx].addr = 0;
+		comm_list_item->pkt_list[idx].size = 0;
+		comm_list_item->mbufs[idx] = NULL;
+	}
+
+	RTE_GPU_VOLATILE(comm_list_item->status) = RTE_GPU_COMM_LIST_FREE;
+	RTE_GPU_VOLATILE(comm_list_item->num_pkts) = 0;
+	rte_mb();
+
+	return 0;
+}
diff --git a/lib/gpudev/meson.build b/lib/gpudev/meson.build
index 608154817b..89a118f357 100644
--- a/lib/gpudev/meson.build
+++ b/lib/gpudev/meson.build
@@ -8,3 +8,5 @@ headers = files(
 sources = files(
         'gpudev.c',
 )
+
+deps += ['mbuf']
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index 1466ac164b..3023154be8 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -9,6 +9,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
+#include <rte_mbuf.h>
 #include <rte_bitops.h>
 #include <rte_compat.h>
 
@@ -41,6 +42,9 @@ extern "C" {
 /** Access variable as volatile. */
 #define RTE_GPU_VOLATILE(x) (*(volatile typeof(x) *)&(x))
 
+/** Max number of packets per communication list. */
+#define RTE_GPU_COMM_LIST_PKTS_MAX 1024
+
 /** Store device info. */
 struct rte_gpu_info {
 	/** Unique identifier name. */
@@ -87,6 +91,43 @@ struct rte_gpu_comm_flag {
 	enum rte_gpu_comm_flag_type mtype;
 };
 
+/** List of packets shared among CPU and device. */
+struct rte_gpu_comm_pkt {
+	/** Address of the packet in memory (e.g. mbuf->buf_addr). */
+	uintptr_t addr;
+	/** Size in byte of the packet. */
+	size_t size;
+};
+
+/** Possible status for the list of packets shared among CPU and device. */
+enum rte_gpu_comm_list_status {
+	/** Packet list can be filled with new mbufs, no one is using it. */
+	RTE_GPU_COMM_LIST_FREE = 0,
+	/** Packet list has been filled with new mbufs and it's ready to be used .*/
+	RTE_GPU_COMM_LIST_READY,
+	/** Packet list has been processed, it's ready to be freed. */
+	RTE_GPU_COMM_LIST_DONE,
+	/** Some error occurred during packet list processing. */
+	RTE_GPU_COMM_LIST_ERROR,
+};
+
+/**
+ * Communication list holding a number of lists of packets
+ * each having a status flag.
+ */
+struct rte_gpu_comm_list {
+	/** Device that will use the communication list. */
+	uint16_t dev_id;
+	/** List of mbufs populated by the CPU with a set of mbufs. */
+	struct rte_mbuf **mbufs;
+	/** List of packets populated by the CPU with a set of mbufs info. */
+	struct rte_gpu_comm_pkt *pkt_list;
+	/** Number of packets in the list. */
+	uint32_t num_pkts;
+	/** Status of the list. */
+	enum rte_gpu_comm_list_status status;
+};
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
@@ -513,6 +554,94 @@ __rte_experimental
 int rte_gpu_comm_get_flag_value(struct rte_gpu_comm_flag *devflag,
 		uint32_t *val);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a communication list that can be used to share packets
+ * between CPU and device.
+ * Each element of the list contains:
+ *  - a packet list of RTE_GPU_COMM_LIST_PKTS_MAX elements
+ *  - number of packets in the list
+ *  - a status flag to communicate if the packet list is FREE,
+ *    READY to be processed, DONE with processing.
+ *
+ * The list is allocated in CPU-visible memory.
+ * At creation time, every list is in FREE state.
+ *
+ * @param dev_id
+ *   Reference device ID.
+ * @param num_comm_items
+ *   Number of items in the communication list.
+ *
+ * @return
+ *   A pointer to the allocated list, otherwise NULL and rte_errno is set:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+struct rte_gpu_comm_list *rte_gpu_comm_create_list(uint16_t dev_id,
+		uint32_t num_comm_items);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Destroy a communication list.
+ *
+ * @param comm_list
+ *   Communication list to be destroyed.
+ * @param num_comm_items
+ *   Number of items in the communication list.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_destroy_list(struct rte_gpu_comm_list *comm_list,
+		uint32_t num_comm_items);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Populate the packets list of the communication item
+ * with info from a list of mbufs.
+ * Status flag of that packet list is set to READY.
+ *
+ * @param comm_list_item
+ *   Communication list item to fill.
+ * @param mbufs
+ *   List of mbufs.
+ * @param num_mbufs
+ *   Number of mbufs.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ *   - ENOTSUP if mbufs are chained (multiple segments)
+ */
+__rte_experimental
+int rte_gpu_comm_populate_list_pkts(struct rte_gpu_comm_list *comm_list_item,
+		struct rte_mbuf **mbufs, uint32_t num_mbufs);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Reset a communication list item to the original state.
+ * The status flag set to FREE and mbufs are returned to the pool.
+ *
+ * @param comm_list_item
+ *   Communication list item to reset.
+ *
+ * @return
+ *   0 on success, -rte_errno otherwise:
+ *   - EINVAL if invalid input params
+ */
+__rte_experimental
+int rte_gpu_comm_cleanup_list(struct rte_gpu_comm_list *comm_list_item);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/gpudev/version.map b/lib/gpudev/version.map
index 2fc039373a..45a35fa6e4 100644
--- a/lib/gpudev/version.map
+++ b/lib/gpudev/version.map
@@ -6,9 +6,13 @@ EXPERIMENTAL {
 	rte_gpu_callback_register;
 	rte_gpu_callback_unregister;
 	rte_gpu_close;
+	rte_gpu_comm_cleanup_list;
 	rte_gpu_comm_create_flag;
+	rte_gpu_comm_create_list;
 	rte_gpu_comm_destroy_flag;
+	rte_gpu_comm_destroy_list;
 	rte_gpu_comm_get_flag_value;
+	rte_gpu_comm_populate_list_pkts;
 	rte_gpu_comm_set_flag;
 	rte_gpu_count_avail;
 	rte_gpu_find_next;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

* [dpdk-dev] [PATCH v5 9/9] doc: add CUDA example in GPU guide
  2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
                     ` (8 preceding siblings ...)
  2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 8/9] gpudev: add communication list eagostini
@ 2021-11-08 18:58   ` eagostini
  9 siblings, 0 replies; 128+ messages in thread
From: eagostini @ 2021-11-08 18:58 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 doc/guides/prog_guide/gpudev.rst | 122 +++++++++++++++++++++++++++++++
 1 file changed, 122 insertions(+)

diff --git a/doc/guides/prog_guide/gpudev.rst b/doc/guides/prog_guide/gpudev.rst
index cbaec5a1e4..1baf0c6772 100644
--- a/doc/guides/prog_guide/gpudev.rst
+++ b/doc/guides/prog_guide/gpudev.rst
@@ -102,3 +102,125 @@ the list of mbuf payload addresses where received packet have been stored.
 The ``rte_gpu_comm_*()`` functions are responsible to create a list of packets
 that can be populated with receive mbuf payload addresses
 and communicated to the task running on the GPU.
+
+
+CUDA Example
+------------
+
+In the example below, there is a pseudo-code to give an example
+about how to use functions in this library in case of a CUDA application.
+
+.. code-block:: c
+
+   //////////////////////////////////////////////////////////////////////////
+   ///// gpudev library + CUDA functions
+   //////////////////////////////////////////////////////////////////////////
+   #define GPU_PAGE_SHIFT 16
+   #define GPU_PAGE_SIZE (1UL << GPU_PAGE_SHIFT)
+
+   int main() {
+       struct rte_gpu_flag quit_flag;
+       struct rte_gpu_comm_list *comm_list;
+       int nb_rx = 0;
+       int comm_list_entry = 0;
+       struct rte_mbuf * rx_mbufs[max_rx_mbufs];
+       cudaStream_t cstream;
+       struct rte_mempool *mpool_payload, *mpool_header;
+       struct rte_pktmbuf_extmem ext_mem;
+       int16_t dev_id;
+       int16_t port_id = 0;
+
+       /** Initialize CUDA objects (cstream, context, etc..). */
+       /** Use gpudev library to register a new CUDA context if any */
+       /** Let's assume the application wants to use the default context of the GPU device 0 */
+
+       dev_id = 0;
+
+       /**
+        * Create an external memory mempool using memory allocated on the GPU.
+        */
+       ext_mem.elt_size = mbufs_headroom_size;
+                   ext_mem.buf_len = RTE_ALIGN_CEIL(mbufs_num * ext_mem.elt_size, GPU_PAGE_SIZE);
+       ext_mem.buf_iova = RTE_BAD_IOVA;
+       ext_mem.buf_ptr = rte_gpu_malloc(dev_id, ext_mem.buf_len, 0);
+       rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, NULL, ext_mem.buf_iova, GPU_PAGE_SIZE);
+       rte_dev_dma_map(rte_eth_devices[port_id].device, ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len);
+       mpool_payload = rte_pktmbuf_pool_create_extbuf("gpu_mempool", mbufs_num,
+                                                       0, 0, ext_mem.elt_size,
+                                                       rte_socket_id(), &ext_mem, 1);
+
+       /**
+        * Create CPU - device communication flag. With this flag, the CPU can tell to the CUDA kernel
+        * to exit from the main loop.
+        */
+       rte_gpu_comm_create_flag(dev_id, &quit_flag, RTE_GPU_COMM_FLAG_CPU);
+       rte_gpu_comm_set_flag(&quit_flag , 0);
+
+       /**
+        * Create CPU - device communication list. Each entry of this list will be populated by the CPU
+        * with a new set of received mbufs that the CUDA kernel has to process.
+        */
+       comm_list = rte_gpu_comm_create_list(dev_id, num_entries);
+
+       /** A very simple CUDA kernel with just 1 CUDA block and RTE_GPU_COMM_LIST_PKTS_MAX CUDA threads. */
+       cuda_kernel_packet_processing<<<1, RTE_GPU_COMM_LIST_PKTS_MAX, 0, cstream>>>(quit_flag->ptr, comm_list, num_entries, ...);
+
+       /**
+        * For simplicity, the CPU here receives only 2 bursts of mbufs.
+        * In a real application, network activity and device processing should overlap.
+        */
+       nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
+       rte_gpu_comm_populate_list_pkts(comm_list[0], rx_mbufs, nb_rx);
+       nb_rx = rte_eth_rx_burst(port_id, queue_id, &(rx_mbufs[0]), max_rx_mbufs);
+       rte_gpu_comm_populate_list_pkts(comm_list[1], rx_mbufs, nb_rx);
+
+       /**
+        * CPU waits for the completion of the packets' processing on the CUDA kernel
+        * and then it does a cleanup of the received mbufs.
+        */
+       while(rte_gpu_comm_cleanup_list(comm_list[0]));
+       while(rte_gpu_comm_cleanup_list(comm_list[1]));
+
+       /** CPU notifies the CUDA kernel that it has to terminate */
+       rte_gpu_comm_set_flag(&quit_flag, 1);
+
+       /** gpudev objects cleanup/destruction */
+       /** CUDA cleanup */
+
+       rte_gpu_free(dev_id, ext_mem.buf_len);
+
+       /** DPDK cleanup */
+
+       return 0;
+   }
+
+   //////////////////////////////////////////////////////////////////////////
+   ///// CUDA kernel
+   //////////////////////////////////////////////////////////////////////////
+
+   void cuda_kernel(uint32_t * quit_flag_ptr, struct rte_gpu_comm_list *comm_list, int comm_list_entries) {
+      int comm_list_index = 0;
+      struct rte_gpu_comm_pkt *pkt_list = NULL;
+
+      /** Do some pre-processing operations. */
+
+      /** GPU kernel keeps checking this flag to know if it has to quit or wait for more packets. */
+      while(*quit_flag_ptr == 0)
+      {
+         if(comm_list[comm_list_index]->status != RTE_GPU_COMM_LIST_READY)
+         continue;
+
+         if(threadIdx.x < comm_list[comm_list_index]->num_pkts)
+         {
+            /** Each CUDA thread processes a different packet. */
+            packet_processing(comm_list[comm_list_index]->addr, comm_list[comm_list_index]->size, ..);
+         }
+         __threadfence();
+         __syncthreads();
+
+         /** Wait for new packets on the next communication list entry. */
+         comm_list_index = (comm_list_index+1) % comm_list_entries;
+      }
+
+      /** Do some post-processing operations. */
+   }
-- 
2.17.1


^ permalink raw reply	[flat|nested] 128+ messages in thread

end of thread, other threads:[~2021-11-08 16:25 UTC | newest]

Thread overview: 128+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-02 20:35 [dpdk-dev] [PATCH] gpudev: introduce memory API Thomas Monjalon
2021-06-02 20:46 ` Stephen Hemminger
2021-06-02 20:48   ` Thomas Monjalon
2021-06-03  7:06 ` Andrew Rybchenko
2021-06-03  7:26   ` Thomas Monjalon
2021-06-03  7:49     ` Andrew Rybchenko
2021-06-03  8:26       ` Thomas Monjalon
2021-06-03  8:57         ` Andrew Rybchenko
2021-06-03  7:18 ` David Marchand
2021-06-03  7:30   ` Thomas Monjalon
2021-06-03  7:47 ` Jerin Jacob
2021-06-03  8:28   ` Thomas Monjalon
2021-06-03  8:41     ` Jerin Jacob
2021-06-03  8:43       ` Thomas Monjalon
2021-06-03  8:47         ` Jerin Jacob
2021-06-03  8:53           ` Thomas Monjalon
2021-06-03  9:20             ` Jerin Jacob
2021-06-03  9:36               ` Thomas Monjalon
2021-06-03 10:04                 ` Jerin Jacob
2021-06-03 10:30                   ` Thomas Monjalon
2021-06-03 11:38                     ` Jerin Jacob
2021-06-04 12:55                       ` Thomas Monjalon
2021-06-04 15:05                         ` Jerin Jacob
2021-06-03  9:33   ` Ferruh Yigit
2021-06-04 10:28     ` Thomas Monjalon
2021-06-04 11:09       ` Jerin Jacob
2021-06-04 12:46         ` Thomas Monjalon
2021-06-04 13:05           ` Andrew Rybchenko
2021-06-04 13:18             ` Thomas Monjalon
2021-06-04 13:59               ` Andrew Rybchenko
2021-06-04 14:09                 ` Thomas Monjalon
2021-06-04 15:20                   ` Jerin Jacob
2021-06-04 15:51                     ` Thomas Monjalon
2021-06-04 18:20                       ` Wang, Haiyue
2021-06-05  5:09                         ` Jerin Jacob
2021-06-06  1:13                           ` Honnappa Nagarahalli
2021-06-06  5:28                             ` Jerin Jacob
2021-06-07 10:29                               ` Thomas Monjalon
2021-06-07  7:20                             ` Wang, Haiyue
2021-06-07 10:43                               ` Thomas Monjalon
2021-06-07 13:54                                 ` Jerin Jacob
2021-06-07 16:47                                   ` Thomas Monjalon
2021-06-08  4:10                                     ` Jerin Jacob
2021-06-08  6:34                                       ` Thomas Monjalon
2021-06-08  7:09                                         ` Jerin Jacob
2021-06-08  7:32                                           ` Thomas Monjalon
2021-06-15 18:24                                         ` Ferruh Yigit
2021-06-15 18:54                                           ` Thomas Monjalon
2021-06-07 23:31                                   ` Honnappa Nagarahalli
2021-06-04  5:51 ` Wang, Haiyue
2021-06-04  8:15   ` Thomas Monjalon
2021-06-04 11:07 ` Wang, Haiyue
2021-06-04 12:43   ` Thomas Monjalon
2021-06-04 13:25     ` Wang, Haiyue
2021-06-04 14:06       ` Thomas Monjalon
2021-06-04 18:04         ` Wang, Haiyue
2021-06-05  7:49           ` Thomas Monjalon
2021-06-05 11:09             ` Wang, Haiyue
2021-06-06  1:10 ` Honnappa Nagarahalli
2021-06-07 10:50   ` Thomas Monjalon
2021-07-30 13:55 ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Thomas Monjalon
2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 1/7] hcdev: introduce heterogeneous computing device library Thomas Monjalon
2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 2/7] hcdev: add event notification Thomas Monjalon
2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 3/7] hcdev: add child device representing a device context Thomas Monjalon
2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 4/7] hcdev: support multi-process Thomas Monjalon
2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 5/7] hcdev: add memory API Thomas Monjalon
2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 6/7] hcdev: add communication flag Thomas Monjalon
2021-07-30 13:55   ` [dpdk-dev] [RFC PATCH v2 7/7] hcdev: add communication list Thomas Monjalon
2021-07-31  7:06   ` [dpdk-dev] [RFC PATCH v2 0/7] heterogeneous computing library Jerin Jacob
2021-07-31  8:21     ` Thomas Monjalon
2021-07-31 13:42       ` Jerin Jacob
2021-08-27  9:44         ` Thomas Monjalon
2021-08-27 12:19           ` Jerin Jacob
2021-08-29  5:32             ` Wang, Haiyue
2021-09-01 15:35               ` Elena Agostini
2021-09-02 13:12                 ` Jerin Jacob
2021-09-06 16:11                   ` Elena Agostini
2021-09-06 17:15                     ` Wang, Haiyue
2021-09-06 17:22                       ` Elena Agostini
2021-09-07  0:55                         ` Wang, Haiyue
2021-10-09  1:53 ` [dpdk-dev] [PATCH v3 0/9] GPU library eagostini
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 1/9] gpudev: introduce GPU device class library eagostini
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 2/9] gpudev: add event notification eagostini
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 3/9] gpudev: add child device representing a device context eagostini
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 4/9] gpudev: support multi-process eagostini
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 5/9] gpudev: add memory API eagostini
2021-10-08 20:18     ` Thomas Monjalon
2021-10-29 19:38     ` Mattias Rönnblom
2021-11-08 15:16       ` Elena Agostini
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 6/9] gpudev: add memory barrier eagostini
2021-10-08 20:16     ` Thomas Monjalon
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 7/9] gpudev: add communication flag eagostini
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 8/9] gpudev: add communication list eagostini
2021-10-09  1:53   ` [dpdk-dev] [PATCH v3 9/9] doc: add CUDA example in GPU guide eagostini
2021-10-10 10:16   ` [dpdk-dev] [PATCH v3 0/9] GPU library Jerin Jacob
2021-10-11  8:18     ` Thomas Monjalon
2021-10-11  8:43       ` Jerin Jacob
2021-10-11  9:12         ` Thomas Monjalon
2021-10-11  9:29           ` Jerin Jacob
2021-10-11 10:27             ` Thomas Monjalon
2021-10-11 11:41               ` Jerin Jacob
2021-10-11 12:44                 ` Thomas Monjalon
2021-10-11 13:30                   ` Jerin Jacob
2021-10-19 10:00                     ` Elena Agostini
2021-10-19 18:47                       ` Jerin Jacob
2021-10-19 19:11                         ` Thomas Monjalon
2021-10-19 19:56                           ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2021-11-03 19:15 ` [dpdk-dev] [PATCH v4 " eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 1/9] gpudev: introduce GPU device class library eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 2/9] gpudev: add event notification eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 3/9] gpudev: add child device representing a device context eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 4/9] gpudev: support multi-process eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 5/9] gpudev: add memory API eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 6/9] gpudev: add memory barrier eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 7/9] gpudev: add communication flag eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 8/9] gpudev: add communication list eagostini
2021-11-03 19:15   ` [dpdk-dev] [PATCH v4 9/9] doc: add CUDA example in GPU guide eagostini
2021-11-08 18:57 ` [dpdk-dev] [PATCH v5 0/9] GPU library eagostini
2021-11-08 16:25   ` Thomas Monjalon
2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 1/9] gpudev: introduce GPU device class library eagostini
2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 2/9] gpudev: add event notification eagostini
2021-11-08 18:57   ` [dpdk-dev] [PATCH v5 3/9] gpudev: add child device representing a device context eagostini
2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 4/9] gpudev: support multi-process eagostini
2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 5/9] gpudev: add memory API eagostini
2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 6/9] gpudev: add memory barrier eagostini
2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 7/9] gpudev: add communication flag eagostini
2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 8/9] gpudev: add communication list eagostini
2021-11-08 18:58   ` [dpdk-dev] [PATCH v5 9/9] doc: add CUDA example in GPU guide eagostini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).