DPDK patches and discussions
* [PATCH v1 00/37] Implementation of ML CNXK driver
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC
  Cc: dev, sshankarnara, jerinj, aprabhu, Srikanth Yalavarthi

Marvell ML CNXK Driver
----------------------

This patch series implements a Machine Learning (ML) driver for the
Marvell Octeon 10 (cnxk) platform. ML inferencing is supported on the
cnxk platform through an integrated ML inferencing processor. The
current driver supports programming the ML hardware engine through
offload mode.

All APIs proposed in the DPDK ML device specification are supported on
the cnxk platform.


Srikanth Yalavarthi (37):
  ml/cnxk: add skeleton for ML cnxk driver
  ml/cnxk: enable probe and remove of ML device
  ml/cnxk: add driver support to get device info
  ml/cnxk: add support for configure and close
  ml/cnxk: parse ML firmware path from device args
  ml/cnxk: enable firmware load and device reset
  ml/cnxk: enable support for simulator environment
  ml/cnxk: enable support for device start and stop
  ml/cnxk: add support to create device queue-pairs
  ml/cnxk: add functions to load and unload models
  ml/cnxk: enable validity checks for model metadata
  ml/cnxk: add internal structures for derived info
  ml/cnxk: add internal structures for tiles and OCM
  ml/cnxk: add structures for slow and fast path JDs
  ml/cnxk: find OCM mask and page slots for a model
  ml/cnxk: add support to reserve and free OCM pages
  ml/cnxk: enable support to start an ML model
  ml/cnxk: enable support to stop an ML model
  ml/cnxk: enable support to get model information
  ml/cnxk: enable support to update model params
  ml/cnxk: add support to get IO buffer sizes
  ml/cnxk: enable quantization and dequantization
  ml/cnxk: enable support to dump device debug info
  ml/cnxk: add driver support for device selftest
  ml/cnxk: enqueue a burst of inference requests
  ml/cnxk: dequeue a burst of inference requests
  ml/cnxk: add internal function for sync mode run
  ml/cnxk: enable support for firmware error codes
  ml/cnxk: add support to get and reset device stats
  ml/cnxk: add support to handle extended dev stats
  ml/cnxk: enable support to get xstats in cycles
  ml/cnxk: add support to report DPE FW warnings
  ml/cnxk: add support to enable model data caching
  ml/cnxk: add support to select OCM allocation mode
  ml/cnxk: add support to use lock during jcmd enq
  ml/cnxk: add support to select poll memory region
  ml/cnxk: add user guide for marvell cnxk ml driver

 MAINTAINERS                      |    3 +
 doc/guides/index.rst             |    1 +
 doc/guides/mldevs/cnxk.rst       |  238 +++
 doc/guides/mldevs/index.rst      |   14 +
 drivers/meson.build              |    1 +
 drivers/ml/cnxk/cn10k_ml_dev.c   |  823 +++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h   |  426 ++++++
 drivers/ml/cnxk/cn10k_ml_model.c |  396 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  511 +++++++
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  509 +++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   91 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 2310 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   94 ++
 drivers/ml/cnxk/meson.build      |   32 +
 drivers/ml/meson.build           |    8 +
 15 files changed, 5457 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

--
2.17.1


* [PATCH v1 01/37] ml/cnxk: add skeleton for ML cnxk driver
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC
  To: Thomas Monjalon, Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added initial source files and build files for the ML cnxk driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: patch-120600 ("common/cnxk: add ML headers and ROC code for cnxk")

 MAINTAINERS                    |  2 ++
 drivers/meson.build            |  1 +
 drivers/ml/cnxk/cn10k_ml_dev.c |  8 ++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  8 ++++++++
 drivers/ml/cnxk/meson.build    | 26 ++++++++++++++++++++++++++
 drivers/ml/meson.build         |  8 ++++++++
 6 files changed, 53 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 8cdb3e215d..ba4c97e802 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1442,6 +1442,8 @@ Marvell ML CNXK
 M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
+F: drivers/ml/cnxk/
+

 Packet processing
 -----------------
diff --git a/drivers/meson.build b/drivers/meson.build
index c6d619200f..546a5f409d 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -14,6 +14,7 @@ subdirs = [
         'mempool',        # depends on common and bus.
         'dma',            # depends on common and bus.
         'net',            # depends on common, bus, mempool
+        'ml',             # depends on common, bus, mempool
         'raw',            # depends on common, bus, dma and net.
         'crypto',         # depends on common, bus and mempool (net in future).
         'compress',       # depends on common, bus, mempool.
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
new file mode 100644
index 0000000000..cc96a7bdb3
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
new file mode 100644
index 0000000000..049ac13fcd
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_DEV_H_
+#define _CN10K_ML_DEV_H_
+
+#endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
new file mode 100644
index 0000000000..f04e78cce5
--- /dev/null
+++ b/drivers/ml/cnxk/meson.build
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
+    build = false
+    reason = 'only supported on 64-bit Linux'
+    subdir_done()
+endif
+
+sources = files(
+        'cn10k_ml_dev.c',
+)
+
+headers = files(
+        'cn10k_ml_dev.h',
+)
+
+deps += ['mldev', 'common_ml', 'common_cnxk']
+
+if get_option('buildtype').contains('debug')
+        cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
+else
+        cflags += [ '-UCNXK_ML_DEV_DEBUG' ]
+endif
+
+pmd_supports_disable_iova_as_pa = true
diff --git a/drivers/ml/meson.build b/drivers/ml/meson.build
new file mode 100644
index 0000000000..54bc394c47
--- /dev/null
+++ b/drivers/ml/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+drivers = [
+        'cnxk',
+]
+
+std_deps = ['mldev']
--
2.17.1


* [PATCH v1 02/37] ml/cnxk: enable probe and remove of ML device
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC
  To: Srikanth Yalavarthi, Anatoly Burakov; +Cc: dev, sshankarnara, jerinj, aprabhu

The ML inference engine on the cn10k platform is a PCI-based device.
Added driver support to probe and remove the device for the cn10k poll
mode driver. The PMD names the device "ml_cn10k".
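
Editorial sketch, not part of the patch: the probe path in this patch
creates the device first and destroys it again if a later step fails,
and remove tears things down in the reverse order. A minimal
stand-alone illustration of that unwind pattern follows. It assumes
nothing about DPDK; every example_* name is hypothetical and only
stands in for the rte_ml_dev_pmd_* and roc_ml_* calls used in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the PMD device object. */
struct example_dev {
	int hw_ready;
};

/* Devices created but not yet destroyed; lets a caller check for leaks. */
static int example_live_devs;

static struct example_dev *
example_dev_create(void)
{
	struct example_dev *dev = calloc(1, sizeof(*dev));

	if (dev != NULL)
		example_live_devs++;
	return dev;
}

static void
example_dev_destroy(struct example_dev *dev)
{
	assert(dev != NULL);
	free(dev);
	example_live_devs--;
}

/* Stand-in for hardware init; fail_hw forces the error path. */
static int
example_hw_init(struct example_dev *dev, int fail_hw)
{
	if (fail_hw)
		return -EIO;
	dev->hw_ready = 1;
	return 0;
}

/* Probe: on success *out owns the device; on failure nothing is leaked. */
static int
example_probe(int fail_hw, struct example_dev **out)
{
	struct example_dev *dev;
	int ret;

	dev = example_dev_create();
	if (dev == NULL)
		return -ENODEV;

	ret = example_hw_init(dev, fail_hw);
	if (ret != 0)
		goto destroy;

	*out = dev;
	return 0;

destroy:
	example_dev_destroy(dev);
	return ret;
}

/* Remove: undo hardware init first, then destroy the device object. */
static int
example_remove(struct example_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;
	dev->hw_ready = 0;
	example_dev_destroy(dev);
	return 0;
}
```

A caller can check example_live_devs after a failed probe to confirm
that the error path released everything it had acquired.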

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 114 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  11 ++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  10 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  11 ++++
 drivers/ml/cnxk/meson.build    |   2 +
 5 files changed, 148 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index cc96a7bdb3..c2e93c9a1a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -2,7 +2,121 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_common.h>
+#include <rte_dev.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
+#include <rte_pci.h>
+
+#include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ops.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+static int
+cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	PLT_SET_USED(pci_drv);
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+
+	ret = roc_plt_init();
+	if (ret < 0) {
+		plt_err("Failed to initialize platform model");
+		return ret;
+	}
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+	dev = rte_ml_dev_pmd_create(name, &pci_dev->device, &init_params);
+	if (dev == NULL) {
+		ret = -ENODEV;
+		goto error_exit;
+	}
+
+	/* Get private data space allocated */
+	mldev = dev->data->dev_private;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev->roc.pci_dev = pci_dev;
+
+		ret = roc_ml_dev_init(&mldev->roc);
+		if (ret) {
+			plt_err("Failed to initialize ML ROC, ret = %d", ret);
+			goto pmd_destroy;
+		}
+
+		dev->dev_ops = &cn10k_ml_ops;
+	} else {
+		plt_err("CN10K ML Ops are not supported on secondary process");
+		dev->dev_ops = &ml_dev_dummy_ops;
+	}
+
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	return 0;
+
+pmd_destroy:
+	rte_ml_dev_pmd_destroy(dev);
+
+error_exit:
+	plt_err("Could not create device (vendor_id: 0x%x device_id: 0x%x)", pci_dev->id.vendor_id,
+		pci_dev->id.device_id);
+
+	return ret;
+}
+
+static int
+cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&mldev->roc);
+		if (ret)
+			return ret;
+	}
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_pci_id pci_id_ml_table[] = {
+	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
+	/* sentinel */
+	{},
+};
+
+static struct rte_pci_driver cn10k_mldev_pmd = {
+	.id_table = pci_id_ml_table,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA,
+	.probe = cn10k_ml_pci_probe,
+	.remove = cn10k_ml_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
+RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 049ac13fcd..4827d29bf7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -5,4 +5,15 @@
 #ifndef _CN10K_ML_DEV_H_
 #define _CN10K_ML_DEV_H_
 
+#include <roc_api.h>
+
+/* Marvell OCTEON CN10K ML PMD device name */
+#define MLDEV_NAME_CN10K_PMD ml_cn10k
+
+/* Device private data */
+struct cn10k_ml_dev {
+	/* ML device ROC */
+	struct roc_ml roc;
+};
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
new file mode 100644
index 0000000000..39843e3ee5
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_ops.h"
+
+struct rte_ml_dev_ops cn10k_ml_ops = {0};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
new file mode 100644
index 0000000000..adb0035fd7
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OPS_H_
+#define _CN10K_ML_OPS_H_
+
+/* CN10K device ops */
+extern struct rte_ml_dev_ops cn10k_ml_ops;
+
+#endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index f04e78cce5..bf4ccde2c5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,10 +9,12 @@ endif
 
 sources = files(
         'cn10k_ml_dev.c',
+        'cn10k_ml_ops.c',
 )
 
 headers = files(
         'cn10k_ml_dev.h',
+        'cn10k_ml_ops.h',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk']
-- 
2.17.1


* [PATCH v1 03/37] ml/cnxk: add driver support to get device info
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to get the cn10k ML device information. This is the
driver implementation of the RTE function rte_ml_dev_info_get.
The ML device on cn10k supports one queue-pair in lock-free mode and
does not support segmented input/output data.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 15 +++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 23 ++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4827d29bf7..eeaf83ce5c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,21 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Device alignment size */
+#define ML_CN10K_ALIGN_SIZE 128
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Maximum number of Queue-Pairs per device */
+#define ML_CN10K_MAX_QP_PER_DEVICE 1
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_CN10K_MAX_DESC_PER_QP 1024
+
+/* Maximum number of segments for IO data */
+#define ML_CN10K_MAX_SEGMENTS 1
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* ML device ROC */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 39843e3ee5..bad5ad4713 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,27 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-struct rte_ml_dev_ops cn10k_ml_ops = {0};
+static int
+cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	if (dev_info == NULL)
+		return -EINVAL;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
+	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
+
+	return 0;
+}
+
+struct rte_ml_dev_ops cn10k_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+};
-- 
2.17.1


* [PATCH v1 04/37] ml/cnxk: add support for configure and close
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented driver functions to configure and close ML devices.
Added skeleton code and support to reconfigure an ML device. The PCI
device is removed as part of device close.
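
Editorial sketch, not part of the patch: the configure path first
checks the request against the device limits, then allows the
transition only from the probed or configured state. The stand-alone
model below assumes simplified types. The example_* names are
hypothetical, and the two limits copy the values of ML_CN10K_MAX_MODELS
and ML_CN10K_MAX_QP_PER_DEVICE from this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same four states as enum cn10k_ml_dev_state in the patch. */
enum example_state {
	EXAMPLE_STATE_PROBED = 0,
	EXAMPLE_STATE_CONFIGURED,
	EXAMPLE_STATE_STARTED,
	EXAMPLE_STATE_CLOSED
};

/* Simplified stand-in for struct rte_ml_dev_config. */
struct example_conf {
	uint16_t nb_models;
	uint16_t nb_queue_pairs;
};

struct example_dev {
	enum example_state state;
};

#define EXAMPLE_MAX_MODELS 16
#define EXAMPLE_MAX_QPS    1

static int
example_configure(struct example_dev *dev, const struct example_conf *conf)
{
	if (dev == NULL || conf == NULL)
		return -EINVAL;

	/* Reject requests that exceed what the device reports. */
	if (conf->nb_models > EXAMPLE_MAX_MODELS ||
	    conf->nb_queue_pairs > EXAMPLE_MAX_QPS)
		return -EINVAL;

	switch (dev->state) {
	case EXAMPLE_STATE_PROBED:     /* first configuration */
	case EXAMPLE_STATE_CONFIGURED: /* re-configuration */
		dev->state = EXAMPLE_STATE_CONFIGURED;
		return 0;
	case EXAMPLE_STATE_STARTED:    /* must be stopped first */
	case EXAMPLE_STATE_CLOSED:     /* close is final */
		return -ENOTSUP;
	}

	return -EINVAL;
}

static int
example_close(struct example_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;

	dev->state = EXAMPLE_STATE_CLOSED;
	return 0;
}
```

In this model a rejected request leaves the state untouched, so a
caller can retry with a smaller configuration.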

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 ++
 drivers/ml/cnxk/cn10k_ml_dev.h | 21 ++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 60 ++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index c2e93c9a1a..fd45226add 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -65,6 +65,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+
 	return 0;
 
 pmd_destroy:
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index eeaf83ce5c..bda7a5b3ff 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -25,10 +25,31 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
+/* ML command timeout in seconds */
+#define ML_CN10K_CMD_TIMEOUT 5
+
+/* Device configuration state enum */
+enum cn10k_ml_dev_state {
+	/* Device probed and not configured */
+	ML_CN10K_DEV_STATE_PROBED = 0,
+
+	/* Device configured */
+	ML_CN10K_DEV_STATE_CONFIGURED,
+
+	/* Device started */
+	ML_CN10K_DEV_STATE_STARTED,
+
+	/* Device closed */
+	ML_CN10K_DEV_STATE_CLOSED
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* ML device ROC */
 	struct roc_ml roc;
+
+	/* Configuration state */
+	enum cn10k_ml_dev_state state;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index bad5ad4713..32d38569a3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -25,7 +25,67 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL || conf == NULL)
+		return -EINVAL;
+
+	/* Get CN10K device handle */
+	mldev = dev->data->dev_private;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %d\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	mldev = dev->data->dev_private;
+
+	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
 };
-- 
2.17.1


* [PATCH v1 05/37] ml/cnxk: parse ML firmware path from device args
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled parsing of the ML firmware path for cn10k from device
arguments. The path defaults to "/lib/firmware/mlip-fw.bin" when no
argument is provided. Added internal structures for ML firmware.
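
Editorial sketch, not part of the patch: the lookup-with-default
behaviour can be modelled in plain C without rte_kvargs. The code
below is an assumption-laden illustration. It scans a comma-separated
key=value string for "fw_path" and falls back to the default path from
the commit message. The example_* helpers are hypothetical and do not
reproduce the rte_kvargs parsing rules.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EXAMPLE_FW_PATH_KEY     "fw_path"
#define EXAMPLE_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"

/* Copy the first n bytes of s into a new NUL-terminated string. */
static char *
example_strndup(const char *s, size_t n)
{
	char *copy;

	assert(s != NULL);
	copy = malloc(n + 1);
	if (copy != NULL) {
		memcpy(copy, s, n);
		copy[n] = '\0';
	}
	return copy;
}

/*
 * Return a heap-allocated firmware path chosen from args, a
 * comma-separated list of key=value pairs. Falls back to the default
 * when args is NULL or lacks the key. The caller frees the result.
 * Returns NULL on allocation failure.
 */
static char *
example_fw_path_get(const char *args)
{
	const size_t key_len = strlen(EXAMPLE_FW_PATH_KEY);
	const char *p = args;

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

		/* Match "fw_path=" exactly, not a longer key sharing the prefix. */
		if (len > key_len && p[key_len] == '=' &&
		    strncmp(p, EXAMPLE_FW_PATH_KEY, key_len) == 0)
			return example_strndup(p + key_len + 1, len - key_len - 1);

		p = (end != NULL) ? end + 1 : NULL;
	}

	return example_strndup(EXAMPLE_FW_PATH_DEFAULT,
			       strlen(EXAMPLE_FW_PATH_DEFAULT));
}
```

In the patch, fw.path points either at the string literal or at a
strdup() copy. Always returning an owned copy, as in this sketch, is
one way to make releasing the path uniform.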

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 71 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 12 ++++++
 drivers/ml/cnxk/meson.build    |  2 +-
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fd45226add..117cac43aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -4,6 +4,8 @@
 
 #include <rte_common.h>
 #include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
@@ -13,9 +15,70 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+#define CN10K_ML_FW_PATH "fw_path"
+
+#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
+static int
+parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*(char **)extra_args = strdup(value);
+
+	if (!*(char **)extra_args)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+{
+	struct rte_kvargs *kvlist = NULL;
+	bool fw_path_set = false;
+	char *fw_path = NULL;
+	int ret = 0;
+
+	if (devargs == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(devargs->args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing devargs\n");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_PATH) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_PATH, &parse_string_arg, &fw_path);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_PATH);
+			ret = -EINVAL;
+			goto exit;
+		}
+		fw_path_set = true;
+	}
+
+check_args:
+	if (!fw_path_set)
+		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+	else
+		mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
 static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
@@ -49,6 +112,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
 		mldev->roc.pci_dev = pci_dev;
 
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		if (ret) {
+			plt_err("Failed to parse devargs ret = %d", ret);
+			goto pmd_destroy;
+		}
+
 		ret = roc_ml_dev_init(&mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
@@ -122,3 +191,5 @@ static struct rte_pci_driver cn10k_mldev_pmd = {
 RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bda7a5b3ff..7eac51cf09 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,15 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML firmware structure */
+struct cn10k_ml_fw {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Firmware file path */
+	const char *path;
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* ML device ROC */
@@ -50,6 +59,9 @@ struct cn10k_ml_dev {
 
 	/* Configuration state */
 	enum cn10k_ml_dev_state state;
+
+	/* ML Firmware */
+	struct cn10k_ml_fw fw;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index bf4ccde2c5..7c6fa5e906 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,7 +17,7 @@ headers = files(
         'cn10k_ml_ops.h',
 )
 
-deps += ['mldev', 'common_ml', 'common_cnxk']
+deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1


* [PATCH v1 06/37] ml/cnxk: enable firmware load and device reset
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to load ML firmware on the cn10ka ROC model. The MLIP
device is reset during the dev_close driver operation. The device
can't be reconfigured after a call to close. Job execution is disabled
after firmware load and is enabled in the device start state.
Added an internal request structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 327 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 156 ++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  21 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  14 ++
 4 files changed, 518 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 117cac43aa..f2b815aacc 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -12,6 +12,8 @@
 
 #include <roc_api.h>
 
+#include <eal_firmware.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
@@ -19,6 +21,15 @@
 
 #define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
 
+/* ML firmware macros */
+#define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
+#define FW_STACK_BUFFER_SIZE	 0x40000
+#define FW_DEBUG_BUFFER_SIZE	 (2 * 0x20000)
+#define FW_EXCEPTION_BUFFER_SIZE 0x400
+#define FW_LINKER_OFFSET	 0x80000
+#define FW_WAIT_CYCLES		 100
+#define FW_LOAD_FLAGS		 0x1
+
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
 
 /* Dummy operations for ML device */
@@ -175,6 +186,322 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 	return rte_ml_dev_pmd_destroy(dev);
 }
 
+static void
+cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
+{
+	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+	plt_ml_dbg("exception_state_size = %u bytes",
+		   fw->req->jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+}
+
+uint64_t
+cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
+{
+	PLT_SET_USED(fw);
+
+	return FW_LOAD_FLAGS;
+}
+
+static int
+cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
+{
+	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
+	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	uint32_t reg_val32;
+	uint64_t offset;
+	bool timeout;
+	int ret = 0;
+	uint8_t i;
+
+	mldev = fw->mldev;
+
+	/* Reset HEAD and TAIL debug pointer registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
+	memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
+
+	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
+	 * bridge.
+	 */
+	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
+		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
+		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
+		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+
+	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
+	 * bridges.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
+			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+	}
+
+	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
+	 * signal all ML transactions as non-secure.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
+			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+
+		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
+			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+	}
+
+	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
+	 * when there is no job in the command queue.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
+	 * keeping the job manager disabled.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (9) Wait at least 70 coprocessor clock cycles. */
+	plt_delay_us(FW_WAIT_CYCLES);
+
+	/* (10) Write ML outbound addresses pointing to the firmware images written in step 1 to the
+	 * following registers: ML(0)_A35_0_RST_VECTOR_BASE_W(0..1) for core 0,
+	 * ML(0)_A35_1_RST_VECTOR_BASE_W(0..1) for core 1. The value written to each register is the
+	 * AXI outbound address divided by 4. Read after write.
+	 */
+	offset = PLT_PTR_ADD_U64_CAST(
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
+	 * MLIP components out of reset. The cores will execute firmware from the ML region as
+	 * written in step 1.
+	 */
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
+	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
+
+	/* (12) Wait for notification from firmware that ML is ready for job execution. */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
+	 * clock when there are no more jobs to process.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
+	 * activities.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
+			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+	}
+
+	return ret;
+}
+
+int
+cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_fw *fw;
+	void *fw_buffer = NULL;
+	uint64_t mz_size = 0;
+	uint64_t fw_size = 0;
+	int ret = 0;
+
+	fw = &mldev->fw;
+	fw->mldev = mldev;
+
+	/* Read firmware image to a buffer */
+	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+	if (ret < 0) {
+		plt_err("Can't read firmware data: %s\n", fw->path);
+		return ret;
+	}
+
+	/* Reserve memzone for firmware load completion and data */
+	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+		return -ENOMEM;
+	}
+	fw->req = mz->addr;
+
+	/* Reset firmware load completion structure */
+	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+
+	/* Reset device, if in active state */
+	if (roc_ml_mlip_is_enabled(&mldev->roc))
+		roc_ml_mlip_reset(&mldev->roc, true);
+
+	/* Load firmware */
+	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+	if (fw_buffer != NULL)
+		free(fw_buffer);
+	if (ret < 0)
+		cn10k_ml_fw_unload(mldev);
+
+	return ret;
+}
+
+void
+cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	uint64_t reg_val;
+
+	/* Disable and reset device */
+	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&mldev->roc, true);
+
+	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
+	if (mz != NULL)
+		plt_memzone_free(mz);
+}
+
 static struct rte_pci_id pci_id_ml_table[] = {
 	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
 	/* sentinel */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 7eac51cf09..30c2ea6471 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,9 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
+
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -28,6 +31,19 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* Poll mode job state */
+#define ML_CN10K_POLL_JOB_START	 0
+#define ML_CN10K_POLL_JOB_FINISH 1
+
+/* ML Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
+
 /* Device configuration state enum */
 enum cn10k_ml_dev_state {
 	/* Device probed and not configured */
@@ -43,6 +59,136 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML Firmware stats */
+struct cn10k_ml_fw_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
+
+	/* Firmware end cycle */
+	uint64_t fw_end;
+
+	/* Hardware start cycle */
+	uint64_t hw_start;
+
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* ML result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Firmware stats */
+	struct cn10k_ml_fw_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* ML Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
+
+		/* Batch execution */
+		uint64_t batch_run : 1;
+
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
+
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
+
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
+
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
+
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* ML Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
+
+	/* Exception state dump size */
+	uint32_t exception_state_size;
+};
+
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
+
+			/* Flags to control error handling */
+			uint64_t flags;
+
+			uint8_t rsvd[8];
+		} fw_load;
+	};
+};
+
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -50,6 +196,12 @@ struct cn10k_ml_fw {
 
 	/* Firmware file path */
 	const char *path;
+
+	/* Data buffer */
+	uint8_t *data;
+
+	/* FW load request structure */
+	struct cn10k_ml_req *req;
 };
 
 /* Device private data */
@@ -64,4 +216,8 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_fw fw;
 };
 
+uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
+int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
+void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 32d38569a3..11e1cdb7cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -30,6 +30,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	int ret;
 
 	if (dev == NULL || conf == NULL)
 		return -EINVAL;
@@ -51,6 +52,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(mldev);
+		if (ret != 0)
+			return ret;
 	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -77,6 +83,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload firmware */
+	cn10k_ml_fw_unload(mldev);
+
+	/* Clear scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+
+	/* Reset ML_MLR_BASE */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+
 	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index adb0035fd7..15d7478d78 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,6 +5,20 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include "cn10k_ml_dev.h"
+
+/* ML request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job result */
+	struct cn10k_ml_result result;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+} __rte_aligned(ROC_ALIGN);
+
 /* CN10K device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1
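
Note on step (10) of the firmware load sequence in the patch above:
the comment states that each reset vector register receives the AXI
outbound address divided by 4, written as two 32-bit words. A minimal
sketch of that arithmetic follows. The struct, the function name, the
4-byte alignment check and the low-word-first ordering of w0/w1 are
assumptions made for illustration; the real layout is defined by
union ml_a35_0_rst_vector_base_s in the ROC headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit register words (w0, w1). */
struct sketch_rst_vector {
	uint32_t w0; /* assumed: low 32 bits of the word address */
	uint32_t w1; /* assumed: high 32 bits of the word address */
};

/* Convert an AXI byte address into the value programmed into the reset
 * vector registers: the address divided by 4, split into two words.
 */
static struct sketch_rst_vector
sketch_rst_vector_encode(uint64_t axi_addr)
{
	struct sketch_rst_vector v;
	uint64_t word_addr;

	/* Assumption: firmware images are placed at 4-byte aligned addresses. */
	assert((axi_addr & 0x3) == 0);

	word_addr = axi_addr / 4;
	v.w0 = (uint32_t)(word_addr & 0xffffffffu);
	v.w1 = (uint32_t)(word_addr >> 32);

	return v;
}
```

Each word would then be written with a 32-bit register write and read
back, as the patch does for both cores.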



* [PATCH v1 07/37] ml/cnxk: enable support for simulator environment
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (5 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 06/37] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 08/37] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled device initialization and firmware load on the simulator
platform. On the simulator, the firmware load stage only launches
a firmware handshake request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
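Note: the handshake described above amounts to enqueueing a request
and polling a status word until it reads "finished" or a deadline
passes. A minimal, self-contained sketch of that polling pattern
follows. It is not the driver code: clock_gettime() stands in for
plt_tsc_cycles(), and the sketch_* names and SKETCH_* constants are
hypothetical (the constants mirror ML_CN10K_POLL_JOB_START and
ML_CN10K_POLL_JOB_FINISH).

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_POLL_JOB_START  0u /* mirrors ML_CN10K_POLL_JOB_START */
#define SKETCH_POLL_JOB_FINISH 1u /* mirrors ML_CN10K_POLL_JOB_FINISH */

/* Monotonic time in nanoseconds; stands in for plt_tsc_cycles(). */
static uint64_t
sketch_now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Poll the status word until it reads SKETCH_POLL_JOB_FINISH or the
 * deadline passes. Returns true on completion, false on timeout.
 */
static bool
sketch_handshake_wait(const volatile uint64_t *status, uint64_t timeout_ns)
{
	uint64_t deadline;

	assert(status != NULL);
	deadline = sketch_now_ns() + timeout_ns;

	do {
		if (*status == SKETCH_POLL_JOB_FINISH)
			return true;
	} while (sketch_now_ns() < deadline);

	return false;
}
```

The do/while form checks the status at least once even with a zero
timeout, which matches the loop shape used in the patch.
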
 drivers/ml/cnxk/cn10k_ml_dev.c | 119 +++++++++++++++++++++++++++++----
 1 file changed, 107 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index f2b815aacc..805b037593 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -213,6 +213,89 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	return FW_LOAD_FLAGS;
 }
 
+static int
+cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	bool timeout;
+	int ret = 0;
+
+	mldev = fw->mldev;
+
+	/* Reset HEAD and TAIL debug pointer registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = rte_eal_get_baseaddr();
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* Update FW load completion structure */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	return ret;
+}
+
 static int
 cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
@@ -447,16 +530,22 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	fw = &mldev->fw;
 	fw->mldev = mldev;
 
-	/* Read firmware image to a buffer */
-	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
-	if (ret < 0) {
-		plt_err("Can't read firmware data: %s\n", fw->path);
-		return ret;
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		/* Read firmware image to a buffer */
+		ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+		if (ret < 0) {
+			plt_err("Can't read firmware data: %s", fw->path);
+			return ret;
+		}
+
+		/* Reserve memzone for firmware load completion and data */
+		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	} else if (roc_env_is_asim()) {
+		/* Reserve memzone for firmware load completion */
+		mz_size = sizeof(struct cn10k_ml_req);
 	}
 
-	/* Reserve memzone for firmware load completion and data */
-	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
-		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
@@ -475,10 +564,16 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 		roc_ml_mlip_reset(&mldev->roc, true);
 
 	/* Load firmware */
-	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
-	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
-	if (fw_buffer != NULL)
-		free(fw_buffer);
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+	} else if (roc_env_is_asim()) {
+		fw->data = NULL;
+		ret = cn10k_ml_fw_load_asim(fw);
+	}
+
 	if (ret < 0)
 		cn10k_ml_fw_unload(mldev);
 
-- 
2.17.1



* [PATCH v1 08/37] ml/cnxk: enable support for device start and stop
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (6 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 07/37] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 09/37] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented ML driver functions to start and stop the ML device.
Start enables, and stop disables, acceptance of inference
requests by the ML device.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
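Note: start and stop are a read-modify-write of a single enable bit,
leaving the other configuration bits untouched. A minimal sketch of
that pattern follows, using a plain variable in place of the ML_CFG
register. The sketch_* names and the bit position of SKETCH_CFG_ENA
are hypothetical; the real ROC_ML_CFG_ENA value comes from the ROC
headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position, for illustration only. */
#define SKETCH_CFG_ENA (UINT64_C(1) << 0)

enum sketch_state {
	SKETCH_STATE_CONFIGURED,
	SKETCH_STATE_STARTED,
};

struct sketch_dev {
	uint64_t cfg;            /* stands in for the ML_CFG register */
	enum sketch_state state; /* stands in for mldev->state */
};

/* Set the enable bit so the device accepts jobs; other bits are preserved. */
static void
sketch_dev_start(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->cfg |= SKETCH_CFG_ENA;
	dev->state = SKETCH_STATE_STARTED;
}

/* Clear the enable bit; the device returns to the configured state. */
static void
sketch_dev_stop(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->cfg &= ~SKETCH_CFG_ENA;
	dev->state = SKETCH_STATE_CONFIGURED;
}
```
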
 drivers/ml/cnxk/cn10k_ml_ops.c | 42 +++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11e1cdb7cd..3fea763caf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -104,9 +104,45 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
+static int
+cn10k_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
+	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
 };
-- 
2.17.1



* [PATCH v1 09/37] ml/cnxk: add support to create device queue-pairs
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (7 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 08/37] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 10/37] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to create and destroy device queue-pairs. Updated
the configure stage to create an array to store queue-pair handles.
Added internal structures for queue-pairs, queues and ML inference
requests.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
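Note: queue-pair setup sizes the request queue one larger than the
requested descriptor count, capped at the device maximum, because the
number of usable descriptors is one less than the queue size. A
minimal sketch of that sizing rule follows. The sketch_* names are
hypothetical, and the explanation that one slot stays empty so a
head/tail ring can tell full from empty is an assumption about the
queue design, not something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Queue size to allocate for a requested number of descriptors.
 * One extra slot is added unless the request already equals the maximum.
 */
static uint32_t
sketch_queue_size(uint32_t requested, uint32_t max_desc)
{
	/* Same validity rule as the setup path: non-zero and within the maximum. */
	assert(requested != 0 && requested <= max_desc);

	return (requested == max_desc) ? max_desc : requested + 1;
}

/* Usable descriptors in a queue of the given size (assumed: one slot is
 * kept empty to distinguish a full ring from an empty one).
 */
static uint32_t
sketch_queue_usable(uint32_t size)
{
	assert(size != 0);

	return size - 1;
}
```

With these rules a request for 8 descriptors yields a queue of 9 with
8 usable slots, while a request for the maximum yields one usable slot
fewer than requested.
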
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |  31 +++++
 2 files changed, 236 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3fea763caf..7c9c49ffda 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -8,6 +8,97 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cn10k_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cn10k_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cn10k_ml_qp *
+cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cn10k_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -30,6 +121,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint32_t mz_size;
+	uint16_t qp_id;
 	int ret;
 
 	if (dev == NULL || conf == NULL)
@@ -68,21 +162,83 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -ENOTSUP;
 	}
 
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
+
+error:
+	if (dev->data->queue_pairs != NULL)
+		rte_free(dev->data->queue_pairs);
+
+	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint16_t qp_id;
 
 	if (dev == NULL)
 		return -EINVAL;
 
 	mldev = dev->data->dev_private;
 
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	if (dev->data->queue_pairs)
+		rte_free(dev->data->queue_pairs);
+
 	/* Unload firmware */
 	cn10k_ml_fw_unload(mldev);
 
@@ -140,9 +296,56 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* The number of usable descriptors is one less than the size of the queue being
+	 * created, so the queue is sized one larger than requested, except when the
+	 * requested size equals the maximum supported size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 15d7478d78..455109f10f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,6 +5,10 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 
 /* ML request */
@@ -19,6 +23,33 @@ struct cn10k_ml_req {
 	volatile uint64_t status;
 } __rte_aligned(ROC_ALIGN);
 
+/* ML request queue */
+struct cn10k_ml_queue {
+	/* Array of requests */
+	struct cn10k_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Request wait cycles */
+	uint64_t wait_cycles;
+};
+
+/* ML queue-pair structure */
+struct cn10k_ml_qp {
+	/* Queue pair ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cn10k_ml_queue queue;
+};
+
 /* CN10K device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1
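
Note on the descriptor count in cn10k_ml_dev_queue_pair_setup(): the "+ 1"
is consistent with the common ring convention in which one slot stays empty,
so that head == tail can only mean "empty". The following is a minimal sketch
of that convention, not the driver's queue code. struct ring, the ring_*
helpers and ring_size_for() are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical head/tail ring with `size` slots. One slot always stays
 * unused, so at most size - 1 entries can be outstanding.
 */
struct ring {
	uint64_t head; /* next slot to enqueue into */
	uint64_t tail; /* next slot to dequeue from */
	uint64_t size; /* number of slots allocated */
};

static inline bool
ring_empty(const struct ring *r)
{
	return r->head == r->tail;
}

static inline bool
ring_full(const struct ring *r)
{
	return (r->head + 1) % r->size == r->tail;
}

static inline uint64_t
ring_usable(const struct ring *r)
{
	return r->size - 1;
}

/* Mirrors the sizing rule in the patch: allocate one extra slot unless the
 * request already equals the device maximum.
 */
static inline uint32_t
ring_size_for(uint32_t requested, uint32_t max_desc)
{
	assert(requested != 0 && requested <= max_desc);
	return (requested == max_desc) ? max_desc : requested + 1;
}
```

With this rule a request for N descriptors yields N usable slots, except at
the maximum, where a request for max_desc yields max_desc - 1 usable slots.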


* [PATCH v1 10/37] ml/cnxk: add functions to load and unload models
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (8 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 09/37] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 11/37] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added cnxk driver implementations to load and unload ML models.
Enabled allocation of the model handles array in the configure
stage. A model ID is assigned and per-model resources are allocated
during the load stage; the resources are released during model unload.
Added internal structures to handle ML models.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
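
Note for readers: the model ID handed out by cn10k_ml_model_load() is the
index of the first free entry in the model handles array, and unload clears
that entry so the ID can be reused. A minimal, self-contained sketch of this
bookkeeping follows. struct slot_table, slot_claim() and slot_release() are
hypothetical names, and unlike the patch the sketch range-checks the index on
release.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 16 /* hypothetical, same value as ML_CN10K_MAX_MODELS */

struct slot_table {
	void *slots[MAX_SLOTS]; /* NULL means the slot is free */
	uint16_t nb_slots;      /* number of slots configured, <= MAX_SLOTS */
	uint16_t nb_loaded;     /* number of slots currently in use */
};

/* Claim the first free slot for obj. Returns the slot index, or -1 if
 * every configured slot is in use.
 */
static int
slot_claim(struct slot_table *t, void *obj)
{
	uint16_t idx;

	assert(obj != NULL);
	for (idx = 0; idx < t->nb_slots; idx++) {
		if (t->slots[idx] == NULL) {
			t->slots[idx] = obj;
			t->nb_loaded++;
			return idx;
		}
	}

	return -1;
}

/* Release a slot. Returns 0 on success, or -1 if the index is out of
 * range or the slot is already free.
 */
static int
slot_release(struct slot_table *t, int idx)
{
	if (idx < 0 || idx >= t->nb_slots || t->slots[idx] == NULL)
		return -1;

	t->slots[idx] = NULL;
	t->nb_loaded--;
	return 0;
}
```

Because the search always starts at index 0, an ID freed by unload is the
first one handed out again by the next load.
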
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.c |   5 +
 drivers/ml/cnxk/cn10k_ml_model.h |  43 +++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 154 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   5 +
 drivers/ml/cnxk/meson.build      |   2 +
 6 files changed, 212 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 30c2ea6471..c231cb23ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -214,6 +214,9 @@ struct cn10k_ml_dev {
 
 	/* ML Firmware */
 	struct cn10k_ml_fw fw;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
new file mode 100644
index 0000000000..39ed707396
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_model.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
new file mode 100644
index 0000000000..f529374281
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_MODEL_H_
+#define _CN10K_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Model state */
+enum cn10k_ml_model_state {
+	ML_CN10K_MODEL_STATE_LOADED,
+	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
+	ML_CN10K_MODEL_STATE_STARTED,
+	ML_CN10K_MODEL_STATE_UNKNOWN,
+};
+
+/* ML Model Object */
+struct cn10k_ml_model {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Model name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model ID */
+	int16_t model_id;
+
+	/* Model lock, used to update model state */
+	plt_spinlock_t lock;
+
+	/* Model state */
+	enum cn10k_ml_model_state state;
+};
+
+#endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7c9c49ffda..30e7b0da35 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -6,8 +6,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+/* ML model macros */
+#define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -120,9 +124,11 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	int16_t model_id;
 	uint16_t qp_id;
 	int ret;
 
@@ -203,6 +209,48 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
 
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %d", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
@@ -211,14 +259,19 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (dev->data->queue_pairs != NULL)
 		rte_free(dev->data->queue_pairs);
 
+	if (dev->data->models != NULL)
+		rte_free(dev->data->models);
+
 	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	int16_t model_id;
 	uint16_t qp_id;
 
 	if (dev == NULL)
@@ -226,6 +279,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %d", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	if (dev->data->models)
+		rte_free(dev->data->models);
+
 	/* Destroy all queue pairs */
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
@@ -337,6 +405,88 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+int
+cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t mz_size;
+	uint16_t idx;
+	bool found;
+
+	PLT_SET_USED(params);
+
+	mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (idx = 0; idx < dev->data->nb_models; idx++) {
+		if (dev->data->models[idx] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Get MZ size */
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+
+	/* Allocate memzone for model object and model data */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->mldev = mldev;
+	model->model_id = idx;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	dev->data->models[idx] = model;
+	mldev->nb_models_loaded++;
+
+	*model_id = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	dev->data->models[model_id] = NULL;
+	mldev->nb_models_loaded--;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%d", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -348,4 +498,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 455109f10f..5caebde908 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -53,4 +53,9 @@ struct cn10k_ml_qp {
 /* CN10K device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
+/* Slow-path ops */
+int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
+			int16_t *model_id);
+int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 7c6fa5e906..1f1c923329 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -10,11 +10,13 @@ endif
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
+        'cn10k_ml_model.c',
 )
 
 headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
+        'cn10k_ml_model.h',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
-- 
2.17.1


* [PATCH v1 11/37] ml/cnxk: enable validity checks for model metadata
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (9 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 10/37] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 12/37] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added model metadata structure and enabled metadata check
during model load. Remapped cnxk IO types to RTE IO types.
The model metadata is stored and updated in the model structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 196 +++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 312 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  14 +-
 drivers/ml/cnxk/meson.build      |   2 +-
 4 files changed, 522 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 39ed707396..6f803ce6a5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -2,4 +2,200 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_hash_crc.h>
+
+#include <ml_utils.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+
+static enum rte_ml_io_type
+cn10k_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case 1:
+		return RTE_ML_IO_TYPE_INT8;
+	case 2:
+		return RTE_ML_IO_TYPE_UINT8;
+	case 3:
+		return RTE_ML_IO_TYPE_INT16;
+	case 4:
+		return RTE_ML_IO_TYPE_UINT16;
+	case 5:
+		return RTE_ML_IO_TYPE_INT32;
+	case 6:
+		return RTE_ML_IO_TYPE_UINT32;
+	case 7:
+		return RTE_ML_IO_TYPE_FP16;
+	case 8:
+		return RTE_ML_IO_TYPE_FP32;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+int
+cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+	uint8_t version[4];
+	uint8_t i;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+
+	/* Header CRC check */
+	if (metadata->metadata_header.header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			buffer, sizeof(metadata->metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata->metadata_header.header_crc32c) {
+			plt_err("Invalid model, Header CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata->metadata_header.payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->metadata_header),
+					      size - sizeof(metadata->metadata_header), 0);
+
+		if (payload_crc32c != metadata->metadata_header.payload_crc32c) {
+			plt_err("Invalid model, Payload CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Model magic string */
+	if (strncmp((char *)metadata->metadata_header.magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid model, magic = %.4s", metadata->metadata_header.magic);
+		return -EINVAL;
+	}
+
+	/* Target architecture */
+	if (metadata->metadata_header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
+		plt_err("Model target architecture (%u) not supported",
+			metadata->metadata_header.target_architecture);
+		return -ENOTSUP;
+	}
+
+	/* Header version */
+	memcpy(version, metadata->metadata_header.version, 4 * sizeof(uint8_t));
+	if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
+		plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0],
+			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION / 1000) % 10,
+			(MRVL_ML_MODEL_VERSION / 100) % 10, (MRVL_ML_MODEL_VERSION / 10) % 10,
+			MRVL_ML_MODEL_VERSION % 10);
+		return -ENOTSUP;
+	}
+
+	/* Init section */
+	if (metadata->init_model.file_size == 0) {
+		plt_err("Invalid metadata, init_model.file_size = %u",
+			metadata->init_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Main section */
+	if (metadata->main_model.file_size == 0) {
+		plt_err("Invalid metadata, main_model.file_size = %u",
+			metadata->main_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Finish section */
+	if (metadata->finish_model.file_size == 0) {
+		plt_err("Invalid metadata, finish_model.file_size = %u",
+			metadata->finish_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Weights and Bias */
+	if (metadata->weights_bias.file_size == 0) {
+		plt_err("Invalid metadata, weights_bias.file_size = %u",
+			metadata->weights_bias.file_size);
+		return -EINVAL;
+	}
+
+	if (metadata->weights_bias.relocatable != 1) {
+		plt_err("Model not supported, non-relocatable weights and bias");
+		return -ENOTSUP;
+	}
+
+	/* Inputs */
+	for (i = 0; i < metadata->model.num_input; i++) {
+		if (ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input[i].input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : input_type = %u", i,
+				metadata->input[i].input_type);
+			return -EINVAL;
+		}
+
+		if (ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->input[i].model_input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : model_input_type = %u", i,
+				metadata->input[i].model_input_type);
+			return -EINVAL;
+		}
+
+		if (metadata->input[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable input: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	/* Outputs */
+	for (i = 0; i < metadata->model.num_output; i++) {
+		if (ml_io_type_size_get(cn10k_ml_io_type_map(metadata->output[i].output_type)) <=
+		    0) {
+			plt_err("Invalid metadata, output[%u] : output_type = %u", i,
+				metadata->output[i].output_type);
+			return -EINVAL;
+		}
+
+		if (ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].model_output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : model_output_type = %u", i,
+				metadata->output[i].model_output_type);
+			return -EINVAL;
+		}
+
+		if (metadata->output[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable output: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	return 0;
+}
+
+void
+cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
+{
+	uint8_t i;
+
+	for (i = 0; i < metadata->model.num_input; i++) {
+		metadata->input[i].input_type = cn10k_ml_io_type_map(metadata->input[i].input_type);
+		metadata->input[i].model_input_type =
+			cn10k_ml_io_type_map(metadata->input[i].model_input_type);
+
+		if (metadata->input[i].shape.w == 0)
+			metadata->input[i].shape.w = 1;
+
+		if (metadata->input[i].shape.x == 0)
+			metadata->input[i].shape.x = 1;
+
+		if (metadata->input[i].shape.y == 0)
+			metadata->input[i].shape.y = 1;
+
+		if (metadata->input[i].shape.z == 0)
+			metadata->input[i].shape.z = 1;
+	}
+
+	for (i = 0; i < metadata->model.num_output; i++) {
+		metadata->output[i].output_type =
+			cn10k_ml_io_type_map(metadata->output[i].output_type);
+		metadata->output[i].model_output_type =
+			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index f529374281..eb031c6fb2 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -22,6 +22,309 @@ enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_UNKNOWN,
 };
 
+/* Model Metadata : v 2.1.0.2 */
+#define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
+#define MRVL_ML_MODEL_TARGET_ARCH  128
+#define MRVL_ML_MODEL_VERSION	   2100
+#define MRVL_ML_MODEL_NAME_LEN	   64
+#define MRVL_ML_INPUT_NAME_LEN	   16
+#define MRVL_ML_OUTPUT_NAME_LEN	   16
+#define MRVL_ML_INPUT_OUTPUT_SIZE  8
+
+/* Model file metadata structure */
+struct cn10k_ml_model_metadata {
+	/* Header (256-byte) */
+	struct {
+		/* Magic string ('M', 'R', 'V', 'L') */
+		uint8_t magic[4];
+
+		/* Metadata version */
+		uint8_t version[4];
+
+		/* Metadata size */
+		uint32_t metadata_size;
+
+		/* Unique ID */
+		uint8_t uuid[128];
+
+		/* Model target architecture
+		 * 0 = Undefined
+		 * 1 = M1K
+		 * 128 = MLIP
+		 * 256 = Experimental
+		 */
+		uint32_t target_architecture;
+		uint8_t reserved[104];
+
+		/* CRC of data after metadata_header (i.e. after first 256 bytes) */
+		uint32_t payload_crc32c;
+
+		/* CRC of first 252 bytes of metadata_header, after payload_crc calculation */
+		uint32_t header_crc32c;
+	} metadata_header;
+
+	/* Model information (256-byte) */
+	struct {
+		/* Model name string */
+		uint8_t name[MRVL_ML_MODEL_NAME_LEN];
+
+		/* Model version info (xx.xx.xx.xx) */
+		uint8_t version[4];
+
+		/* Model code size (init + main + finish) */
+		uint32_t code_size;
+
+		/* Model data size (Weights and Bias) */
+		uint32_t data_size;
+
+		/* OCM start offset, set to ocm_wb_range_start */
+		uint32_t ocm_start;
+
+		/* OCM end offset, set to max OCM size */
+		uint32_t ocm_end;
+
+		/* Relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t ocm_relocatable;
+
+		/* Tile relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t tile_relocatable;
+
+		/* Start tile (Always 0) */
+		uint8_t tile_start;
+
+		/* End tile (num_tiles - 1) */
+		uint8_t tile_end;
+
+		/* Inference batch size */
+		uint8_t batch_size;
+
+		/* Number of input tensors (Max 8) */
+		uint8_t num_input;
+
+		/* Number of output tensors (Max 8) */
+		uint8_t num_output;
+		uint8_t reserved1;
+
+		/* Total input size in bytes */
+		uint32_t input_size;
+
+		/* Total output size in bytes */
+		uint32_t output_size;
+
+		/* Table size in bytes */
+		uint32_t table_size;
+
+		/* Number of layers in the network */
+		uint32_t num_layers;
+		uint32_t reserved2;
+
+		/* Floor of absolute OCM region */
+		uint64_t ocm_tmp_range_floor;
+
+		/* Relative OCM start address of WB data block */
+		uint64_t ocm_wb_range_start;
+
+		/* Relative OCM end address of WB data block */
+		uint64_t ocm_wb_range_end;
+
+		/* Relative DDR start address of WB data block */
+		uint64_t ddr_wb_range_start;
+
+		/* Relative DDR end address of WB data block */
+		uint64_t ddr_wb_range_end;
+
+		/* Relative DDR start address of all inputs */
+		uint64_t ddr_input_range_start;
+
+		/* Relative DDR end address of all inputs */
+		uint64_t ddr_input_range_end;
+
+		/* Relative DDR start address of all outputs */
+		uint64_t ddr_output_range_start;
+
+		/* Relative DDR end address of all outputs */
+		uint64_t ddr_output_range_end;
+
+		/* Compiler version */
+		uint8_t compiler_version[8];
+
+		/* CDK version */
+		uint8_t cdk_version[4];
+
+		/* Lower batch optimization support
+		 * 0 - No,
+		 * 1 - Yes
+		 */
+		uint8_t supports_lower_batch_size_optimization;
+		uint8_t reserved[59];
+	} model;
+
+	/* Init section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} init_model;
+
+	/* Main section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} main_model;
+
+	/* Finish section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} finish_model;
+
+	uint8_t reserved1[512]; /* End of 2k bytes */
+
+	/* Weights and Biases (64-byte) */
+	struct {
+		/* Memory offset, Set to ddr_wb_range_start */
+		uint64_t mem_offset;
+		uint32_t file_offset;
+		uint32_t file_size;
+
+		/* Relocatable flag for WB
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+		uint8_t reserved[47];
+	} weights_bias;
+
+	/* Input (512-byte, 64-byte per input) provisioned for 8 inputs */
+	struct {
+		/* DDR offset (in ocm absolute addresses for input) */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Input quantization
+		 * 1 = Requires quantization
+		 * 2 = Pre-quantized
+		 */
+		uint8_t quantize;
+
+		/* Type of incoming input
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t input_type;
+
+		/* Type of input required by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_input_type;
+
+		/* float_32 qscale value
+		 * quantized = non-quantized * qscale
+		 */
+		float qscale;
+
+		/* Input shape */
+		struct {
+			/* Input format
+			 * 1 = NCHW
+			 * 2 = NHWC
+			 */
+			uint8_t format;
+			uint8_t reserved[3];
+			uint32_t w;
+			uint32_t x;
+			uint32_t y;
+			uint32_t z;
+		} shape;
+		uint8_t reserved[4];
+
+		/* Name of input */
+		uint8_t input_name[MRVL_ML_INPUT_NAME_LEN];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output (512-byte, 64-byte per output) provisioned for 8 outputs */
+	struct {
+		/* DDR offset in ocm absolute addresses for output */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Output dequantization
+		 * 1 = De-quantization required
+		 * 2 = De-quantization not required
+		 */
+		uint8_t dequantize;
+
+		/* Type of outgoing output
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t output_type;
+
+		/* Type of output produced by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_output_type;
+
+		/* float_32 dscale value
+		 * dequantized = quantized * dscale
+		 */
+		float dscale;
+
+		/* Number of items in the output */
+		uint32_t size;
+		uint8_t reserved[20];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+		uint8_t output_name[MRVL_ML_OUTPUT_NAME_LEN];
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	uint8_t reserved2[1792];
+
+	/* Model data */
+	struct {
+		uint8_t reserved1[4068];
+
+		/* Beta: xx.xx.xx.xx,
+		 * Later: YYYYMM.xx.xx
+		 */
+		uint8_t compiler_version[8];
+
+		/* M1K CDK version (xx.xx.xx.xx) */
+		uint8_t m1k_cdk_version[4];
+	} data;
+
+	/* Hidden 16 bytes of magic code */
+	uint8_t reserved3[16];
+};
+
 /* ML Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -33,6 +336,12 @@ struct cn10k_ml_model {
 	/* Model ID */
 	int16_t model_id;
 
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Model metadata */
+	struct cn10k_ml_model_metadata metadata;
+
 	/* Model lock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -40,4 +349,7 @@ struct cn10k_ml_model {
 	enum cn10k_ml_model_state state;
 };
 
+int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
+void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 30e7b0da35..171428794e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -416,8 +416,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int ret;
 
-	PLT_SET_USED(params);
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
 	mldev = dev->data->dev_private;
 
@@ -450,6 +453,15 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->mldev = mldev;
 	model->model_id = idx;
 
+	memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->metadata);
+
+	/* Metadata batch_size is a uint8_t; treat 0 as a batch size of 256 */
+	if (model->metadata.model.batch_size == 0)
+		model->batch_size = 256;
+	else
+		model->batch_size = model->metadata.model.batch_size;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 1f1c923329..b7567d04a2 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -19,7 +19,7 @@ headers = files(
         'cn10k_ml_model.h',
 )
 
-deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
+deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs', 'hash']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1


* [PATCH v1 12/37] ml/cnxk: add internal structures for derived info
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (10 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 11/37] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 13/37] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added internal structures to handle derived address fields
and enabled support to compute DMA addresses for model start.
Enabled updating internal model fields.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
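
Note for readers: cn10k_ml_model_addr_update() lays the init, main, finish
and weights/bias sections out back to back from the load base, and places
the run region immediately after the last section. The sketch below computes
the same layout as plain offsets so the arithmetic can be checked without
DMA memory. struct section_sizes, struct layout and layout_compute() are
hypothetical names, not driver structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Section sizes as read from the metadata file_size fields. */
struct section_sizes {
	uint32_t init_sz;
	uint32_t main_sz;
	uint32_t finish_sz;
	uint32_t wb_sz;
};

/* Offsets are relative to the load base address. */
struct layout {
	size_t init_off;
	size_t main_off;
	size_t finish_off;
	size_t wb_off;
	size_t total;        /* sum of all four section sizes */
	size_t run_base_off; /* run region starts right after the sections */
};

static struct layout
layout_compute(const struct section_sizes *s)
{
	struct layout l;

	assert(s != NULL);
	l.init_off = 0;
	l.main_off = l.init_off + s->init_sz;
	l.finish_off = l.main_off + s->main_sz;
	l.wb_off = l.finish_off + s->finish_sz;
	l.total = l.wb_off + s->wb_sz;
	l.run_base_off = l.total;
	return l;
}
```

Adding each offset to the load base gives the corresponding *_load_addr; the
init, main and finish run addresses use the same section offsets relative to
the run base.
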
 drivers/ml/cnxk/cn10k_ml_model.c | 88 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 80 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 18 ++++++-
 3 files changed, 185 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 6f803ce6a5..72b52fce8d 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -199,3 +199,91 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
 	}
 }
+
+void
+cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+	size_t model_data_size;
+	uint8_t *dma_addr_load;
+	uint8_t *dma_addr_run;
+	uint8_t i;
+	int fpos;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+
+	/* Base address */
+	addr->base_dma_addr_load = base_dma_addr;
+	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
+
+	/* Init Section */
+	dma_addr_load = addr->base_dma_addr_load;
+	dma_addr_run = addr->base_dma_addr_run;
+	fpos = sizeof(struct cn10k_ml_model_metadata);
+	addr->init_load_addr = dma_addr_load;
+	addr->init_run_addr = dma_addr_run;
+	memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
+
+	/* Main Section */
+	dma_addr_load += metadata->init_model.file_size;
+	dma_addr_run += metadata->init_model.file_size;
+	fpos += metadata->init_model.file_size;
+	addr->main_load_addr = dma_addr_load;
+	addr->main_run_addr = dma_addr_run;
+	memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
+
+	/* Finish Section */
+	dma_addr_load += metadata->main_model.file_size;
+	dma_addr_run += metadata->main_model.file_size;
+	fpos += metadata->main_model.file_size;
+	addr->finish_load_addr = dma_addr_load;
+	addr->finish_run_addr = dma_addr_run;
+	memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
+
+	/* Weights & Bias Section */
+	dma_addr_load += metadata->finish_model.file_size;
+	fpos += metadata->finish_model.file_size;
+	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
+	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
+	memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+
+	/* Inputs */
+	addr->total_input_sz_d = 0;
+	addr->total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		addr->input[i].nb_elements =
+			model->metadata.input[i].shape.w * model->metadata.input[i].shape.x *
+			model->metadata.input[i].shape.y * model->metadata.input[i].shape.z;
+		addr->input[i].sz_d = addr->input[i].nb_elements *
+				      ml_io_type_size_get(metadata->input[i].input_type);
+		addr->input[i].sz_q = addr->input[i].nb_elements *
+				      ml_io_type_size_get(metadata->input[i].model_input_type);
+		addr->total_input_sz_d += addr->input[i].sz_d;
+		addr->total_input_sz_q += addr->input[i].sz_q;
+
+		plt_ml_dbg("model_id = %d, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+			   model->model_id, i, metadata->input[i].shape.w,
+			   metadata->input[i].shape.x, metadata->input[i].shape.y,
+			   metadata->input[i].shape.z, addr->input[i].sz_d, addr->input[i].sz_q);
+	}
+
+	/* Outputs */
+	addr->total_output_sz_q = 0;
+	addr->total_output_sz_d = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		addr->output[i].nb_elements = metadata->output[i].size;
+		addr->output[i].sz_d = addr->output[i].nb_elements *
+				       ml_io_type_size_get(metadata->output[i].output_type);
+		addr->output[i].sz_q = addr->output[i].nb_elements *
+				       ml_io_type_size_get(metadata->output[i].model_output_type);
+		addr->total_output_sz_q += addr->output[i].sz_q;
+		addr->total_output_sz_d += addr->output[i].sz_d;
+
+		plt_ml_dbg("model_id = %d, output[%u] - sz_d = %u, sz_q = %u", model->model_id, i,
+			   addr->output[i].sz_d, addr->output[i].sz_q);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index eb031c6fb2..02a119cdd8 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -325,6 +325,81 @@ struct cn10k_ml_model_metadata {
 	uint8_t reserved3[16];
 };
 
+/* Model address structure */
+struct cn10k_ml_model_addr {
+	/* Base DMA address for load */
+	void *base_dma_addr_load;
+
+	/* Base DMA address for run */
+	void *base_dma_addr_run;
+
+	/* Init section load address */
+	void *init_load_addr;
+
+	/* Init section run address */
+	void *init_run_addr;
+
+	/* Main section load address */
+	void *main_load_addr;
+
+	/* Main section run address */
+	void *main_run_addr;
+
+	/* Finish section load address */
+	void *finish_load_addr;
+
+	/* Finish section run address */
+	void *finish_run_addr;
+
+	/* Weights and Bias base address */
+	void *wb_base_addr;
+
+	/* Weights and bias load address */
+	void *wb_load_addr;
+
+	/* Start tile */
+	uint8_t tile_start;
+
+	/* End tile */
+	uint8_t tile_end;
+
+	/* Input address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized input size */
+		uint32_t sz_d;
+
+		/* Quantized input size */
+		uint32_t sz_q;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized output size */
+		uint32_t sz_d;
+
+		/* Quantized output size */
+		uint32_t sz_q;
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
 /* ML Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -342,6 +417,9 @@ struct cn10k_ml_model {
 	/* Model metadata */
 	struct cn10k_ml_model_metadata metadata;
 
+	/* Model address structure */
+	struct cn10k_ml_model_addr addr;
+
 	/* Model lock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -351,5 +429,7 @@ struct cn10k_ml_model {
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+				uint8_t *base_dma_addr);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 171428794e..6bf365d185 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -408,11 +408,14 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
+	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_data_size;
+	uint8_t *base_dma_addr;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -439,7 +442,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get MZ size */
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+	metadata = (struct cn10k_ml_model_metadata *)params->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+		  2 * model_data_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -462,6 +470,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	else
 		model->batch_size = model->metadata.model.batch_size;
 
+	/* Set DMA base address */
+	base_dma_addr = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
+
+	/* Copy data from load to run region. The run address is to be used by MLIP */
+	memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-- 
2.17.1



* [PATCH v1 13/37] ml/cnxk: add internal structures for tiles and OCM
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (11 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 12/37] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 14/37] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added internal structures to handle tile and OCM information and
the OCM-to-model memory mapping. Initialize the fields to
platform-specific defaults and compute the OCM / tile requirements
for a model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
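Note for readers: the page accounting in cn10k_ml_model_ocm_pages_count()
boils down to rounding two byte counts up to whole pages and comparing the
sum against the pages available on one tile. The following is a minimal
sketch of that arithmetic for a relocatable model. The helper names
pages_for() and model_fits() are hypothetical; the constants mirror the
values defined in cn10k_ml_ocm.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from cn10k_ml_ocm.h in this patch. */
#define OCM_PAGE_SIZE 0x4000u
#define OCM_TILE_SIZE 0x100000u
#define OCM_NUM_PAGES (OCM_TILE_SIZE / OCM_PAGE_SIZE)

/* Round a byte count up to whole OCM pages. */
static uint16_t
pages_for(uint64_t size)
{
	return (uint16_t)((size + OCM_PAGE_SIZE - 1) / OCM_PAGE_SIZE);
}

/* Hypothetical helper: page demand of a relocatable model on one tile. */
static bool
model_fits(uint64_t wb_start, uint64_t wb_end, uint64_t tmp_floor, uint16_t *wb_pages,
	   uint16_t *scratch_pages)
{
	assert(wb_end >= wb_start && tmp_floor <= OCM_TILE_SIZE);

	/* The WB range is inclusive at both ends, hence the + 1. */
	*wb_pages = pages_for(wb_end - wb_start + 1);

	/* Scratch memory spans from tmp_floor to the end of the tile. */
	*scratch_pages = pages_for(OCM_TILE_SIZE - tmp_floor);

	return (uint32_t)*wb_pages + *scratch_pages <= OCM_NUM_PAGES;
}
```

For example, a WB range of 0x0 to 0x7FFF with a scratch floor at 0xF0000
needs 2 + 4 pages, well within the 64 pages of a tile.
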
 drivers/ml/cnxk/cn10k_ml_dev.h   |  5 ++
 drivers/ml/cnxk/cn10k_ml_model.c | 53 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  6 +++
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  5 ++
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 79 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 28 +++++++++++
 drivers/ml/cnxk/meson.build      |  2 +
 7 files changed, 178 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c231cb23ed..6b91c9aae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -7,6 +7,8 @@
 
 #include <roc_api.h>
 
+#include "cn10k_ml_ocm.h"
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -215,6 +217,9 @@ struct cn10k_ml_dev {
 	/* ML Firmware */
 	struct cn10k_ml_fw fw;
 
+	/* ML OCM info */
+	struct cn10k_ml_ocm ocm;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 72b52fce8d..11b52af68c 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -8,6 +8,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+#include "cn10k_ml_ocm.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -287,3 +288,55 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 			   addr->output[i].sz_d, addr->output[i].sz_q);
 	}
 }
+
+int
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
+			       uint16_t *wb_pages, uint16_t *scratch_pages)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_ocm *ocm;
+	uint64_t scratch_size;
+	uint64_t wb_size;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	ocm = &mldev->ocm;
+
+	/* Assume wb_size is zero for non-relocatable models */
+	if (metadata->model.ocm_relocatable)
+		wb_size = metadata->model.ocm_wb_range_end - metadata->model.ocm_wb_range_start + 1;
+	else
+		wb_size = 0;
+
+	if (wb_size % ocm->page_size)
+		*wb_pages = wb_size / ocm->page_size + 1;
+	else
+		*wb_pages = wb_size / ocm->page_size;
+	plt_ml_dbg("model_id = %d, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+		   *wb_pages);
+
+	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
+	if (metadata->model.ocm_tmp_range_floor % ocm->page_size)
+		*scratch_pages = scratch_size / ocm->page_size + 1;
+	else
+		*scratch_pages = scratch_size / ocm->page_size;
+	plt_ml_dbg("model_id = %d, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+		   scratch_size, *scratch_pages);
+
+	/* Check if the model can be loaded on OCM */
+	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+		plt_err("Cannot create the model, OCM relocatable = %u",
+			metadata->model.ocm_relocatable);
+		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
+			ML_CN10K_OCM_NUMPAGES);
+		return -ENOMEM;
+	}
+
+	/* Update scratch_pages to block the full tile for OCM non-relocatable model. This would
+	 * prevent the library from allocating the remaining space on the tile to other models.
+	 */
+	if (!metadata->model.ocm_relocatable)
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 02a119cdd8..913849feb0 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -10,6 +10,7 @@
 #include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ocm.h"
 
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
@@ -420,6 +421,9 @@ struct cn10k_ml_model {
 	/* Model address structure */
 	struct cn10k_ml_model_addr addr;
 
+	/* Tile and memory information object */
+	struct cn10k_ml_ocm_model_map model_mem_map;
+
 	/* Model lock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -431,5 +435,7 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
+				   uint16_t *wb_pages, uint16_t *scratch_pages);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
new file mode 100644
index 0000000000..b1c62f2963
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_ocm.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
new file mode 100644
index 0000000000..57c2eee344
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OCM_H_
+#define _CN10K_ML_OCM_H_
+
+#include <rte_mldev.h>
+
+/* Page size in bytes. */
+#define ML_CN10K_OCM_PAGESIZE 0x4000
+
+/* Number of OCM tiles. */
+#define ML_CN10K_OCM_NUMTILES 0x8
+
+/* OCM in bytes, per tile. */
+#define ML_CN10K_OCM_TILESIZE 0x100000
+
+/* OCM pages, per tile. */
+#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
+
+/* Maximum OCM mask words, per tile, 8 bit words. */
+#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
+
+/* ML OCM and Tile information structure */
+struct cn10k_ml_ocm_tile_info {
+	/* Mask of used / allotted pages on tile's OCM */
+	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+
+	/* Last page in the tile's OCM used for weights and bias, default = -1 */
+	int last_wb_page;
+
+	/* Number of pages used for scratch memory on the tile's OCM */
+	uint16_t scratch_pages;
+};
+
+/* ML Model OCM map structure */
+struct cn10k_ml_ocm_model_map {
+	/* Status of OCM reservation */
+	bool ocm_reserved;
+
+	/* Mask of OCM tiles for the model */
+	uint64_t tilemask;
+
+	/* Start page for the model load, default = -1 */
+	int wb_page_start;
+
+	/* Number of pages required for weights and bias */
+	uint16_t wb_pages;
+
+	/* Number of pages required for scratch memory */
+	uint16_t scratch_pages;
+};
+
+/* OCM state structure */
+struct cn10k_ml_ocm {
+	/* OCM spinlock, used to update OCM state */
+	rte_spinlock_t lock;
+
+	/* Number of OCM tiles */
+	uint8_t num_tiles;
+
+	/* OCM size per tile */
+	uint64_t size_per_tile;
+
+	/* Size of OCM page */
+	uint64_t page_size;
+
+	/* Number of OCM pages */
+	uint16_t num_pages;
+
+	/* Words per OCM mask */
+	uint16_t mask_words;
+
+	/* OCM memory info and status */
+	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+};
+
+#endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6bf365d185..63c6ae4862 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -126,8 +126,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	uint16_t tile_id;
 	int16_t model_id;
 	uint16_t qp_id;
 	int ret;
@@ -250,6 +252,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
+	ocm = &mldev->ocm;
+	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
+	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
+	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
+	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+
+	rte_spinlock_init(&ocm->lock);
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -416,6 +430,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	const struct plt_memzone *mz;
 	size_t model_data_size;
 	uint8_t *base_dma_addr;
+	uint16_t scratch_pages;
+	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -441,6 +457,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 		return -ENOMEM;
 	}
 
+	/* Get WB and scratch pages, check if model can be loaded. */
+	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	if (ret < 0)
+		return ret;
+
 	/* Get MZ size */
 	metadata = (struct cn10k_ml_model_metadata *)params->addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
@@ -478,6 +499,13 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Copy data from load to run. run address to be used by MLIP */
 	memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
 
+	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
+	model->model_mem_map.ocm_reserved = false;
+	model->model_mem_map.tilemask = 0;
+	model->model_mem_map.wb_page_start = -1;
+	model->model_mem_map.wb_pages = wb_pages;
+	model->model_mem_map.scratch_pages = scratch_pages;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index b7567d04a2..32cb0dc0a2 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -11,12 +11,14 @@ sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
+        'cn10k_ml_ocm.c',
 )
 
 headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
+        'cn10k_ml_ocm.h',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs', 'hash']
-- 
2.17.1



* [PATCH v1 14/37] ml/cnxk: add structures for slow and fast path JDs
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (12 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 13/37] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 15/37] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added job descriptor (JD) structures for model start, stop and
run jobs. Initialize the job command for queue-pair requests and
reserve memory for the slow-path request structure of each model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
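Note for readers: after this patch the per-model memzone holds the model
object, two copies of the model data (load and run) and the slow-path
request, each rounded up to the driver alignment. The snippet below is a
rough sketch of how the offsets follow from that layout. The struct and
function names are hypothetical, and the alignment is passed in as a
parameter because the value of ML_CN10K_ALIGN_SIZE is not shown in this
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Round v up to the next multiple of a (a > 0). */
static size_t
align_ceil(size_t v, size_t a)
{
	assert(a > 0);
	return (v + a - 1) / a * a;
}

/* Hypothetical description of the per-model memzone layout. */
struct mz_layout {
	size_t data_off; /* start of the load copy of the model data */
	size_t req_off;	 /* start of the slow-path request structure */
	size_t total;	 /* total memzone size */
};

static struct mz_layout
mz_layout_get(size_t model_sz, size_t data_sz, size_t req_sz, size_t align)
{
	struct mz_layout l;

	/* Model object first, then load + run copies, then the request. */
	l.data_off = align_ceil(model_sz, align);
	l.req_off = l.data_off + 2 * align_ceil(data_sz, align);
	l.total = l.req_off + align_ceil(req_sz, align);

	return l;
}
```

With an assumed alignment of 128 bytes, a 1000-byte model object, 300 bytes
of model data and a 200-byte request give offsets 1024 and 1792 and a total
size of 2048 bytes.
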
 drivers/ml/cnxk/cn10k_ml_dev.h   | 99 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  4 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 19 +++++-
 drivers/ml/cnxk/cn10k_ml_ops.h   |  4 ++
 4 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 6b91c9aae6..17411e5fe1 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -188,6 +188,105 @@ struct cn10k_ml_jd {
 
 			uint8_t rsvd[8];
 		} fw_load;
+
+		struct cn10k_ml_jd_section_model_start {
+			/* Source model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_src_ddr_addr;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range end */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
 	};
 };
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 913849feb0..64160032c1 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+#include "cn10k_ml_ops.h"
 
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
@@ -429,6 +430,9 @@ struct cn10k_ml_model {
 
 	/* Model state */
 	enum cn10k_ml_model_state state;
+
+	/* Model slow-path operations request pointer */
+	struct cn10k_ml_req *req;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 63c6ae4862..6c26f450a5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -12,6 +12,10 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML Job descriptor flags */
+#define ML_FLAGS_POLL_COMPL BIT(0)
+#define ML_FLAGS_SSO_COMPL  BIT(1)
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -65,6 +69,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	struct cn10k_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
+	uint64_t i;
 
 	/* Allocate queue pair */
 	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
@@ -95,6 +100,12 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 
+	/* Initialize job command */
+	for (i = 0; i < qp->nb_desc; i++) {
+		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+	}
+
 	return qp;
 
 qp_free:
@@ -468,7 +479,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size;
+		  2 * model_data_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -506,6 +518,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set slow-path request address and state */
+	model->req = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+				  2 * model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5caebde908..35962f7985 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OPS_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include <roc_api.h>
 
@@ -21,6 +22,9 @@ struct cn10k_ml_req {
 
 	/* Status field for poll mode requests */
 	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
 } __rte_aligned(ROC_ALIGN);
 
 /* ML request queue */
-- 
2.17.1



* [PATCH v1 15/37] ml/cnxk: find OCM mask and page slots for a model
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (13 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 14/37] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:01 ` [PATCH v1 16/37] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to compute the OCM tilemask and page start for a
model. The computed tilemask and page start are used during
model start to copy model weights and bias to OCM. The OCM slot
for a model is allocated from the tiles with the maximum amount
of free memory.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
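Note for readers: the slot search in slot_index_lowest() slides a
multi-word search mask across the page mask. The same result can be
cross-checked with a plain bit-by-bit scan that counts consecutive unset
bits, which is what the sketch below does. This is a simpler technique than
the one used in the patch, and the names bit_is_set() and first_free_slot()
are hypothetical. Word 0 holds bits 0..7, as in the comments of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if bit 'pos' is set in the multi-word mask (word 0 holds bits 0..7). */
static int
bit_is_set(const uint8_t *mask, int pos)
{
	return (mask[pos / 8] >> (pos % 8)) & 1;
}

/* First index >= start_pos where slot_sz consecutive bits are unset, or -1 if none. */
static int
first_free_slot(const uint8_t *mask, int nwords, int slot_sz, int start_pos)
{
	int nbits = nwords * 8;
	int run = 0;
	int pos;

	assert(mask != 0 && slot_sz > 0 && start_pos >= 0);

	for (pos = start_pos; pos < nbits; pos++) {
		/* Extend the current run of free bits, or restart it on a used bit. */
		run = bit_is_set(mask, pos) ? 0 : run + 1;
		if (run == slot_sz)
			return pos - slot_sz + 1;
	}

	return -1;
}
```

For the mask [10111000] [01001001] used in the function comments, that is
word 1 = 0xB8 and word 0 = 0x49, a slot of size 4 searched from position 0
is found at index 7, and a slot of size 2 searched from position 2 is found
at index 4.
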
 drivers/ml/cnxk/cn10k_ml_ocm.c | 330 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   5 +
 2 files changed, 335 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index b1c62f2963..a465848558 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -2,4 +2,334 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+
+#include "roc_api.h"
+
+/* OCM macros */
+#define BYTE_LEN	  8
+#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
+#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+
+/* Left shift multi-word mask by 1 bit.
+ *
+ * For example, given a mask of two uint8_t words
+ * Input:  [00110101] [00110111]
+ * Output: [01101010] [01101110]
+ */
+static void
+lshift_mask(uint8_t *mask, int nwords)
+{
+	int i;
+	int word_sz;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	for (i = nwords - 1; i >= 0; i--) {
+		mask[i] = mask[i] << 1;
+		if (i != 0)
+			mask[i] = mask[i] | (mask[i - 1] >> (word_sz - 1));
+	}
+}
+
+/* Get the index of the first unused slot in a multi-word mask (base_mask). Only unused slots at
+ * or after start_pos are considered. An unused slot is a run of slot_sz contiguous unset bits.
+ *
+ * The function creates a search_mask with slot_sz bits set and uses a sliding window approach to
+ * scan the mask for the first available slot. search_mask slides left from start_pos to the end.
+ *
+ * For example, given the multi-word mask
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When start = 0,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 3 is 7.
+ * Index of the first unused slot of size 2 is 1.
+ * Index of the first unused slot of size 1 is 1.
+ *
+ * When start = 2,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 2 is 4.
+ * Index of the first unused slot of size 1 is 2.
+ *
+ * When unable to find a valid slot, return -1
+ * When slot_sz is zero, return nwords * word_sz (one past the highest valid index)
+ */
+static int
+slot_index_lowest(uint8_t *base_mask, int nwords, int slot_sz, int start_pos)
+{
+	uint8_t *search_mask;
+	int word_sz;
+	int end_pos;
+	int min_idx;
+	int max_idx;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	min_idx = 0;
+	max_idx = word_sz * nwords;
+	idx = min_idx - 1;
+
+	if (slot_sz == 0)
+		return max_idx;
+
+	/* Create a mask with slot_sz bits set */
+	search_mask = plt_zmalloc(nwords * sizeof(uint8_t), 0);
+	if (search_mask == NULL)
+		goto error;
+
+	for (i = 0; i < nwords; i++) {
+		if (i < slot_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > slot_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (slot_sz % word_sz)) - 1;
+	}
+
+	/* Shift search mask by start_pos bits */
+	for (i = 0; i < start_pos; i++)
+		lshift_mask(search_mask, nwords);
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - slot_sz + 1;
+	for (j = start_pos; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+
+		lshift_mask(search_mask, nwords);
+	}
+
+found:
+	plt_free(search_mask);
+
+error:
+	return idx;
+}
+
+/* Find the largest possible unused slot, with a minimum size of search_sz in a multi-word mask. The
+ * function returns the start index of the slot and the size of the identified slot (slot_sz).
+ *
+ * For example, in the multi-word mask
+ *
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When search_sz > 4, return value = -1, slot_sz = 0
+ * When search_sz <= 4, return value = 7, slot_sz = 4
+ */
+static int
+slot_index_largest(uint8_t *base_mask, int nwords, int search_sz, int *slot_sz)
+{
+	uint8_t *search_mask;
+	int mask_sz;
+	int word_sz;
+	int end_pos;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	mask_sz = nwords * word_sz;
+	idx = -1;
+
+	/* Create a mask with mask_sz bits set */
+	search_mask = plt_zmalloc(mask_sz, 0);
+	if (search_mask == NULL)
+		goto error;
+
+start:
+	for (i = 0; i < nwords; i++) {
+		if (i < mask_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > mask_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (mask_sz % word_sz)) - 1;
+	}
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - mask_sz + 1;
+	for (j = 0; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+		lshift_mask(search_mask, nwords);
+	}
+
+	mask_sz--;
+	if (mask_sz >= search_sz)
+		goto start;
+	else
+		mask_sz = 0;
+
+found:
+	plt_free(search_mask);
+	if (search_sz == 0)
+		idx = word_sz * nwords;
+
+error:
+	if (slot_sz)
+		*slot_sz = mask_sz;
+
+	return idx;
+}
+
+/* Count number of bits in a tilemask. Assumes that all set bits are contiguous. */
+int
+cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
+{
+	uint8_t count;
+
+	PLT_ASSERT(tilemask != 0);
+
+	*start = __builtin_ctzl(tilemask);
+	*end = 64 - __builtin_clzl(tilemask) - 1;
+	count = *end - *start + 1;
+
+	PLT_ASSERT(count == __builtin_popcountl(tilemask));
+	return count;
+}
+
+/* Find the tiles and wb_page_start to load the model on given 'num_tiles' tiles with the specified
+ * scratch & wb pages and OCM allocation mode.
+ */
+int
+cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			   uint16_t scratch_pages, uint64_t *tilemask)
+{
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
+	uint16_t used_scratch_pages_max;
+	uint16_t scratch_page_start;
+	int used_last_wb_page_max;
+	uint16_t scratch_page_end;
+	uint8_t search_start_tile;
+	uint8_t search_end_tile;
+	int wb_page_start_curr;
+	int max_slot_sz_curr;
+	uint8_t tile_start;
+	int ocm_alloc_mode;
+	int wb_page_start;
+	uint16_t tile_id;
+	uint16_t word_id;
+	uint8_t tile_idx;
+	int max_slot_sz;
+	int start_tile;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
+		plt_err("Invalid num_tiles = %u (> ML_CN10K_OCM_NUMTILES)", num_tiles);
+		return -1;
+	}
+
+	memset(tilemask, 0, sizeof(uint64_t));
+	wb_page_start = -1;
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	start_tile = -1;
+	max_slot_sz_curr = 0;
+	max_slot_sz = 0;
+	tile_idx = 0;
+	ocm_alloc_mode = 2;
+
+	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
+		plt_err("Invalid start_tile, %d", start_tile);
+		return -1;
+	}
+
+	if (start_tile < 0) {
+		search_start_tile = 0;
+		search_end_tile = ocm->num_tiles - num_tiles;
+	} else {
+		search_start_tile = start_tile;
+		search_end_tile = start_tile;
+	}
+
+	tile_start = search_start_tile;
+start_search:
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		used_scratch_pages_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, used_scratch_pages_max);
+		used_last_wb_page_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
+	}
+
+	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
+	}
+
+	if (used_scratch_pages_max < scratch_pages) { /* Check for extra scratch pages */
+		if (ocm->num_pages - used_last_wb_page_max - 1 >=
+		    scratch_pages) { /* Pages available */
+			scratch_page_start = ocm->num_pages - scratch_pages;
+			scratch_page_end = ocm->num_pages - 1;
+			for (page_id = scratch_page_start; page_id <= scratch_page_end;
+			     page_id++) { /* Mark the extra scratch pages as used */
+				local_ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					SET_BIT(local_ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						page_id % OCM_MAP_WORD_SIZE);
+			}
+		} else { /* Pages not available, check for next set of tiles */
+			goto next_search;
+		}
+	}
+
+	if (ocm_alloc_mode == 1) {
+		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
+		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
+			tile_idx = tile_start;
+			goto found;
+		}
+	} else if (ocm_alloc_mode == 2) {
+		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
+							&max_slot_sz_curr);
+		if (max_slot_sz_curr > max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			max_slot_sz = max_slot_sz_curr;
+			tile_idx = tile_start;
+		} else if (max_slot_sz_curr == max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			if (wb_page_start == ocm->num_pages) {
+				tile_idx = tile_start;
+				goto found;
+			}
+		}
+	}
+
+next_search:
+	tile_start = tile_start + num_tiles;
+	if (tile_start <= search_end_tile)
+		goto start_search;
+
+found:
+	if (wb_page_start != -1)
+		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
+
+	return wb_page_start;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 57c2eee344..2b7166bbca 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OCM_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 /* Page size in bytes. */
 #define ML_CN10K_OCM_PAGESIZE 0x4000
@@ -76,4 +77,8 @@ struct cn10k_ml_ocm {
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
 };
 
+int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
+int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			       uint16_t scratch_pages, uint64_t *tilemask);
+
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1
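
Editor's note on slot_index_largest(): the comment in the patch above
describes finding the largest run of unused (clear) bits in a multi-word
mask. The following is a minimal standalone sketch of that idea, not the
driver's implementation: it walks the bits once instead of shifting a
search mask, and the names bit_is_set() and largest_free_run() are
hypothetical. It assumes 8-bit words with word 0 holding the lowest bits,
and it does not model the search_sz == 0 special case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: test bit 'pos' of a multi-word mask (8-bit words, word 0 first). */
static int
bit_is_set(const uint8_t *mask, int pos)
{
	return (mask[pos / 8] >> (pos % 8)) & 1;
}

/* Hypothetical sketch: return the start index of the largest run of clear bits
 * that is at least min_sz long, or -1 if there is none. The run length is stored
 * in *run_sz. Ties are resolved in favour of the lowest start index.
 */
static int
largest_free_run(const uint8_t *mask, int nwords, int min_sz, int *run_sz)
{
	int nbits = nwords * 8;
	int best_start = -1, best_len = 0;
	int start = 0, len = 0;
	int pos;

	assert(mask != NULL && run_sz != NULL && min_sz > 0);

	for (pos = 0; pos < nbits; pos++) {
		if (bit_is_set(mask, pos)) {
			len = 0; /* a used page ends the current run */
			continue;
		}
		if (len == 0)
			start = pos;
		len++;
		if (len > best_len) {
			best_len = len;
			best_start = start;
		}
	}

	if (best_len < min_sz) {
		*run_sz = 0;
		return -1;
	}
	*run_sz = best_len;
	return best_start;
}
```

With the mask from the comment (word 0 = 0x49, word 1 = 0xB8), a minimum
size of 4 yields start index 7 and run size 4, and a minimum size of 5
yields -1 with run size 0.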



* [PATCH v1 16/37] ml/cnxk: add support to reserve and free OCM pages
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (14 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 15/37] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
@ 2022-12-08 20:01 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 17/37] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to reserve and free OCM pages for a model. OCM
pages are reserved upon completion of model start and are
released after model stop.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 131 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ocm.h |   3 +
 2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index a465848558..ddc0936cec 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -5,14 +5,17 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "roc_api.h"
 
 /* OCM macros */
-#define BYTE_LEN	  8
-#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
-#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+#define BYTE_LEN	   8
+#define OCM_MAP_WORD_SIZE  (sizeof(uint8_t) * BYTE_LEN)
+#define IS_BIT_SET(num, n) ((num) & (1 << (n)))
+#define SET_BIT(num, n)	   ((num) | (1 << (n)))
+#define CLEAR_BIT(num, n)  ((num) & ~(1 << (n)))
 
 /* Left shift multi-word mask by 1 bit.
  *
@@ -333,3 +336,125 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 
 	return wb_page_start;
 }
+
+void
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
+			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_page_start;
+	int scratch_page_end;
+	int wb_page_end;
+	int tile_start;
+	int tile_end;
+	int tile_id;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Get first set bit, tile_start */
+	tile_start = 0;
+	tile_end = 0;
+	cn10k_ml_ocm_tilecount(tilemask, &tile_start, &tile_end);
+	wb_page_end = wb_page_start + wb_pages - 1;
+	scratch_page_start = ocm->num_pages - scratch_pages;
+	scratch_page_end = ocm->num_pages - 1;
+
+	/* Update tile_ocm_info */
+	for (tile_id = tile_start; tile_id <= tile_end; tile_id++) {
+		/* Scratch pages */
+		for (page_id = scratch_page_start; page_id <= scratch_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		ocm->tile_ocm_info[tile_id].scratch_pages =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, scratch_pages);
+
+		/* WB pages */
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		if (wb_pages != 0)
+			ocm->tile_ocm_info[tile_id].last_wb_page =
+				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
+	}
+
+	model->addr.tile_start = tile_start;
+	model->addr.tile_end = tile_end;
+
+	plt_ml_dbg("model_id = %d, tilemask = 0x%016lx", model_id, tilemask);
+	plt_ml_dbg("model_id = %d, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
+		   wb_page_end);
+	plt_ml_dbg("model_id = %d, scratch_page_start = %d, scratch_page_end = %d", model_id,
+		   scratch_page_start, scratch_page_end);
+}
+
+void
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_resize_pages;
+	int wb_page_start;
+	int wb_page_end;
+	int prev_start;
+	int curr_start;
+	int tile_id;
+	int page_id;
+	int16_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Update OCM info for WB memory */
+	wb_page_start = model->model_mem_map.wb_page_start;
+	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
+	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+				CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+						  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+					  page_id % OCM_MAP_WORD_SIZE);
+		}
+
+		/* Update last_wb_page size */
+		if (wb_page_end == ocm->tile_ocm_info[tile_id].last_wb_page)
+			ocm->tile_ocm_info[tile_id].last_wb_page = wb_page_start - 1;
+
+		/* Update scratch page size and clear extra bits */
+		scratch_resize_pages = 0;
+		/* Get max scratch pages required, excluding the current model */
+		for (i = 0; i < dev->data->nb_models; i++) {
+			struct cn10k_ml_model *other = dev->data->models[i];
+
+			if ((i != model_id) && (other != NULL)) {
+				if (IS_BIT_SET(other->model_mem_map.tilemask, tile_id))
+					scratch_resize_pages =
+						PLT_MAX((int)other->model_mem_map.scratch_pages,
+							scratch_resize_pages);
+			}
+		}
+
+		/* Clear extra scratch pages */
+		if (scratch_resize_pages < ocm->tile_ocm_info[tile_id].scratch_pages) {
+			prev_start = ocm->num_pages - ocm->tile_ocm_info[tile_id].scratch_pages;
+			curr_start = ocm->num_pages - scratch_resize_pages;
+			for (page_id = prev_start; page_id < curr_start; page_id++) {
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+							  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						  page_id % OCM_MAP_WORD_SIZE);
+			}
+			ocm->tile_ocm_info[tile_id].scratch_pages = scratch_resize_pages;
+		}
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 2b7166bbca..7c6b1432c5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -80,5 +80,8 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
+				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1



* [PATCH v1 17/37] ml/cnxk: enable support to start an ML model
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (15 preceding siblings ...)
  2022-12-08 20:01 ` [PATCH v1 16/37] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 18/37] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented model start driver function. A model start job
is checked for completion in synchronous mode. Tilemask and
OCM slot are calculated before starting the model. Model start
is enqueued through scratch registers. OCM pages are reserved
after model start completion.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 208 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |   4 +
 3 files changed, 215 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 17411e5fe1..5096a26c40 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -33,6 +33,9 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* ML slow-path job flags */
+#define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
+
 /* Poll mode job state */
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6c26f450a5..b74092e605 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -114,6 +114,64 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = model->model_id;
+	req->jd.hdr.job_type = job_type;
+	req->jd.hdr.fp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+
+	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
+		if (!model->metadata.model.ocm_relocatable)
+			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+		else
+			req->jd.hdr.sp_flags = 0x0;
+		req->jd.model_start.model_src_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_load_addr));
+		req->jd.model_start.model_dst_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+		req->jd.model_start.model_init_offset = 0x0;
+		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->jd.model_start.model_finish_offset =
+			metadata->init_model.file_size + metadata->main_model.file_size;
+		req->jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
+						      metadata->main_model.file_size +
+						      metadata->finish_model.file_size;
+		req->jd.model_start.num_layers = metadata->model.num_layers;
+		req->jd.model_start.num_gather_entries = 0;
+		req->jd.model_start.num_scatter_entries = 0;
+		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->jd.model_start.batch_size = model->batch_size;
+		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
+		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
+		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
+			&mldev->roc,
+			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
+		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
+		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
+		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
+		req->jd.model_start.output.s.ddr_range_start =
+			metadata->model.ddr_output_range_start;
+		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -560,6 +618,155 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+int
+cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	uint8_t num_tiles;
+	uint64_t tilemask;
+	int wb_page_start;
+	int tile_start;
+	int tile_end;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				plt_ml_dbg("Model already started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (!model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			wb_page_start = cn10k_ml_ocm_tilemask_find(
+				dev, num_tiles, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages, &tilemask);
+
+			if (wb_page_start == -1) {
+				plt_err("Free pages not available on OCM tiles");
+				plt_err("Failed to load model = 0x%016lx, name = %s",
+					PLT_U64_CAST(model), model->metadata.model.name);
+
+				plt_spinlock_unlock(&ocm->lock);
+				return -ENOMEM;
+			}
+
+			model->model_mem_map.tilemask = tilemask;
+			model->model_mem_map.wb_page_start = wb_page_start;
+
+			cn10k_ml_ocm_reserve_pages(
+				dev, model->model_id, model->model_mem_map.tilemask,
+				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages);
+			model->model_mem_map.ocm_reserved = true;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	/* Update JD */
+	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->jd.model_start.ocm_wb_base_address =
+		model->model_mem_map.wb_page_start * ocm->page_size;
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else { /* Reset scratch registers */
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (ret == 0)
+				model->state = ML_CN10K_MODEL_STATE_STARTED;
+			else
+				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
+		while (model->model_mem_map.ocm_reserved) {
+			if (plt_spinlock_trylock(&ocm->lock) != 0) {
+				cn10k_ml_ocm_free_pages(dev, model->model_id);
+				model->model_mem_map.ocm_reserved = false;
+				model->model_mem_map.tilemask = 0x0;
+				plt_spinlock_unlock(&ocm->lock);
+			}
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -575,4 +782,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 35962f7985..3fe3872fd1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -25,6 +25,9 @@ struct cn10k_ml_req {
 
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
+
+	/* Request timeout cycle */
+	uint64_t timeout;
 } __rte_aligned(ROC_ALIGN);
 
 /* ML request queue */
@@ -61,5 +64,6 @@ extern struct rte_ml_dev_ops cn10k_ml_ops;
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			int16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
+int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v1 18/37] ml/cnxk: enable support to stop an ML models
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (16 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 17/37] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 19/37] ml/cnxk: enable support to get model information Srikanth Yalavarthi
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented model stop driver function. A model stop job is
enqueued through scratch registers and is checked for
completion through polling in a synchronous mode. OCM pages
are released after model stop completion.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
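Editor's note: the stop path first checks the model state before any job
is enqueued. A minimal sketch of those state checks follows. The enum
values mirror the ML_CN10K_MODEL_STATE_* names used in the patch but are
hypothetical here, as are stop_begin() and stop_end(). The driver does
these checks while holding model->lock, and the sketch leaves locking
out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the driver's model states. */
enum model_state {
	MODEL_LOADED,
	MODEL_JOB_ACTIVE,
	MODEL_STARTED,
	MODEL_UNKNOWN
};

/* Decide whether a stop job may begin.
 * Returns 0 and moves the model to JOB_ACTIVE when the stop can proceed,
 * 1 when the model is not started (nothing to do),
 * -EBUSY when another slow-path job is already in flight.
 */
static int
stop_begin(enum model_state *state)
{
	assert(state != NULL);

	if (*state == MODEL_LOADED)
		return 1;
	if (*state == MODEL_JOB_ACTIVE)
		return -EBUSY;

	*state = MODEL_JOB_ACTIVE;
	return 0;
}

/* After the stop job has been processed, the model returns to LOADED. */
static void
stop_end(enum model_state *state)
{
	assert(state != NULL);
	*state = MODEL_LOADED;
}
```

A started model moves to JOB_ACTIVE, a second stop request during that
window gets -EBUSY, and a stop request on an already stopped model
returns 1.
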
 drivers/ml/cnxk/cn10k_ml_ops.c | 115 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |   1 +
 2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b74092e605..a0b0fc7e1f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -295,10 +295,14 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		/* Re-configure */
 		void **models;
 
-		/* Unload all models */
+		/* Stop and unload all models */
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %d", model_id);
+				}
 				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %d", model_id);
@@ -362,10 +366,14 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
-	/* Unload all models */
+	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %d", model_id);
+			}
 			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %d", model_id);
@@ -767,6 +775,108 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				plt_ml_dbg("Model not started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			cn10k_ml_ocm_free_pages(dev, model->model_id);
+			model->model_mem_map.ocm_reserved = false;
+			model->model_mem_map.tilemask = 0x0;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0x0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else {
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -783,4 +893,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3fe3872fd1..5e7e42ee88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -65,5 +65,6 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 			int16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
+int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v1 19/37] ml/cnxk: enable support to get model information
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (17 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 18/37] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 20/37] ml/cnxk: enable support to update model params Srikanth Yalavarthi
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added driver functions to get model information. Added
internal functions to set and get model info.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
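Editor's note: the model info in this patch lives in one buffer that
holds the info structure followed by the input descriptors and then the
output descriptors. The sketch below shows that layout with simplified,
hypothetical stand-ins (struct model_info, struct io_info,
model_info_alloc()) in place of rte_ml_model_info and rte_ml_io_info. It
uses calloc() only to stay self-contained. How the driver allocates the
buffer is not shown in this patch excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the mldev info structures. */
struct io_info {
	char name[32];
	uint32_t size;
};

struct model_info {
	uint32_t nb_inputs;
	uint32_t nb_outputs;
	struct io_info *input_info;
	struct io_info *output_info;
};

/* Allocate one buffer laid out as: model_info | inputs[] | outputs[]. */
static struct model_info *
model_info_alloc(uint32_t nb_inputs, uint32_t nb_outputs)
{
	size_t sz = sizeof(struct model_info) +
		    ((size_t)nb_inputs + nb_outputs) * sizeof(struct io_info);
	struct model_info *info = calloc(1, sz);

	if (info == NULL)
		return NULL;

	info->nb_inputs = nb_inputs;
	info->nb_outputs = nb_outputs;
	/* Inputs start right after the header, outputs right after the inputs. */
	info->input_info = (struct io_info *)(info + 1);
	info->output_info = info->input_info + nb_inputs;

	/* Sanity check: the descriptor array must be suitably aligned. */
	assert((uintptr_t)info->input_info % _Alignof(struct io_info) == 0);
	return info;
}
```

A caller gets a single pointer back and releases everything with one
free(), which is the main appeal of keeping the three parts in one
buffer.
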
 drivers/ml/cnxk/cn10k_ml_model.c | 54 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  9 ++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 37 ++++++++++++++++++++--
 3 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 11b52af68c..19595656ae 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -340,3 +340,57 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uin
 
 	return 0;
 }
+
+void
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+{
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output =
+		PLT_PTR_ADD(input, model->metadata.model.num_input * sizeof(struct rte_ml_io_info));
+
+	/* Set model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+	memcpy(info->name, model->metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", model->metadata.model.version[0],
+		 model->metadata.model.version[1], model->metadata.model.version[2],
+		 model->metadata.model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = dev->data->dev_id;
+	info->batch_size = model->batch_size;
+	info->nb_inputs = model->metadata.model.num_input;
+	info->input_info = input;
+	info->nb_outputs = model->metadata.model.num_output;
+	info->output_info = output;
+	info->wb_size = model->metadata.weights_bias.file_size;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		memcpy(input[i].name, model->metadata.input[i].input_name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].dtype = model->metadata.input[i].input_type;
+		input[i].qtype = model->metadata.input[i].model_input_type;
+		input[i].shape.format = model->metadata.input[i].shape.format;
+		input[i].shape.w = model->metadata.input[i].shape.w;
+		input[i].shape.x = model->metadata.input[i].shape.x;
+		input[i].shape.y = model->metadata.input[i].shape.y;
+		input[i].shape.z = model->metadata.input[i].shape.z;
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		memcpy(output[i].name, model->metadata.output[i].output_name,
+		       MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].dtype = model->metadata.output[i].output_type;
+		output[i].qtype = model->metadata.output[i].model_output_type;
+		output[i].shape.format = RTE_ML_IO_FORMAT_1D;
+		output[i].shape.w = model->metadata.output[i].size;
+		output[i].shape.x = 1;
+		output[i].shape.y = 1;
+		output[i].shape.z = 1;
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 64160032c1..2372ac9b72 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -425,6 +425,14 @@ struct cn10k_ml_model {
 	/* Tile and memory information object */
 	struct cn10k_ml_ocm_model_map model_mem_map;
 
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
 	/* Model lock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -441,5 +449,6 @@ void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
 				   uint16_t *wb_pages, uint16_t *scratch_pages);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a0b0fc7e1f..f26cfcfd06 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -506,6 +506,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_data_size;
+	size_t model_info_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
 	uint16_t wb_pages;
@@ -544,8 +545,13 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
+			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size +
+		  2 * model_data_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
@@ -559,6 +565,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model = mz->addr;
 	model->mldev = mldev;
 	model->model_id = idx;
+	model->info = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+				  2 * model_data_size);
 
 	memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->metadata);
@@ -587,7 +596,10 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Set slow-path request address and state */
 	model->req = PLT_PTR_ADD(
 		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-				  2 * model_data_size);
+				  2 * model_data_size + model_info_size);
+
+	/* Set model info */
+	cn10k_ml_model_info_set(dev, model);
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
@@ -877,6 +889,26 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 	return ret;
 }
 
+static int
+cn10k_ml_model_info_get(struct rte_ml_dev *dev, int16_t model_id,
+			struct rte_ml_model_info *model_info)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
+	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -894,4 +926,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
 };
-- 
2.17.1



* [PATCH v1 20/37] ml/cnxk: enable support to update model params
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (18 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 19/37] ml/cnxk: enable support to get model information Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 21/37] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added cnxk driver functions to update model parameters (weights
and bias) after a model is loaded. Updating the model parameters
does not require reloading the model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
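
Note: the update is a two-step copy, first into the load image and then from
the load image into the run image. A minimal sketch under stated assumptions
follows; struct demo_model and its fields are hypothetical and only mirror
the idea of separate load and run addresses, they are not the driver's
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model layout: a "load" image and a "run" image of equal size. */
struct demo_model {
	uint8_t *load_base; /* staging copy, updated by the application */
	uint8_t *run_base;  /* copy read by the hardware */
	size_t wb_offset;   /* offset of weights and bias within an image */
	size_t wb_size;     /* size of weights and bias */
	size_t total_size;  /* size of the whole image */
};

static void
demo_params_update(struct demo_model *m, const void *wb)
{
	assert(m != NULL && wb != NULL);
	assert(m->wb_offset + m->wb_size <= m->total_size);

	/* Step 1: overwrite weights and bias in the load image. */
	memcpy(m->load_base + m->wb_offset, wb, m->wb_size);

	/* Step 2: refresh the run image from the load image. */
	memcpy(m->run_base, m->load_base, m->total_size);
}
```

Only the weights and bias region changes in the load image; the rest of the
run image is rewritten with the same bytes it already held.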
 drivers/ml/cnxk/cn10k_ml_ops.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f26cfcfd06..bc50e1b8cb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -909,6 +909,36 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, int16_t model_id,
 	return 0;
 }
 
+static int
+cn10k_ml_model_params_update(struct rte_ml_dev *dev, int16_t model_id, void *buffer)
+{
+	struct cn10k_ml_model *model;
+	size_t size;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+		return -1;
+	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+		return -EBUSY;
+
+	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
+	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+
+	/* Update model weights & bias */
+	memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -927,4 +957,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
 };
-- 
2.17.1



* [PATCH v1 21/37] ml/cnxk: add support to get IO buffer sizes
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (19 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 20/37] ml/cnxk: enable support to update model params Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 22/37] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added driver functions to get input and output buffer sizes
for a given number of batches. These functions compute the buffer
sizes based on the specific requirements of the device.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
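
Note: the size computation rounds the requested number of batches up to a
whole number of model batches. A minimal sketch of the arithmetic, assuming
a non-zero model batch size; demo_div_ceil() and demo_io_size() are
hypothetical helper names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division for unsigned operands; d must be non-zero. */
static uint64_t
demo_div_ceil(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/* Buffer size for nb_batches, given the size needed by one model batch. */
static uint64_t
demo_io_size(uint64_t size_per_model_batch, uint32_t nb_batches, uint32_t model_batch_size)
{
	return size_per_model_batch * demo_div_ceil(nb_batches, model_batch_size);
}
```

For example, with a model batch size of 4 and 1024 bytes per model batch,
a request for 5 batches needs two model batches, that is 2048 bytes.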
 drivers/ml/cnxk/cn10k_ml_ops.c | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index bc50e1b8cb..c96f17ebd8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -939,6 +939,54 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, int16_t model_id, void *buf
 	return 0;
 }
 
+static int
+cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t nb_batches,
+			   uint64_t *input_qsize, uint64_t *input_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (input_qsize != NULL)
+		*input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (input_dsize != NULL)
+		*input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t nb_batches,
+			    uint64_t *output_qsize, uint64_t *output_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (output_qsize != NULL)
+		*output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (output_dsize != NULL)
+		*output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -958,4 +1006,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_input_size_get = cn10k_ml_io_input_size_get,
+	.io_output_size_get = cn10k_ml_io_output_size_get,
 };
-- 
2.17.1



* [PATCH v1 22/37] ml/cnxk: enable quantization and dequantization
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (20 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 21/37] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 23/37] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented driver functions to quantize / dequantize input
and output data. Support is enabled for multiple batches.
Quantization / dequantization use the type conversion functions
defined in ML common code.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 150 +++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c96f17ebd8..9868d2a598 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <ml_utils.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
@@ -987,6 +989,152 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t n
 	return 0;
 }
 
+static int
+cn10k_ml_io_quantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_batches, void *dbuffer,
+		     void *qbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		if (model->metadata.input[i].input_type ==
+		    model->metadata.input[i].model_input_type) {
+			memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+		} else {
+			switch (model->metadata.input[i].model_input_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = ml_float32_to_int8(model->metadata.input[i].qscale,
+							 model->addr.input[i].nb_elements,
+							 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = ml_float32_to_uint8(model->metadata.input[i].qscale,
+							  model->addr.input[i].nb_elements,
+							  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = ml_float32_to_int16(model->metadata.input[i].qscale,
+							  model->addr.input[i].nb_elements,
+							  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = ml_float32_to_uint16(model->metadata.input[i].qscale,
+							   model->addr.input[i].nb_elements,
+							   lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = ml_float32_to_float16(model->addr.input[i].nb_elements,
+							    lcl_dbuffer, lcl_qbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_input_type[%u] : %u", i,
+					model->metadata.input[i].model_input_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_dbuffer += model->addr.input[i].sz_d;
+		lcl_qbuffer += model->addr.input[i].sz_q;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_dequantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_batches, void *qbuffer,
+		       void *dbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		if (model->metadata.output[i].output_type ==
+		    model->metadata.output[i].model_output_type) {
+			memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+		} else {
+			switch (model->metadata.output[i].model_output_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = ml_int8_to_float32(model->metadata.output[i].dscale,
+							 model->addr.output[i].nb_elements,
+							 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = ml_uint8_to_float32(model->metadata.output[i].dscale,
+							  model->addr.output[i].nb_elements,
+							  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = ml_int16_to_float32(model->metadata.output[i].dscale,
+							  model->addr.output[i].nb_elements,
+							  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = ml_uint16_to_float32(model->metadata.output[i].dscale,
+							   model->addr.output[i].nb_elements,
+							   lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = ml_float16_to_float32(model->addr.output[i].nb_elements,
+							    lcl_qbuffer, lcl_dbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_output_type[%u] : %u", i,
+					model->metadata.output[i].model_output_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_qbuffer += model->addr.output[i].sz_q;
+		lcl_dbuffer += model->addr.output[i].sz_d;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -1010,4 +1158,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* I/O ops */
 	.io_input_size_get = cn10k_ml_io_input_size_get,
 	.io_output_size_get = cn10k_ml_io_output_size_get,
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
 };
-- 
2.17.1



* [PATCH v1 23/37] ml/cnxk: enable support to dump device debug info
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (21 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 22/37] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 24/37] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to dump device debug information. Debug info on
a cn10k device includes model state info, OCM usage info, and the
firmware debug and exception buffers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
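
Note: the OCM page mask is printed as a hex string, most significant word
first. A minimal sketch of that formatting, assuming 8-bit mask words;
demo_pagemask_to_str() and its explicit length argument are hypothetical
and only illustrate the buffer size required.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format nwords mask bytes as "0x..." with the most significant word first.
 * The output needs 2 (prefix) + 2 * nwords (nibbles) + 1 (NUL) bytes.
 */
static void
demo_pagemask_to_str(const uint8_t *mask, uint16_t nwords, char *str, size_t len)
{
	char *p = str;
	int word;

	assert(mask != NULL && str != NULL);
	assert(len >= 2 + 2 * (size_t)nwords + 1);

	/* add prefix 0x */
	*p++ = '0';
	*p++ = 'x';

	/* two hex digits per word, highest word first */
	for (word = nwords - 1; word >= 0; word--) {
		snprintf(p, 3, "%02X", (unsigned int)mask[word]);
		p += 2;
	}

	/* terminate */
	*p = '\0';
}
```

Passing the destination length makes the terminator's byte explicit, which
is easy to miss when sizing the buffer from the nibble count alone.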
 drivers/ml/cnxk/cn10k_ml_ocm.c |  51 +++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 189 +++++++++++++++++++++++++++++++++
 3 files changed, 241 insertions(+)
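
Note: the firmware debug buffer is read as a circular buffer between a head
and a tail offset. A minimal sketch of such a dump follows, assuming head is
the offset of the oldest byte and tail is one past the newest byte; this
convention and demo_ring_dump() itself are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write the unread bytes of a circular text buffer to fp.
 * Returns the number of bytes written.
 */
static size_t
demo_ring_dump(FILE *fp, const char *buf, uint32_t bufsize, uint32_t head, uint32_t tail)
{
	size_t n;

	assert(fp != NULL && buf != NULL);
	assert(head < bufsize && tail < bufsize);

	if (head <= tail) {
		/* Contiguous region: [head, tail). */
		return fwrite(buf + head, 1, tail - head, fp);
	}

	/* Wrapped region: [head, bufsize) followed by [0, tail). */
	n = fwrite(buf + head, 1, bufsize - head, fp);
	n += fwrite(buf, 1, tail, fp);
	return n;
}
```

In the wrapped case the first chunk is bounded by the distance from head to
the end of the buffer, so the read never runs past the buffer.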

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index ddc0936cec..348df9468a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -458,3 +458,54 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id)
 		}
 	}
 }
+
+static void
+cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t nwords, char *str)
+{
+	char *p = str;
+	int word;
+
+	/* add prefix 0x */
+	*p++ = '0';
+	*p++ = 'x';
+
+	/* build one word at a time */
+	for (word = nwords - 1; word >= 0; word--) {
+		sprintf(p, "%02X", tile_info->ocm_mask[word]);
+		p += 2;
+	}
+
+	/* terminate */
+	*p++ = 0;
+}
+
+void
+cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+{
+	char str[ML_CN10K_OCM_NUMPAGES / 4 + 3]; /* nibbles + prefix '0x' + NUL */
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	uint8_t tile_id;
+	uint8_t word_id;
+	int wb_pages;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	fprintf(fp, "OCM State:\n");
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
+
+		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
+		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+			wb_pages +=
+				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+
+		fprintf(fp,
+			"tile = %2u, scratch_pages = %4u,"
+			" wb_pages = %4d, last_wb_page = %4d,"
+			" pagemask = %s\n",
+			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
+			ocm->tile_ocm_info[tile_id].last_wb_page, str);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 7c6b1432c5..887c8bf6c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,5 +83,6 @@ int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16
 void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
 				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id);
+void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9868d2a598..ae90d32480 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -14,10 +14,25 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  90
+
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+static void
+print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -116,6 +131,102 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_model_print(struct rte_ml_dev *dev, int16_t model_id, FILE *fp)
+{
+
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Print debug info */
+	print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
+		model->metadata.model.version[1], model->metadata.model.version[2],
+		model->metadata.model.version[3]);
+	if (strlen(model->name) != 0)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
+	fprintf(fp, "%*s : 0x%016" PRIx64 "\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", model->model_id);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+
+	/* Print model state */
+	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
+			1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s  %14s\n", "input", "input_name", "input_type",
+		"model_input_type", "quantize", "format");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.input[i].input_name);
+		ml_io_type_to_str(model->metadata.input[i].input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		ml_io_type_to_str(model->metadata.input[i].model_input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.input[i].quantize == 1 ? "Yes" : "No"));
+		ml_io_format_to_str(model->metadata.input[i].shape.format, str, STR_LEN);
+		fprintf(fp, "%*s", 16, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
+		"model_output_type", "dequantize");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.output[i].output_name);
+		ml_io_type_to_str(model->metadata.output[i].output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		ml_io_type_to_str(model->metadata.output[i].model_output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.output[i].dequantize == 1 ? "Yes" : "No"));
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
+
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -498,6 +609,83 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_fw *fw;
+
+	uint32_t head_loc;
+	uint32_t tail_loc;
+	uint32_t bufsize;
+	char *head_ptr;
+	int model_id;
+	int core_id;
+
+	if (roc_env_is_asim())
+		return 0;
+
+	mldev = dev->data->dev_private;
+	fw = &mldev->fw;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			cn10k_ml_model_print(dev, model_id, fp);
+			fprintf(fp, "\n");
+		}
+	}
+
+	/* Dump ocm state */
+	cn10k_ml_ocm_print(dev, fp);
+
+	/* Dump debug buffer */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		if (core_id == 0) {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		} else {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		}
+		if (head_loc < tail_loc) {
+			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
+		} else if (head_loc >= tail_loc + 1) {
+			fprintf(fp, "%.*s\n", bufsize - head_loc, &head_ptr[head_loc]);
+			fprintf(fp, "%.*s\n", tail_loc, &head_ptr[0]);
+		}
+	}
+
+	/* Dump exception info */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		if ((core_id == 0) &&
+		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016" PRIx64,
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		} else if ((core_id == 1) &&
+			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016" PRIx64,
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		}
+	}
+
+	return 0;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
@@ -1142,6 +1330,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_close = cn10k_ml_dev_close,
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1



* [PATCH v1 24/37] ml/cnxk: add driver support for device selftest
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (22 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 23/37] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 25/37] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support for device selftest. Device selftest includes
checking the status of firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ae90d32480..9cf3bb4a9f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -686,6 +686,62 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	uint64_t timeout_cycle;
+	bool timeout;
+	int ret;
+
+	mldev = dev->data->dev_private;
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+					 ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("Could not allocate reserved memzone");
+		return -ENOMEM;
+	}
+	req = mz->addr;
+
+	/* Prepare selftest job descriptor and completion structure */
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	/* Enqueue selftest job through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check selftest status, clean-up and exit */
+	ret = 0;
+	if (timeout) {
+		ret = -ETIME;
+	} else {
+		if (req->result.error_code != 0)
+			ret = -1;
+	}
+
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
@@ -1331,6 +1387,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1



* [PATCH v1 25/37] ml/cnxk: enqueue a burst of inference requests
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (23 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 24/37] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 26/37] ml/cnxk: dequeue " Srikanth Yalavarthi
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled driver support to enqueue a burst of inference requests
to the ML device. Enqueue uses the internal ML request structure to
queue the inferences; job completion is detected through polling.
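
The free-slot computation that bounds each burst can be looked at in
isolation. The following is a minimal standalone sketch, assuming a
circular queue that keeps one slot unused so that head == tail always
means "empty". The ring_* names are hypothetical and only mirror the
queue_pending_count() and queue_free_count() helpers added below; they
are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the queue helpers in this patch. */

/* Requests that were enqueued but not yet dequeued. */
static inline uint64_t
ring_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	assert(nb_desc > 0);
	return (nb_desc + head - tail) % nb_desc;
}

/* Free slots; one slot is never used, so a full ring holds nb_desc - 1. */
static inline uint64_t
ring_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	return nb_desc - ring_pending_count(head, tail, nb_desc) - 1;
}

/* Move an index one slot forward, wrapping at nb_desc. */
static inline void
ring_index_advance(uint64_t *index, uint64_t nb_desc)
{
	*index = (*index + 1) % nb_desc;
}

/* Clamp a requested burst size to the space that is available. */
static inline uint16_t
ring_clamp_burst(uint16_t nb_ops, uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	uint64_t free_slots = ring_free_count(head, tail, nb_desc);

	return nb_ops < free_slots ? nb_ops : (uint16_t)free_slots;
}
```

With nb_desc = 8, head = 2 and tail = 6, four requests are pending and
three slots are free, so a burst of 16 would be clamped to 3.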

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 96 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  7 +++
 2 files changed, 103 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9cf3bb4a9f..6f2d1adac8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -285,6 +285,28 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	}
 }
 
+static __rte_always_inline void
+cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+				struct rte_ml_op *op)
+{
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = op->model_id;
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->jd.hdr.sp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.model_run.input_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr));
+	req->jd.model_run.output_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr));
+	req->jd.model_run.num_batches = op->nb_batches;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -450,6 +472,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -1379,6 +1403,78 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_bat
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t count;
+	uint64_t head;
+	bool enqueued;
+
+	mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	req = &queue->reqs[head];
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	if (unlikely(!enqueued))
+		goto jcmdq_full;
+
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5e7e42ee88..e3f61beeab 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -28,6 +28,9 @@ struct cn10k_ml_req {
 
 	/* Request timeout cycle */
 	uint64_t timeout;
+
+	/* ML op */
+	struct rte_ml_op *op;
 } __rte_aligned(ROC_ALIGN);
 
 /* ML request queue */
@@ -67,4 +70,8 @@ int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 
+/* Fast-path ops */
+__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
+
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v1 26/37] ml/cnxk: dequeue a burst of inference requests
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (24 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 25/37] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 27/37] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled driver support to dequeue inference requests from the
internal queue. Dequeue checks for request completion by
polling the status field of the job request.
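
The in-order polling described above can be sketched as follows. This
is a simplified standalone illustration, not the driver code: the
sketch_* names and the SKETCH_JOB_* status values are hypothetical, and
the request structure is reduced to the two fields the loop needs. The
loop stops at the first request that has not finished, so requests are
always returned in the order they were enqueued.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status values; the driver defines its own constants. */
#define SKETCH_JOB_START  0x1ULL
#define SKETCH_JOB_FINISH 0x2ULL

/* Reduced request: a status word written on completion, and the user op. */
struct sketch_req {
	volatile uint64_t status;
	void *op;
};

/* Dequeue up to nb_ops finished requests, starting at *tail. */
static uint16_t
sketch_dequeue_burst(struct sketch_req *reqs, uint64_t nb_desc, uint64_t head,
		     uint64_t *tail, void **ops, uint16_t nb_ops)
{
	uint64_t pending = (nb_desc + head - *tail) % nb_desc;
	uint64_t t = *tail;
	uint16_t count = 0;

	if (nb_ops > pending)
		nb_ops = (uint16_t)pending;

	while (count < nb_ops) {
		/* Stop at the first request that is still active. */
		if (reqs[t].status != SKETCH_JOB_FINISH)
			break;

		ops[count++] = reqs[t].op;
		t = (t + 1) % nb_desc;
	}

	*tail = t;
	return count;
}
```

A caller would typically invoke this in a loop until all outstanding
requests have been collected.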

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 61 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 ++
 2 files changed, 63 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6f2d1adac8..83ec064c82 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -473,6 +473,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -1421,6 +1422,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
 }
 
+static __rte_always_inline void
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
+		       struct rte_ml_op *op)
+{
+	PLT_SET_USED(dev);
+	PLT_SET_USED(qp_id);
+
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0))
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+	else
+		op->status = RTE_ML_OP_STATUS_ERROR;
+
+	op->user_ptr = result->user_ptr;
+}
+
 __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
@@ -1475,6 +1493,49 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot uint16_t
+cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+	req = &queue->reqs[tail];
+	status = plt_read64(&req->status);
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
+		goto empty_or_active;
+
+	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	ops[count] = req->op;
+
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index e3f61beeab..3c5342dcc7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -73,5 +73,7 @@ int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v1 27/37] ml/cnxk: add internal function for sync mode run
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (25 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 26/37] ml/cnxk: dequeue " Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 28/37] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added an internal function to execute ML inference requests
in synchronous mode. Sync mode launches an inference request
without using a queue-pair: the request is submitted through the
model's own request structure and polled until completion or timeout.
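
The submit-then-poll pattern with a deadline can be sketched as below.
This is a minimal standalone illustration under stated assumptions: the
sketch_* names are hypothetical, and the clock and completion check are
passed in as callbacks so the example runs without hardware. In the
patch the clock is plt_tsc_cycles() and the same deadline covers both
the enqueue retry loop (-EBUSY on expiry) and the completion loop
(-ETIME on expiry); the sketch shows only the completion loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*sketch_clock_fn)(void *ctx);
typedef bool (*sketch_done_fn)(void *ctx);

/* Poll done() until it reports completion or the deadline passes. */
static int
sketch_poll_until(sketch_clock_fn now, sketch_done_fn done, void *ctx,
		  uint64_t timeout_ticks)
{
	uint64_t deadline = now(ctx) + timeout_ticks;

	do {
		if (done(ctx))
			return 0;
	} while (now(ctx) < deadline);

	return -ETIME;
}

/* Fake clock for demonstration: every read advances time by one tick. */
struct sketch_fake {
	uint64_t ticks;
	uint64_t done_at;
};

static uint64_t
sketch_fake_now(void *ctx)
{
	struct sketch_fake *f = ctx;

	return f->ticks++;
}

/* The fake job completes once the clock reaches done_at. */
static bool
sketch_fake_done(void *ctx)
{
	struct sketch_fake *f = ctx;

	return f->ticks >= f->done_at;
}
```

The completion check runs at least once, even with a zero timeout,
which matches the do/while shape used in the patch.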

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 53 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 83ec064c82..e7ee0774f2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1536,6 +1536,59 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	bool timeout;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[op->model_id];
+	req = model->req;
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+
+	timeout = true;
+	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	do {
+		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+			req->op = op;
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout) {
+		ret = -EBUSY;
+		goto error_enqueue;
+	}
+
+	timeout = true;
+	do {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout)
+		ret = -ETIME;
+	else
+		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+
+error_enqueue:
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3c5342dcc7..c23e484b69 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,5 +75,6 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v1 28/37] ml/cnxk: enable support for firmware error codes
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (26 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 27/37] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 29/37] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support for error handling. Added error types and subtypes
supported by ML firmware. Enabled support to get the device-specific
error code and message for a completed ML request.
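
Decoding the packed error code into a message can be sketched as below.
This is a standalone illustration, not the driver code: the sketch_*
names are hypothetical, only the driver sub-type table is reproduced,
and the bit-field layout is compiler-defined, as it is for the union in
this patch. The sketch also bounds-checks both table lookups, which is
an addition of the sketch: the 4-bit etype field can hold values up to
15 while the type table has seven entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_DIM(a) (sizeof(a) / sizeof((a)[0]))
#define SKETCH_ETYPE_DRIVER 5 /* Position of DRIVER_ERROR in the table */

/* Same shape as the error code union in this patch. */
union sketch_error_code {
	struct {
		uint64_t etype : 4;  /* Error type */
		uint64_t stype : 60; /* Error sub-type */
	} s;
	uint64_t u64;
};

static const char *const sketch_etype_name[] = {
	"NO_ERROR",   "FW_NON_FATAL", "HW_NON_FATAL",  "HW_FATAL",
	"HW_WARNING", "DRIVER_ERROR", "UNKNOWN_ERROR",
};

static const char *const sketch_driver_msg[] = {
	"NO ERROR", "UNKNOWN ERROR", "FW EXCEPTION", "UNKNOWN FIRMWARE ERROR",
};

/* Format "<type>" or "<type> : <sub-type>" into buf, never overflowing it. */
static void
sketch_error_message(uint64_t opaque, char *buf, size_t len)
{
	const char *name = "UNKNOWN_ERROR";
	union sketch_error_code code;

	code.u64 = opaque;
	if (code.s.etype < SKETCH_DIM(sketch_etype_name))
		name = sketch_etype_name[code.s.etype];

	if (code.s.etype == SKETCH_ETYPE_DRIVER &&
	    code.s.stype < SKETCH_DIM(sketch_driver_msg))
		snprintf(buf, len, "%s : %s", name, sketch_driver_msg[code.s.stype]);
	else
		snprintf(buf, len, "%s", name);
}
```

Using snprintf() for the concatenation keeps the result within the
destination buffer regardless of the table contents.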

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |   4 +-
 drivers/ml/cnxk/cn10k_ml_dev.h |  50 +++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.c | 117 ++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_ops.h |   2 +
 4 files changed, 160 insertions(+), 13 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 805b037593..779734d6cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -261,7 +261,7 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -452,7 +452,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 5096a26c40..2045465839 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -64,6 +64,54 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML error types enumeration */
+enum cn10k_ml_error_etype {
+	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
+	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
+	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
+	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
+	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
+	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
+};
+
+/* ML firmware non-fatal error sub-type */
+enum cn10k_ml_error_stype_fw_nf {
+	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
+	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during unload */
+	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
+	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
+	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
+	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
+	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
+	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
+	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
+	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+};
+
+/* ML driver error sub-type */
+enum cn10k_ml_error_stype_driver {
+	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
+	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+};
+
+/* ML error structure */
+union cn10k_ml_error_code {
+	struct {
+		/* Error type */
+		uint64_t etype : 4;
+
+		/* Error sub-type */
+		uint64_t stype : 60;
+	} s;
+
+	/* WORD 0 */
+	uint64_t u64;
+};
+
 /* ML Firmware stats */
 struct cn10k_ml_fw_stats {
 	/* Firmware start cycle */
@@ -82,7 +130,7 @@ struct cn10k_ml_fw_stats {
 /* ML result structure */
 struct cn10k_ml_result {
 	/* Job error code */
-	uint64_t error_code;
+	union cn10k_ml_error_code error_code;
 
 	/* Firmware stats */
 	struct cn10k_ml_fw_stats stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7ee0774f2..d9eea21e12 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,49 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Error message length */
+#define ERRMSG_LEN 32
+
+/* Error type database */
+static const struct cn10k_ml_etype_db {
+	enum cn10k_ml_error_etype etype;
+	char name[ERRMSG_LEN];
+} ml_etype_db[] = {
+	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
+
+/* Hardware non-fatal error subtype database */
+static const struct cn10k_ml_stype_db_hw_nf {
+	enum cn10k_ml_error_stype_fw_nf stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_hw_nf[] = {
+	{ML_FW_ERR_NOERR, "NO ERROR"},
+	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+};
+
+/* Driver error subtype database */
+static const struct cn10k_ml_stype_db_driver {
+	enum cn10k_ml_error_stype_driver stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_driver[] = {
+	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+};
+
 static void
 print_line(FILE *fp, int len)
 {
@@ -474,6 +517,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
+	dev->op_error_get = cn10k_ml_op_error_get;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -758,7 +802,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code != 0)
+		if (req->result.error_code.u64 != 0)
 			ret = -1;
 	}
 
@@ -940,7 +984,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1021,7 +1065,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0)
+			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1083,7 +1127,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1138,7 +1182,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0x0)
+			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1429,12 +1473,30 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 	PLT_SET_USED(dev);
 	PLT_SET_USED(qp_id);
 
-	op->impl_opaque = result->error_code;
+	struct cn10k_ml_dev *mldev;
 
-	if (likely(result->error_code == 0))
+	if (likely(result->error_code.u64 == 0)) {
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
-	else
+	} else {
+		/* Handle driver error */
+		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+			mldev = dev->data->dev_private;
+
+			/* Check for exception */
+			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
+			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+			else
+				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+		}
+
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
 
 	op->user_ptr = result->user_ptr;
 }
@@ -1471,6 +1533,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1518,8 +1581,12 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 dequeue_req:
 	req = &queue->reqs[tail];
 	status = plt_read64(&req->status);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
-		goto empty_or_active;
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+	}
 
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
@@ -1536,6 +1603,35 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
+{
+	union cn10k_ml_error_code *error_code;
+	char msg[RTE_ML_STR_MAX];
+
+	PLT_SET_USED(dev);
+
+	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
+
+	/* Copy error message */
+	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
+
+	/* Copy sub error message */
+	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+	}
+
+	if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+	}
+
+	plt_strlcpy(error->message, msg, sizeof(error->message));
+
+	return 0;
+}
+
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
@@ -1552,6 +1648,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index c23e484b69..5f00cb2a60 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,6 +75,8 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
+				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v1 29/37] ml/cnxk: add support to get and reset device stats
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (27 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 28/37] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 30/37] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to get and reset ML device stats. Device stats
include the number of requests enqueued and dequeued, and the
enqueue and dequeue error counts, summed over all queue-pairs.
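
The per-queue-pair accumulation can be sketched as below. This is a
standalone illustration with hypothetical sketch_* names; the counter
struct only mirrors the four fields used in this patch. One deliberate
difference: the sketch clears the output before summing, on the
assumption that the caller may pass an uninitialized struct, whereas
the driver callback in this patch adds to whatever it is given.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter set mirroring the four fields used in this patch. */
struct sketch_stats {
	uint64_t enqueued_count;
	uint64_t dequeued_count;
	uint64_t enqueue_err_count;
	uint64_t dequeue_err_count;
};

/* Sum the counters of all queue-pairs into out. */
static void
sketch_stats_get(const struct sketch_stats *qp_stats, uint16_t nb_qp,
		 struct sketch_stats *out)
{
	uint16_t qp_id;

	memset(out, 0, sizeof(*out));
	for (qp_id = 0; qp_id < nb_qp; qp_id++) {
		out->enqueued_count += qp_stats[qp_id].enqueued_count;
		out->dequeued_count += qp_stats[qp_id].dequeued_count;
		out->enqueue_err_count += qp_stats[qp_id].enqueue_err_count;
		out->dequeue_err_count += qp_stats[qp_id].dequeue_err_count;
	}
}

/* Clear the counters of every queue-pair. */
static void
sketch_stats_reset(struct sketch_stats *qp_stats, uint16_t nb_qp)
{
	memset(qp_stats, 0, (size_t)nb_qp * sizeof(*qp_stats));
}
```

Keeping the counters per queue-pair lets the fast path update them
without sharing a counter between queue-pairs; the device-level totals
are only computed when stats are requested.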

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 55 ++++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index d9eea21e12..732d0a63ba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -159,6 +159,10 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -678,6 +682,38 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -1470,15 +1506,23 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	PLT_SET_USED(dev);
-	PLT_SET_USED(qp_id);
-
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
 
 	if (likely(result->error_code.u64 == 0)) {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeued_count++;
+		}
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeue_err_count++;
+		}
+
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
 			mldev = dev->data->dev_private;
@@ -1552,6 +1596,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 jcmdq_full:
 	queue->head = head;
+	qp->stats.enqueued_count += count;
 
 	return count;
 }
@@ -1700,6 +1745,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5f00cb2a60..4c38f1938a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -58,6 +58,9 @@ struct cn10k_ml_qp {
 
 	/* Request queue */
 	struct cn10k_ml_queue queue;
+
+	/* Queue pair statistics */
+	struct rte_ml_dev_stats stats;
 };
 
 /* CN10K device ops */
-- 
2.17.1



* [PATCH v1 30/37] ml/cnxk: add support to handle extended dev stats
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (28 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 29/37] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 31/37] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to handle ML device extended stats (xstats). Support
is enabled to get xstats names and values and to reset xstats.
Supported xstats are the average, minimum and maximum hardware
and firmware latency.
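
Tracking a running total together with minimum and maximum is enough to
report all three latency figures. The following is a minimal standalone
sketch with hypothetical sketch_* names; it keeps its own sample count
and omits the per-reset bookkeeping (the reset-count fields) that the
driver structures in this patch carry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical latency accumulator for one statistic (hardware or firmware). */
struct sketch_latency {
	uint64_t tot;   /* Sum of all samples */
	uint64_t min;   /* Smallest sample seen */
	uint64_t max;   /* Largest sample seen */
	uint64_t count; /* Number of samples */
};

static void
sketch_latency_init(struct sketch_latency *s)
{
	s->tot = 0;
	s->min = UINT64_MAX; /* Any first sample will be smaller or equal */
	s->max = 0;
	s->count = 0;
}

/* Fold one completed inference's latency into the accumulator. */
static void
sketch_latency_update(struct sketch_latency *s, uint64_t sample)
{
	s->tot += sample;
	if (sample < s->min)
		s->min = sample;
	if (sample > s->max)
		s->max = sample;
	s->count++;
}

/* Average latency, or 0 when no samples were recorded. */
static uint64_t
sketch_latency_avg(const struct sketch_latency *s)
{
	return s->count != 0 ? s->tot / s->count : 0;
}
```

Guarding the division matters because xstats may be read before any
inference has completed.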

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.h |  57 +++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 356 ++++++++++++++++++++++++++++++-
 3 files changed, 415 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2045465839..d6c02e18f4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -372,6 +372,9 @@ struct cn10k_ml_dev {
 
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
+
+	/* xstats status */
+	bool xstats_enabled;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 2372ac9b72..9d8068a173 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -402,6 +402,57 @@ struct cn10k_ml_model_addr {
 	uint32_t total_output_sz_d;
 };
 
+/* Extended stats types enum */
+enum cn10k_ml_model_xstats_type {
+	/* Average hardware latency */
+	avg_hw_latency = 0,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+};
+
+/* Model fast-path stats */
+struct cn10k_ml_model_stats {
+	/* Total hardware latency, sum of all inferences */
+	uint64_t hw_latency_tot;
+
+	/* Minimum hardware latency */
+	uint64_t hw_latency_min;
+
+	/* Maximum hardware latency */
+	uint64_t hw_latency_max;
+
+	/* Total firmware latency, sum of all inferences */
+	uint64_t fw_latency_tot;
+
+	/* Minimum firmware latency */
+	uint64_t fw_latency_min;
+
+	/* Maximum firmware latency */
+	uint64_t fw_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t hw_reset_count;
+
+	/* Firmware stats reset index */
+	uint64_t fw_reset_count;
+};
+
 /* ML Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -441,6 +492,12 @@ struct cn10k_ml_model {
 
 	/* Model slow-path operations request pointer */
 	struct cn10k_ml_req *req;
+
+	/* Model stats for burst ops */
+	struct cn10k_ml_model_stats *burst_stats;
+
+	/* Model stats for sync ops */
+	struct cn10k_ml_model_stats *sync_stats;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 732d0a63ba..eeea98a4d5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -354,6 +354,134 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
+#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value += model->burst_stats[qp_id].str##_latency_tot;                      \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		value = (count == 0) ? 0 : value / count;                                          \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+static uint64_t
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
+			 enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint64_t count = 0;
+	uint64_t value;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+	if (model == NULL)
+		return 0;
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case min_hw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case max_hw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case avg_fw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case min_fw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case max_fw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
+			model->burst_stats[qp_id].str##_reset_count =                              \
+				model->burst_stats[qp_id].dequeued_count;                          \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+	} while (0)
+
+static void
+cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
+			   enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -519,6 +647,13 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	/* Check firmware stats */
+	if ((mldev->fw.req->jd.fw_load.cap.s.hw_stats) &&
+	    (mldev->fw.req->jd.fw_load.cap.s.fw_stats))
+		mldev->xstats_enabled = true;
+	else
+		mldev->xstats_enabled = false;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -714,6 +849,170 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+/* Model xstats names */
+struct rte_ml_dev_xstats_map cn10k_ml_model_xstats_table[] = {
+	{avg_hw_latency, "Avg-HW-Latency"}, {min_hw_latency, "Min-HW-Latency"},
+	{max_hw_latency, "Max-HW-Latency"}, {avg_fw_latency, "Avg-FW-Latency"},
+	{min_fw_latency, "Min-FW-Latency"}, {max_fw_latency, "Max-FW-Latency"},
+};
+
+static int
+cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_map *xstats_map,
+			      uint32_t size)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	if (xstats_map == NULL)
+		return PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+
+	/* Model xstats names */
+	count = 0;
+	cn10k_ml_dev_info_get(dev, &dev_info);
+
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		xstats_map[count].id = id;
+		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+
+		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+
+		count++;
+		if (count == size)
+			break;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				uint64_t *value)
+{
+	struct rte_ml_dev_xstats_map *xstats_map;
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+	uint32_t num_xstats;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	num_xstats = PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+	xstats_map = rte_zmalloc("cn10k_ml_xstats_map",
+				 sizeof(struct rte_ml_dev_xstats_map) * num_xstats, 0);
+	cn10k_ml_dev_xstats_names_get(dev, xstats_map, num_xstats);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		if (strncmp(name, xstats_map[id].name, strlen(name)) == 0) {
+			*stat_id = id;
+			rte_free(xstats_map);
+			break;
+		}
+	}
+
+	if (id == PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models)
+		return -EINVAL;
+
+	model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+	type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+	*value = cn10k_ml_model_xstat_get(dev, model_id, type);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint64_t *values,
+			uint16_t nb_ids)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	count = 0;
+	for (i = 0; i < nb_ids; i++) {
+		model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+		values[i] = cn10k_ml_model_xstat_get(dev, model_id, type);
+		count++;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint16_t nb_ids)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (stat_ids == NULL) {
+		for (i = 0; i < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; i++) {
+			model_id = i / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = i % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	} else {
+		for (i = 0; i < nb_ids; i++) {
+			model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	}
+
+	return 0;
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -856,6 +1155,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_stats_size;
 	size_t model_data_size;
 	size_t model_info_size;
 	uint8_t *base_dma_addr;
@@ -864,6 +1164,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int qp_id;
 	int ret;
 
 	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
@@ -900,10 +1201,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -952,6 +1255,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Set model info */
 	cn10k_ml_model_info_set(dev, model);
 
+	/* Reset burst and sync stats */
+	model->burst_stats = PLT_PTR_ADD(
+		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
+		model->burst_stats[qp_id].hw_latency_tot = 0;
+		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].hw_latency_max = 0;
+		model->burst_stats[qp_id].fw_latency_tot = 0;
+		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].fw_latency_max = 0;
+		model->burst_stats[qp_id].hw_reset_count = 0;
+		model->burst_stats[qp_id].fw_reset_count = 0;
+		model->burst_stats[qp_id].dequeued_count = 0;
+	}
+	model->sync_stats =
+		PLT_PTR_ADD(model->burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
@@ -1506,15 +1827,44 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
+	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	uint64_t hw_latency;
+	uint64_t fw_latency;
 
 	if (likely(result->error_code.u64 == 0)) {
+		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
+			stats = &model->burst_stats[qp_id];
+		} else {
+			stats = model->sync_stats;
+		}
+
+		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
+			stats->hw_latency_min = UINT64_MAX;
+			stats->hw_latency_max = 0;
 		}
 
+		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
+			stats->fw_latency_min = UINT64_MAX;
+			stats->fw_latency_max = 0;
+		}
+
+		hw_latency = result->stats.hw_end - result->stats.hw_start;
+		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
+
+		stats->hw_latency_tot += hw_latency;
+		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
+		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
+		stats->fw_latency_tot += fw_latency;
+		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
+		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
+		stats->dequeued_count++;
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
@@ -1748,6 +2098,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
 	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
-- 
2.17.1


* [PATCH v1 31/37] ml/cnxk: enable support to get xstats in cycles
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (29 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 30/37] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 32/37] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to retrieve xstats in either cycles or
nanoseconds. The sclk frequency is accessible only if an RVU
device is probed during initialization. The driver returns
xstats in nanoseconds when an RVU device is probed, and
falls back to cycles otherwise.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
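A minimal sketch of the unit conversion described above, assuming the clock
frequency is reported in MHz and that a frequency of 0 means the clock is not
accessible. `cycles_to_ns` is a hypothetical helper name, not a driver or ROC
function.

```c
#include <stdint.h>

/*
 * Convert a cycle count to nanoseconds for a clock given in MHz.
 * One cycle at f MHz lasts 1000 / f ns, so ns = cycles * 1000 / f.
 * When the frequency is unknown (0), the value is returned unchanged,
 * i.e. it stays in cycles.
 */
static uint64_t
cycles_to_ns(uint64_t cycles, uint16_t sclk_mhz)
{
	if (sclk_mhz == 0)
		return cycles;

	return (cycles * 1000ULL) / sclk_mhz;
}
```

The multiplication happens before the division to keep precision. It can
overflow only for cycle counts above UINT64_MAX / 1000, which is far beyond
any realistic per-inference latency.
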
 drivers/ml/cnxk/cn10k_ml_ops.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index eeea98a4d5..5d29a55e66 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -394,6 +394,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 			 enum cn10k_ml_model_xstats_type type)
 {
 	struct cn10k_ml_model *model;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
 	uint64_t value;
 	uint32_t qp_id;
@@ -425,6 +427,10 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 		value = 0;
 	}
 
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
 	return value;
 }
 
@@ -863,6 +869,8 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
 	uint32_t model_id;
 	uint32_t count;
 	uint32_t type;
@@ -878,6 +886,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	/* Model xstats names */
 	count = 0;
 	cn10k_ml_dev_info_get(dev, &dev_info);
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 
 	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
 		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
@@ -889,8 +898,14 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 		xstats_map[count].id = id;
 		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
 
-		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
-			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+		if (sclk_freq == 0)
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
+		else
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-ns",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
 
 		count++;
 		if (count == size)
-- 
2.17.1


* [PATCH v1 32/37] ml/cnxk: add support to report DPE FW warnings
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (30 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 31/37] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 33/37] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device arguments to enable and report DPE warnings
from the ML firmware. The firmware load flags are configured
from these device arguments.

Default values:
	enable_dpe_warnings = 1
	report_dpe_warnings = 0

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
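A minimal sketch of how the two device arguments translate into the firmware
load flags, assuming bit 0 enables DPE warnings and bit 1 enables reporting,
as in the bitmasks added by this patch. The `SKETCH_*` macros and
`fw_flags_sketch` are hypothetical names for illustration only.

```c
#include <stdint.h>

/* Hypothetical mirrors of the firmware flag bit positions. */
#define SKETCH_ENABLE_DPE_WARNING (UINT64_C(1) << 0)
#define SKETCH_REPORT_DPE_WARNING (UINT64_C(1) << 1)

/* Build the load flags from the two 0/1 device argument values. */
static uint64_t
fw_flags_sketch(int enable_dpe_warnings, int report_dpe_warnings)
{
	uint64_t flags = 0;

	if (enable_dpe_warnings)
		flags |= SKETCH_ENABLE_DPE_WARNING;

	if (report_dpe_warnings)
		flags |= SKETCH_REPORT_DPE_WARNING;

	return flags;
}
```

With the default values (enable = 1, report = 0) the result is 0x1, the same
value as the FW_LOAD_FLAGS constant that this patch removes.
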
 drivers/ml/cnxk/cn10k_ml_dev.c | 94 +++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_dev.h |  6 +++
 2 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 779734d6cd..0b345b3d4e 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -17,9 +17,13 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-#define CN10K_ML_FW_PATH "fw_path"
+#define CN10K_ML_FW_PATH		"fw_path"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 
-#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -28,9 +32,13 @@
 #define FW_EXCEPTION_BUFFER_SIZE 0x400
 #define FW_LINKER_OFFSET	 0x80000
 #define FW_WAIT_CYCLES		 100
-#define FW_LOAD_FLAGS		 0x1
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+/* Firmware flags */
+#define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
+#define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -49,9 +57,25 @@ parse_string_arg(const char *key __rte_unused, const char *value, void *extra_ar
 	return 0;
 }
 
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be non-negative.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int
 cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
 {
+	bool enable_dpe_warnings_set = false;
+	bool report_dpe_warnings_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -76,6 +100,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		fw_path_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		enable_dpe_warnings_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_REPORT_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		report_dpe_warnings_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -83,6 +131,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		mldev->fw.path = fw_path;
 	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
 
+	if (!enable_dpe_warnings_set) {
+		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+				mldev->fw.enable_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+
+	if (!report_dpe_warnings_set) {
+		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+				mldev->fw.report_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -208,9 +280,15 @@ cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 uint64_t
 cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 {
-	PLT_SET_USED(fw);
+	uint64_t flags = 0x0;
+
+	if (fw->enable_dpe_warnings)
+		flags = flags | FW_ENABLE_DPE_WARNING_BITMASK;
+
+	if (fw->report_dpe_warnings)
+		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	return FW_LOAD_FLAGS;
+	return flags;
 }
 
 static int
@@ -614,4 +692,6 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index d6c02e18f4..fefbb5072a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -349,6 +349,12 @@ struct cn10k_ml_fw {
 	/* Firmware file path */
 	const char *path;
 
+	/* Enable DPE warnings */
+	int enable_dpe_warnings;
+
+	/* Report DPE warnings */
+	int report_dpe_warnings;
+
 	/* Data buffer */
 	uint8_t *data;
 
-- 
2.17.1


* [PATCH v1 33/37] ml/cnxk: add support to enable model data caching
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (31 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 32/37] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 34/37] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument 'cache_model_data' to enable model data
caching. When enabled, an inference request is executed with
dummy data in synchronous mode during the model start stage.
This run caches the model weights and biases in memory, which
improves inference throughput.

cache_model_data = 1, enable (default)
cache_model_data = 0, disable

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
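A minimal sketch of a stricter parser for 0/1 device arguments such as
cache_model_data. This is an alternative to the atoi()-based helper used in
this series, not the driver's implementation. It is shown because atoi()
silently maps non-numeric input to 0. `parse_bool_devarg` is a hypothetical
name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a device argument that must be exactly 0 or 1.
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise.
 */
static int
parse_bool_devarg(const char *value, int *out)
{
	char *end = NULL;
	long v;

	if (value == NULL || *value == '\0')
		return -EINVAL;

	errno = 0;
	v = strtol(value, &end, 10);

	/* Reject range errors, trailing characters and values other than 0/1. */
	if (errno != 0 || *end != '\0' || v < 0 || v > 1)
		return -EINVAL;

	*out = (int)v;
	return 0;
}
```

A caller could keep the default of 1 when the key is absent and overwrite it
only when this helper returns 0.
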
 drivers/ml/cnxk/cn10k_ml_dev.c | 33 ++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c | 50 ++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 0b345b3d4e..b844a42677 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -20,10 +20,12 @@
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
+#define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -38,7 +40,8 @@
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -76,6 +79,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
+	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -124,6 +128,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		report_dpe_warnings_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -155,6 +171,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
 
+	if (!cache_model_data_set) {
+		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
+				mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -694,4 +722,5 @@ RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
 RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
 			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index fefbb5072a..59094e767e 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -381,6 +381,9 @@ struct cn10k_ml_dev {
 
 	/* xstats status */
 	bool xstats_enabled;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 5d29a55e66..a44d77df76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -488,6 +488,49 @@ cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
 	}
 }
 
+static int
+cn10k_ml_cache_model_data(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct rte_ml_op op;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t isize = 0;
+	uint64_t osize = 0;
+	int ret = 0;
+
+	model = dev->data->models[model_id];
+
+	/* Create input and output buffers. */
+	rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL);
+	rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL);
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%d", "ml_dummy_io", model_id);
+	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+	memset(mz->addr, 0, isize + osize);
+
+	op.model_id = model_id;
+	op.nb_batches = model->batch_size;
+	op.mempool = NULL;
+
+	op.input.addr = mz->addr;
+	op.input.length = isize;
+	op.input.next = NULL;
+
+	op.output.addr = PLT_PTR_ADD(op.input.addr, isize);
+	op.output.length = osize;
+	op.output.next = NULL;
+
+	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_inference_sync(dev, &op);
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -1471,6 +1514,13 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 		}
 	}
 
+	if (ret < 0) { /* Call stop to update model and FW state, ignore error */
+		rte_ml_model_stop(dev->data->dev_id, model_id);
+	} else {
+		if (mldev->cache_model_data && roc_model_is_cn10ka())
+			ret = cn10k_ml_cache_model_data(dev, model_id);
+	}
+
 	return ret;
 }
 
-- 
2.17.1



* [PATCH v1 34/37] ml/cnxk: add support to select OCM allocation mode
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (32 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 33/37] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 35/37] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "ocm_alloc_mode" to select the OCM allocation
method used during model start. The driver supports two modes,
"lowest" and "largest", with "lowest" as the default.

ocm_alloc_mode:
lowest:  Allocate from first available free slot / lowest
         tile ID in OCM (default)
largest: Allocate from a slot with maximum free memory
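
For illustration only, the two modes can be sketched on a simple per-page
"used" array. This is a minimal sketch, not the driver's code: the helper
names (sketch_slot_lowest, sketch_slot_largest, sketch_slot_find) and the
byte-per-page layout are assumptions made for this example, whereas the
driver operates on OCM mask words across tiles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: start of the first free run of at least `need` pages, or -1. */
static int
sketch_slot_lowest(const uint8_t *used, int num_pages, int need)
{
	int run = 0;

	assert(need > 0);
	for (int i = 0; i < num_pages; i++) {
		run = used[i] ? 0 : run + 1;
		if (run == need)
			return i - need + 1;
	}
	return -1;
}

/* Hypothetical helper: start of the largest free run if it fits `need`, else -1.
 * The size of the largest free run is reported through max_sz.
 */
static int
sketch_slot_largest(const uint8_t *used, int num_pages, int need, int *max_sz)
{
	int best_start = -1;
	int best_len = 0;
	int run = 0;

	assert(need > 0);
	for (int i = 0; i < num_pages; i++) {
		run = used[i] ? 0 : run + 1;
		if (run > best_len) {
			best_len = run;
			best_start = i - run + 1;
		}
	}
	*max_sz = best_len;
	return (best_len >= need) ? best_start : -1;
}

/* Dispatch on the mode string, mirroring the "lowest" / "largest" devarg values. */
static int
sketch_slot_find(const char *mode, const uint8_t *used, int num_pages, int need)
{
	int max_sz = 0;

	if (strcmp(mode, "lowest") == 0)
		return sketch_slot_lowest(used, num_pages, need);
	if (strcmp(mode, "largest") == 0)
		return sketch_slot_largest(used, num_pages, need, &max_sz);
	return -1;
}
```

With pages {1,0,0,1,0,0,0,0,1,0} and a request for 2 pages, "lowest" picks
the free run starting at page 1, while "largest" picks the 4-page run
starting at page 4.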

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 45 +++++++++++++++++++++++++++++-----
 drivers/ml/cnxk/cn10k_ml_ocm.c |  6 ++---
 drivers/ml/cnxk/cn10k_ml_ocm.h |  3 +++
 3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index b844a42677..a5fce18ec1 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -21,11 +21,13 @@
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
+#define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
+#define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -39,9 +41,12 @@
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+static const char *const valid_args[] = {CN10K_ML_FW_PATH,
+					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
+					 CN10K_ML_DEV_CACHE_MODEL_DATA,
+					 CN10K_ML_OCM_ALLOC_MODE,
+					 NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -81,6 +86,8 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool report_dpe_warnings_set = false;
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
+	bool ocm_alloc_mode_set = false;
+	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
 	int ret = 0;
@@ -140,6 +147,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		cache_model_data_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_ALLOC_MODE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_ALLOC_MODE, &parse_string_arg,
+					 &ocm_alloc_mode);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_ALLOC_MODE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_alloc_mode_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -183,6 +201,20 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
 
+	if (!ocm_alloc_mode_set) {
+		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+	} else {
+		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
+		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_OCM_ALLOC_MODE,
+				ocm_alloc_mode);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->ocm.alloc_mode = ocm_alloc_mode;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -720,7 +752,8 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 348df9468a..b74af2cae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -230,7 +230,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
-	int ocm_alloc_mode;
 	int wb_page_start;
 	uint16_t tile_id;
 	uint16_t word_id;
@@ -255,7 +254,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	max_slot_sz_curr = 0;
 	max_slot_sz = 0;
 	tile_idx = 0;
-	ocm_alloc_mode = 2;
 
 	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
 		plt_err("Invalid start_tile, %d", start_tile);
@@ -303,13 +301,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		}
 	}
 
-	if (ocm_alloc_mode == 1) {
+	if (strcmp(ocm->alloc_mode, "lowest") == 0) {
 		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
 		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
 			tile_idx = tile_start;
 			goto found;
 		}
-	} else if (ocm_alloc_mode == 2) {
+	} else if (strcmp(ocm->alloc_mode, "largest") == 0) {
 		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
 							&max_slot_sz_curr);
 		if (max_slot_sz_curr > max_slot_sz) {
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 887c8bf6c0..65f0e0f650 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -58,6 +58,9 @@ struct cn10k_ml_ocm {
 	/* OCM spinlock, used to update OCM state */
 	rte_spinlock_t lock;
 
+	/* OCM allocation mode */
+	const char *alloc_mode;
+
 	/* Number of OCM tiles */
 	uint8_t num_tiles;
 
-- 
2.17.1



* [PATCH v1 35/37] ml/cnxk: add support to use lock during jcmd enq
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (33 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 34/37] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 36/37] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "hw_queue_lock" to select the JCMDQ enqueue
ROC function to be used in fast path.

hw_queue_lock:

0: Disable, use the lock-free version of the JCMDQ enqueue ROC
	function for job queuing. To avoid race conditions when
	queuing requests to hardware, disabling hw_queue_lock
	restricts the number of queue-pairs supported by the cnxk
	driver to 1.

1: Enable (default), use the spinlock version of the JCMDQ enqueue
	ROC function for job queuing. Enabling the spinlock version
	lifts the single queue-pair restriction on the number of
	queue-pairs that can be created.
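
As a rough sketch of the idea, the handler can be chosen once at configure
time and then called through a function pointer in the fast path. Everything
named sketch_* below is hypothetical and stands in for the ROC queue and
enqueue functions; the spinlock is modeled with a C11 atomic_flag rather
than the platform lock. Only the queue-pair limits (16 with the lock, 1
without) are taken from this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_QUEUE_DEPTH 4

/* Hypothetical stand-in for the hardware job command queue. */
struct sketch_jcmdq {
	atomic_flag lock;
	uint32_t count;
	uint64_t slots[SKETCH_QUEUE_DEPTH];
};

typedef bool (*sketch_enqueue_t)(struct sketch_jcmdq *q, uint64_t job);

static void
sketch_jcmdq_init(struct sketch_jcmdq *q)
{
	atomic_flag_clear(&q->lock);
	q->count = 0;
}

/* Lock-free variant: only safe when a single producer owns the queue. */
static bool
sketch_enqueue_lf(struct sketch_jcmdq *q, uint64_t job)
{
	if (q->count == SKETCH_QUEUE_DEPTH)
		return false; /* queue full */
	q->slots[q->count++] = job;
	return true;
}

/* Spinlock variant: serializes producers around the same enqueue step. */
static bool
sketch_enqueue_sl(struct sketch_jcmdq *q, uint64_t job)
{
	bool ok;

	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin until the lock is released */
	ok = sketch_enqueue_lf(q, job);
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
	return ok;
}

/* Select the enqueue handler from the hw_queue_lock devarg value (0 or 1). */
static sketch_enqueue_t
sketch_select_enqueue(int hw_queue_lock)
{
	assert(hw_queue_lock == 0 || hw_queue_lock == 1);
	return hw_queue_lock ? sketch_enqueue_sl : sketch_enqueue_lf;
}

/* Queue-pair limit reported for each setting, using the values from this patch. */
static uint16_t
sketch_max_queue_pairs(int hw_queue_lock)
{
	return hw_queue_lock ? 16 : 1;
}
```

Resolving the pointer once keeps the per-request path free of a branch on
the devarg, at the cost of an indirect call.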

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 31 ++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_dev.h | 13 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.c | 20 +++++++++++++++++---
 3 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index a5fce18ec1..33709dae6f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -22,12 +22,14 @@
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -46,6 +48,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
+					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -87,6 +90,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
+	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -158,6 +162,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		ocm_alloc_mode_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
+					 &mldev->hw_queue_lock);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_HW_QUEUE_LOCK);
+			ret = -EINVAL;
+			goto exit;
+		}
+		hw_queue_lock_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -215,6 +231,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
 
+	if (!hw_queue_lock_set) {
+		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+	} else {
+		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
+				mldev->hw_queue_lock);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -756,4 +784,5 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 59094e767e..f4e0fea920 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -21,8 +21,11 @@
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
 
-/* Maximum number of Queue-Pairs per device */
-#define ML_CN10K_MAX_QP_PER_DEVICE 1
+/* Maximum number of Queue-Pairs per device, spinlock version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
+
+/* Maximum number of Queue-Pairs per device, lock-free version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_LF 1
 
 /* Maximum number of descriptors per queue-pair */
 #define ML_CN10K_MAX_DESC_PER_QP 1024
@@ -384,6 +387,12 @@ struct cn10k_ml_dev {
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
+
+	/* Use spinlock version of ROC enqueue */
+	int hw_queue_lock;
+
+	/* JCMD enqueue function handler */
+	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a44d77df76..f787455a7f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -534,13 +534,21 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, int16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
+	struct cn10k_ml_dev *mldev;
+
 	if (dev_info == NULL)
 		return -EINVAL;
 
+	mldev = dev->data->dev_private;
+
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	if (mldev->hw_queue_lock)
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
+	else
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
+
 	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
@@ -703,6 +711,12 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->xstats_enabled = false;
 
+	/* Set JCMDQ enqueue function */
+	if (mldev->hw_queue_lock == 1)
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	else
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -1996,7 +2010,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
-	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2117,7 +2131,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
-- 
2.17.1



* [PATCH v1 36/37] ml/cnxk: add support to select poll memory region
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (34 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 35/37] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:02 ` [PATCH v1 37/37] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "poll_mem" to select the memory
region to be used for polling in fast-path requests.

Implemented support to use scratch registers for polling.
The available pool of scratch registers is mapped one-to-one
to the internal request queue.

poll_mem:
ddr:      Use DDR memory location for polling (default)
register: Use scratch registers for polling
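
The register mapping used in "register" mode can be sketched as below. The
arithmetic follows the block_size / block_start computation added to
cn10k_ml_qp_create() and the index used in cn10k_ml_set_poll_addr_reg() in
this patch; the sketch_* names and the standalone struct are assumptions
made so the example is self-contained, and the result is a plain register
index rather than an ML_SCRATCH() address.

```c
#include <assert.h>
#include <stdint.h>

/* Scratch register range for poll mode requests, as defined in this patch. */
#define SKETCH_POLL_REGISTER_START 1024
#define SKETCH_POLL_REGISTER_END   2047

/* Hypothetical reduced queue-pair state holding only the polling block. */
struct sketch_qp {
	uint32_t block_start;
	uint32_t block_size;
};

/* Split the register range evenly across the configured queue-pairs. */
static void
sketch_qp_block_init(struct sketch_qp *qp, uint16_t qp_id, uint16_t nb_queue_pairs)
{
	assert(nb_queue_pairs > 0 && qp_id < nb_queue_pairs);

	qp->block_size = (SKETCH_POLL_REGISTER_END - SKETCH_POLL_REGISTER_START + 1) /
			 nb_queue_pairs;
	qp->block_start = SKETCH_POLL_REGISTER_START + qp_id * qp->block_size;
}

/* Register index polled for the request at queue position idx. */
static uint32_t
sketch_poll_register(const struct sketch_qp *qp, uint64_t idx)
{
	return qp->block_start + (uint32_t)(idx % qp->block_size);
}
```

With 16 queue-pairs each block holds 64 registers, so queue-pair 3 starts at
register 1216 and the request at position 70 polls register 1222.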

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  47 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  24 +++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 124 +++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |   9 +++
 4 files changed, 192 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 33709dae6f..153a0bdf4c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,6 +23,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
+#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -30,6 +31,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
+#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -42,6 +44,7 @@
 /* Firmware flags */
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+#define FW_USE_DDR_POLL_ADDR_FP	      BIT(2)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
@@ -49,6 +52,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
+					 CN10K_ML_FW_POLL_MEM,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -92,7 +96,9 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
+	bool poll_mem_set = false;
 	bool fw_path_set = false;
+	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 
@@ -174,6 +180,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
+					 &poll_mem);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
+			ret = -EINVAL;
+			goto exit;
+		}
+		poll_mem_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -243,6 +260,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
+	if (!poll_mem_set) {
+		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
+	} else {
+		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->fw.poll_mem = poll_mem;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -376,6 +405,11 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
+	if (strcmp(fw->poll_mem, "ddr") == 0)
+		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
+	else if (strcmp(fw->poll_mem, "register") == 0)
+		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+
 	return flags;
 }
 
@@ -780,9 +814,10 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
-			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f4e0fea920..ce2d75f0e0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,18 @@
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
 
+/* Memory barrier macros */
+#if defined(RTE_ARCH_ARM)
+#define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
+#define dsb_st ({ asm volatile("dsb st" : : : "memory"); })
+#else
+#define dmb_st
+#define dsb_st
+#endif
+
+struct cn10k_ml_req;
+struct cn10k_ml_qp;
+
 /* ML Job types */
 enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
@@ -358,6 +370,9 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
+	/* Memory to be used for polling in fast-path requests */
+	const char *poll_mem;
+
 	/* Data buffer */
 	uint8_t *data;
 
@@ -393,6 +408,15 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+	/* Poll handling function pointers */
+	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
+	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+
+	/* Memory barrier function pointers to handle synchronization */
+	void (*set_enq_barrier)(void);
+	void (*set_deq_barrier)(void);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f787455a7f..b73ce8c97a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,11 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Scratch register range for poll mode requests */
+#define ML_POLL_REGISTER_SYNC  1023
+#define ML_POLL_REGISTER_START 1024
+#define ML_POLL_REGISTER_END   2047
+
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -76,6 +81,80 @@ print_line(FILE *fp, int len)
 	fprintf(fp, "\n");
 }
 
+static inline void
+cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	PLT_SET_USED(qp);
+	PLT_SET_USED(idx);
+
+	req->compl_W1 = PLT_U64_CAST(&req->status);
+}
+
+static inline void
+cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	return plt_read64(req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	return roc_ml_reg_read64(roc_ml, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
+{
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		req->compl_W1 = PLT_U64_CAST(&req->status);
+	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
+}
+
+static inline void
+cn10k_ml_enq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_deq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_enq_barrier_register(void)
+{
+	dmb_st;
+}
+
+static inline void
+cn10k_ml_deq_barrier_register(void)
+{
+	dsb_st;
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -163,6 +242,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
+	qp->block_size =
+		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
+	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -341,7 +423,7 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	mldev = dev->data->dev_private;
 
 	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
@@ -549,7 +631,11 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
+	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
 
@@ -717,6 +803,26 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
+	/* Set polling function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
+	}
+
+	/* Set barrier function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
+	}
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -2003,13 +2109,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
+	mldev->set_poll_addr(qp, req, head);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
+	mldev->set_enq_barrier();
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2035,6 +2143,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		       uint16_t nb_ops)
 {
 	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2042,6 +2151,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
+	mldev = dev->data->dev_private;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2054,7 +2164,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = plt_read64(&req->status);
+	status = mldev->get_poll_ptr(&mldev->roc, req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2062,6 +2172,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
+	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2119,13 +2230,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
+	cn10k_ml_set_sync_addr(mldev, req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2145,7 +2257,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4c38f1938a..f09c67f186 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -26,6 +26,9 @@ struct cn10k_ml_req {
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
 
+	/* Job completion W1 */
+	uint64_t compl_W1;
+
 	/* Request timeout cycle */
 	uint64_t timeout;
 
@@ -61,6 +64,12 @@ struct cn10k_ml_qp {
 
 	/* Queue pair statistics */
 	struct rte_ml_dev_stats stats;
+
+	/* Register block start for polling */
+	uint32_t block_start;
+
+	/* Register block size for polling */
+	uint32_t block_size;
 };
 
 /* CN10K device ops */
-- 
2.17.1



* [PATCH v1 37/37] ml/cnxk: add user guide for marvell cnxk ml driver
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (35 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 36/37] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
@ 2022-12-08 20:02 ` Srikanth Yalavarthi
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:02 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added user guide for Marvell cnxk ML driver for Marvell Octeon
cnxk SoC family. Added details about device initialization,
debug options and runtime device args supported by the driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
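Note for readers (not part of the commit): the guide added below describes
xstats names as "Model-<model_id>-Type-<units>". A minimal sketch of that
format in plain C follows. The helper name xstat_name_build and its signature
are hypothetical and are not taken from the PMD; only the "ns" units suffix is
confirmed by the example in the guide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the PMD: builds an xstat name in the
 * "Model-<model_id>-Type-<units>" format described in the user guide.
 * Returns 0 on success and -1 if the name does not fit in buf.
 */
static int
xstat_name_build(char *buf, size_t len, unsigned int model_id, const char *type,
		 const char *units)
{
	int n;

	assert(buf != NULL && type != NULL && units != NULL);

	n = snprintf(buf, len, "Model-%u-%s-%s", model_id, type, units);

	/* snprintf returns the untruncated length; treat truncation as failure */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling xstat_name_build(buf, sizeof(buf), 1, "Avg-FW-Latency", "ns") would
produce the name used as the example in the guide.
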
 MAINTAINERS                 |   1 +
 doc/guides/index.rst        |   1 +
 doc/guides/mldevs/cnxk.rst  | 238 ++++++++++++++++++++++++++++++++++++
 doc/guides/mldevs/index.rst |  14 +++
 4 files changed, 254 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst
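
Note for readers (not part of the commit): the runtime options in the guide
are passed as comma-separated key=value pairs after the PCI address, for
example "fw_path=/home/user/ml_fw.bin,hw_queue_lock=1". The sketch below shows
one way such a string could be searched for a key. It is a plain-C stand-in
written for illustration; devarg_lookup is a hypothetical name and this is not
the parser used by the driver.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: find "key=value" in a comma-separated devargs string
 * and copy the value into out. Returns 0 if the key is found and the value
 * fits, -1 otherwise (the caller would then fall back to the default).
 */
static int
devarg_lookup(const char *args, const char *key, char *out, size_t len)
{
	size_t klen = strlen(key);
	const char *p = args;

	assert(out != NULL && len > 0);

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t tlen = end ? (size_t)(end - p) : strlen(p);

		/* Token must be at least "key=" and match the key exactly */
		if (tlen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
			size_t vlen = tlen - klen - 1;

			if (vlen >= len)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p = end ? end + 1 : NULL;
	}

	return -1;
}
```

A lookup for a key that is absent (for example poll_mem in the string above)
returns -1, which corresponds to the documented default being used.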

diff --git a/MAINTAINERS b/MAINTAINERS
index ba4c97e802..537acb8c84 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1443,6 +1443,7 @@ M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
 F: drivers/ml/cnxk/
+F: doc/guides/mldevs/cnxk.rst
 
 
 Packet processing
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 5eb5bd9c9a..0bd729530a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -26,6 +26,7 @@ DPDK documentation
    eventdevs/index
    rawdevs/index
    mempool/index
+   mldevs/index
    platform/index
    contributing/index
    rel_notes/index
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
new file mode 100644
index 0000000000..1e657450c7
--- /dev/null
+++ b/doc/guides/mldevs/cnxk.rst
@@ -0,0 +1,238 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Marvell cnxk Machine Learning Poll Mode Driver
+==============================================
+
+The cnxk ML poll mode driver provides support for offloading Machine
+Learning inference operations to Machine Learning accelerator units
+on the **Marvell OCTEON cnxk** SoC family.
+
+The cnxk ML PMD code is organized into multiple files with all file names
+starting with cn10k, providing support for CN106XX and CN106XXS.
+
+More information about OCTEON cnxk SoCs may be obtained from `<https://www.marvell.com>`_
+
+Supported OCTEON cnxk SoCs
+--------------------------
+
+- CN106XX
+- CN106XXS
+
+Features
+--------
+
+The OCTEON cnxk ML PMD provides support for the following set of operations:
+
+Slow-path device and ML model handling:
+
+* ``Device probing, configuration and close``
+* ``Device start / stop``
+* ``Model loading and unloading``
+* ``Model start / stop``
+* ``Data quantization and dequantization``
+
+Fast-path Inference:
+
+* ``Inference execution``
+* ``Error handling``
+
+
+Installation
+------------
+
+The OCTEON cnxk ML PMD may be compiled natively on an OCTEON cnxk platform
+or cross-compiled on an x86 platform.
+
+Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
+application.
+
+
+Initialization
+--------------
+
+``CN10K Initialization``
+
+List the ML PF devices available on cn10k platform:
+
+.. code-block:: console
+
+    lspci -d:a092
+
+``a092`` is the ML device PF id. You should see output similar to:
+
+.. code-block:: console
+
+    0000:00:10.0 System peripheral: Cavium, Inc. Device a092
+
+Bind the ML PF device to the vfio_pci driver:
+
+.. code-block:: console
+
+    cd <dpdk directory>
+    ./usertools/dpdk-devbind.py -u 0000:00:10.0
+    ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
+
+Runtime Config Options
+----------------------
+
+- ``Firmware file path`` (default ``/lib/firmware/mlip-fw.bin``)
+
+   Path to the firmware binary to be loaded during device configuration.
+   The ``fw_path`` ``devargs`` parameter can be used by the user to load
+   ML firmware from a custom path.
+
+   For example::
+
+      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
+
+   With the above configuration, driver loads the firmware from the path
+   "/home/user/ml_fw.bin".
+
+- ``Enable DPE warnings`` (default ``1``)
+
+   ML firmware can be configured during load to handle the DPE errors reported
+   by ML inference engine. When enabled, firmware would mask the DPE non-fatal
+   hardware errors as warnings. The parameter ``enable_dpe_warnings`` ``devargs``
+   is used for this configuration.
+
+   For example::
+
+      -a 0000:00:10.0,enable_dpe_warnings=0
+
+   With the above configuration, DPE non-fatal errors reported by HW are
+   considered as errors.
+
+
+- ``Model data caching`` (default ``1``)
+
+   Enable caching model data on ML ACC cores. Enabling this option executes a
+   dummy inference request in synchronous mode during model start stage. Caching
+   of model data improves the inferencing throughput / latency for the model.
+   The parameter ``cache_model_data`` ``devargs`` is used to enable data caching.
+
+   For example::
+
+      -a 0000:00:10.0,cache_model_data=0
+
+   With the above configuration, model data caching is disabled.
+
+
+- ``OCM allocation mode`` (default ``lowest``)
+
+   Option to specify the method to be used while allocating OCM memory for a
+   model during model start. Two modes are supported by the driver. The
+   parameter ``ocm_alloc_mode`` ``devargs`` is used to select the OCM
+   allocation mode.
+
+   ``lowest`` - Allocate OCM for the model from first available free slot. Search
+   for the free slot is done starting from the lowest tile ID and lowest page ID.
+   ``largest`` - Allocate OCM for the model from the slot with largest amount of
+   free space.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_alloc_mode=lowest
+
+   With the above configuration, OCM allocation for the model would be done from
+   the first available free slot / from the lowest possible tile ID.
+
+
+- ``Enable hardware queue lock`` (default ``0``)
+
+   Option to select the job request enqueue function used to queue the requests
+   to hardware queue. The parameter ``hw_queue_lock`` ``devargs`` is used to select
+   the enqueue function.
+
+   ``0`` - Disable (default), use lock free version of hardware enqueue function
+   for job queuing in enqueue burst operation. To avoid race condition in request
+   queuing to hardware, disabling hw_queue_lock restricts the number of queue-pairs
+   supported by cnxk driver to 1.
+   ``1`` - Enable, use spin-lock version of hardware enqueue function for job queuing.
+   Enabling spinlock version would disable restrictions on the number of queue-pairs
+   that can be supported by the driver.
+
+   For example::
+
+      -a 0000:00:10.0,hw_queue_lock=1
+
+   With the above configuration, spinlock version of hardware enqueue function is used
+   in the fast path enqueue burst operation.
+
+
+- ``Polling memory location`` (default ``ddr``)
+
+   ML cnxk driver provides the option to select the memory location to be used
+   for polling to check the inference request completion. Driver supports using
+   either the DDR address space (``ddr``) or ML registers (``register``) as
+   polling locations. The parameter ``poll_mem`` ``devargs`` is used to specify
+   the poll location.
+
+   For example::
+
+      -a 0000:00:10.0,poll_mem="register"
+
+   With the above configuration, ML cnxk driver is configured to use ML registers
+   for polling in fastpath requests.
+
+
+Debugging Options
+-----------------
+
+.. _table_octeon_cnxk_ml_debug_options:
+
+.. table:: OCTEON cnxk ML PMD debug options
+
+    +---+------------+-------------------------------------------------------+
+    | # | Component  | EAL log command                                       |
+    +===+============+=======================================================+
+    | 1 | ML         | --log-level='pmd\.ml\.cnxk,8'                         |
+    +---+------------+-------------------------------------------------------+
+
+
+Extended stats
+--------------
+
+Marvell cnxk ML PMD supports reporting the inference latencies through extended
+stats. The PMD supports the following 6 extended stats types for each model.
+The total number of extended stats is equal to 6 x the number of models loaded.
+
+.. _table_octeon_cnxk_ml_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD xstats names
+
+    +---+---------------------+----------------------------------------------+
+    | # | Type                | Description                                  |
+    +===+=====================+==============================================+
+    | 1 | Avg-HW-Latency      | Average hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 2 | Min-HW-Latency      | Minimum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 3 | Max-HW-Latency      | Maximum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 4 | Avg-FW-Latency      | Average firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 5 | Min-FW-Latency      | Minimum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 6 | Max-FW-Latency      | Maximum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+
+Latency values reported by the PMD through xstats are in units of either
+cycles or nanoseconds. The unit is determined during DPDK initialization
+and depends on the availability of SCLK. Latencies are reported in
+nanoseconds when the SCLK is available and in cycles otherwise.
+Application needs to initialize at least one RVU for the clock to be available.
+
+xstats names are dynamically generated by the PMD and would have the format
+"Model-<model_id>-Type-<units>".
+For example::
+
+   Model-1-Avg-FW-Latency-ns
+
+The above xstat name reports the average firmware latency in nanoseconds for
+the model with model ID 1.
+
+The number of xstats made available by the PMD changes dynamically. It
+increases when a model is loaded and decreases when a model is unloaded.
+The application needs to update the xstats map after a model is either loaded or
+unloaded.
diff --git a/doc/guides/mldevs/index.rst b/doc/guides/mldevs/index.rst
new file mode 100644
index 0000000000..f201e54175
--- /dev/null
+++ b/doc/guides/mldevs/index.rst
@@ -0,0 +1,14 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Machine Learning Device Driver
+==============================
+
+The following is a list of ML device PMDs, which can be used from an
+application through the ML device API.
+
+.. toctree::
+    :maxdepth: 2
+    :numbered:
+
+    cnxk
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v2 00/37] Implementation of ML CNXK driver
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (36 preceding siblings ...)
  2022-12-08 20:02 ` [PATCH v1 37/37] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
@ 2022-12-08 20:17 ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 01/37] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
                     ` (37 more replies)
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                   ` (2 subsequent siblings)
  40 siblings, 38 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  Cc: dev, sshankarnara, jerinj, aprabhu, Srikanth Yalavarthi

Marvell ML CNXK Driver
----------------------

This patch series implements Machine Learning (ML) driver for Marvell
Octeon 10 (cnxk) platform. ML inferencing is supported on cnxk platform
through an integrated ML inferencing processor. The current driver
supports programming the ML hardware engine through offload mode.

All APIs proposed in the DPDK ML device specification are supported on
the cnxk platform.


Srikanth Yalavarthi (37):
  ml/cnxk: add skeleton for ML cnxk driver
  ml/cnxk: enable probe and remove of ML device
  ml/cnxk: add driver support to get device info
  ml/cnxk: add support for configure and close
  ml/cnxk: parse ML firmware path from device args
  ml/cnxk: enable firmware load and device reset
  ml/cnxk: enable support for simulator environment
  ml/cnxk: enable support for device start and stop
  ml/cnxk: add support to create device queue-pairs
  ml/cnxk: add functions to load and unload models
  ml/cnxk: enable validity checks for model metadata
  ml/cnxk: add internal structures for derived info
  ml/cnxk: add internal structures for tiles and OCM
  ml/cnxk: add structures for slow and fast path JDs
  ml/cnxk: find OCM mask and page slots for a model
  ml/cnxk: add support to reserve and free OCM pages
  ml/cnxk: enable support to start an ML model
  ml/cnxk: enable support to stop an ML models
  ml/cnxk: enable support to get model information
  ml/cnxk: enable support to update model params
  ml/cnxk: add support to get IO buffer sizes
  ml/cnxk: enable quantization and dequantization
  ml/cnxk: enable support to dump device debug info
  ml/cnxk: add driver support for device selftest
  ml/cnxk: enqueue a burst of inference requests
  ml/cnxk: dequeue a burst of inference requests
  ml/cnxk: add internal function for sync mode run
  ml/cnxk: enable support for firmware error codes
  ml/cnxk: add support to get and reset device stats
  ml/cnxk: add support to handle extended dev stats
  ml/cnxk: enable support to get xstats in cycles
  ml/cnxk: add support to report DPE FW warnings
  ml/cnxk: add support to enable model data caching
  ml/cnxk: add support to select OCM allocation mode
  ml/cnxk: add support to use lock during jcmd enq
  ml/cnxk: add support to select poll memory region
  ml/cnxk: add user guide for marvell cnxk ml driver

 MAINTAINERS                      |    3 +
 doc/guides/index.rst             |    1 +
 doc/guides/mldevs/cnxk.rst       |  238 +++
 doc/guides/mldevs/index.rst      |   14 +
 drivers/meson.build              |    1 +
 drivers/ml/cnxk/cn10k_ml_dev.c   |  823 +++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h   |  426 ++++++
 drivers/ml/cnxk/cn10k_ml_model.c |  396 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  511 +++++++
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  509 +++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   91 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 2310 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   94 ++
 drivers/ml/cnxk/meson.build      |   32 +
 drivers/ml/meson.build           |    8 +
 15 files changed, 5457 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

--
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v2 01/37] ml/cnxk: add skeleton for ML cnxk driver
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 02/37] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
                     ` (36 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added initial source files and build files for ML cnxk driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: patch-120600 ("common/cnxk: add ML headers and ROC code for cnxk")

 MAINTAINERS                    |  2 ++
 drivers/meson.build            |  1 +
 drivers/ml/cnxk/cn10k_ml_dev.c |  8 ++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  8 ++++++++
 drivers/ml/cnxk/meson.build    | 26 ++++++++++++++++++++++++++
 drivers/ml/meson.build         |  8 ++++++++
 6 files changed, 53 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 8cdb3e215d..ba4c97e802 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1442,6 +1442,8 @@ Marvell ML CNXK
 M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
+F: drivers/ml/cnxk/
+

 Packet processing
 -----------------
diff --git a/drivers/meson.build b/drivers/meson.build
index c6d619200f..546a5f409d 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -14,6 +14,7 @@ subdirs = [
         'mempool',        # depends on common and bus.
         'dma',            # depends on common and bus.
         'net',            # depends on common, bus, mempool
+        'ml',             # depends on common, bus, mempool
         'raw',            # depends on common, bus, dma and net.
         'crypto',         # depends on common, bus and mempool (net in future).
         'compress',       # depends on common, bus, mempool.
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
new file mode 100644
index 0000000000..cc96a7bdb3
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
new file mode 100644
index 0000000000..049ac13fcd
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_DEV_H_
+#define _CN10K_ML_DEV_H_
+
+#endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
new file mode 100644
index 0000000000..f04e78cce5
--- /dev/null
+++ b/drivers/ml/cnxk/meson.build
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
+    build = false
+    reason = 'only supported on 64-bit Linux'
+    subdir_done()
+endif
+
+sources = files(
+        'cn10k_ml_dev.c',
+)
+
+headers = files(
+        'cn10k_ml_dev.h',
+)
+
+deps += ['mldev', 'common_ml', 'common_cnxk']
+
+if get_option('buildtype').contains('debug')
+        cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
+else
+        cflags += [ '-UCNXK_ML_DEV_DEBUG' ]
+endif
+
+pmd_supports_disable_iova_as_pa = true
diff --git a/drivers/ml/meson.build b/drivers/ml/meson.build
new file mode 100644
index 0000000000..54bc394c47
--- /dev/null
+++ b/drivers/ml/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+drivers = [
+        'cnxk',
+]
+
+std_deps = ['mldev']
--
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v2 02/37] ml/cnxk: enable probe and remove of ML device
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 01/37] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 03/37] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
                     ` (35 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Anatoly Burakov; +Cc: dev, sshankarnara, jerinj, aprabhu

ML inference engine on cn10k platform is a PCI based device. Added
driver support to probe and remove the device for cn10k poll mode
driver. The device is named by the PMD as "ml_cn10k".

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 114 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  11 ++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  10 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  11 ++++
 drivers/ml/cnxk/meson.build    |   2 +
 5 files changed, 148 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index cc96a7bdb3..c2e93c9a1a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -2,7 +2,121 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_common.h>
+#include <rte_dev.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
+#include <rte_pci.h>
+
+#include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ops.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+static int
+cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	PLT_SET_USED(pci_drv);
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+
+	ret = roc_plt_init();
+	if (ret < 0) {
+		plt_err("Failed to initialize platform model");
+		return ret;
+	}
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+	dev = rte_ml_dev_pmd_create(name, &pci_dev->device, &init_params);
+	if (dev == NULL) {
+		ret = -ENODEV;
+		goto error_exit;
+	}
+
+	/* Get private data space allocated */
+	mldev = dev->data->dev_private;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev->roc.pci_dev = pci_dev;
+
+		ret = roc_ml_dev_init(&mldev->roc);
+		if (ret) {
+			plt_err("Failed to initialize ML ROC, ret = %d", ret);
+			goto pmd_destroy;
+		}
+
+		dev->dev_ops = &cn10k_ml_ops;
+	} else {
+		plt_err("CN10K ML Ops are not supported on secondary process");
+		dev->dev_ops = &ml_dev_dummy_ops;
+	}
+
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	return 0;
+
+pmd_destroy:
+	rte_ml_dev_pmd_destroy(dev);
+
+error_exit:
+	plt_err("Could not create device (vendor_id: 0x%x device_id: 0x%x)", pci_dev->id.vendor_id,
+		pci_dev->id.device_id);
+
+	return ret;
+}
+
+static int
+cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&mldev->roc);
+		if (ret)
+			return ret;
+	}
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_pci_id pci_id_ml_table[] = {
+	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
+	/* sentinel */
+	{},
+};
+
+static struct rte_pci_driver cn10k_mldev_pmd = {
+	.id_table = pci_id_ml_table,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA,
+	.probe = cn10k_ml_pci_probe,
+	.remove = cn10k_ml_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
+RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 049ac13fcd..4827d29bf7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -5,4 +5,15 @@
 #ifndef _CN10K_ML_DEV_H_
 #define _CN10K_ML_DEV_H_
 
+#include <roc_api.h>
+
+/* Marvell OCTEON CN10K ML PMD device name */
+#define MLDEV_NAME_CN10K_PMD ml_cn10k
+
+/* Device private data */
+struct cn10k_ml_dev {
+	/* ML device ROC */
+	struct roc_ml roc;
+};
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
new file mode 100644
index 0000000000..39843e3ee5
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_ops.h"
+
+struct rte_ml_dev_ops cn10k_ml_ops = {0};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
new file mode 100644
index 0000000000..adb0035fd7
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OPS_H_
+#define _CN10K_ML_OPS_H_
+
+/* CN10K device ops */
+extern struct rte_ml_dev_ops cn10k_ml_ops;
+
+#endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index f04e78cce5..bf4ccde2c5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,10 +9,12 @@ endif
 
 sources = files(
         'cn10k_ml_dev.c',
+        'cn10k_ml_ops.c',
 )
 
 headers = files(
         'cn10k_ml_dev.h',
+        'cn10k_ml_ops.h',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk']
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v2 03/37] ml/cnxk: add driver support to get device info
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 01/37] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 02/37] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 04/37] ml/cnxk: add support for configure and close Srikanth Yalavarthi
                     ` (34 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to get the cn10k ML device information. This is a
driver implementation for the RTE function rte_ml_dev_info_get.
ML device on cn10k supports one queue-pair in lock-free mode and
does not support segmented input output data.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 15 +++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 23 ++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4827d29bf7..eeaf83ce5c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,21 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Device alignment size */
+#define ML_CN10K_ALIGN_SIZE 128
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Maximum number of Queue-Pairs per device */
+#define ML_CN10K_MAX_QP_PER_DEVICE 1
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_CN10K_MAX_DESC_PER_QP 1024
+
+/* Maximum number of segments for IO data */
+#define ML_CN10K_MAX_SEGMENTS 1
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* ML device ROC */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 39843e3ee5..bad5ad4713 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,27 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-struct rte_ml_dev_ops cn10k_ml_ops = {0};
+static int
+cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	if (dev_info == NULL)
+		return -EINVAL;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
+	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
+
+	return 0;
+}
+
+struct rte_ml_dev_ops cn10k_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+};
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v2 04/37] ml/cnxk: add support for configure and close
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 03/37] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 05/37] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented driver functions to configure and close ML devices.
Added skeleton code and support to reconfigure ML device. PCI
device remove is enabled in device close.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 ++
 drivers/ml/cnxk/cn10k_ml_dev.h | 21 ++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 60 ++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index c2e93c9a1a..fd45226add 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -65,6 +65,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+
 	return 0;
 
 pmd_destroy:
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index eeaf83ce5c..bda7a5b3ff 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -25,10 +25,31 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
+/* ML command timeout in seconds */
+#define ML_CN10K_CMD_TIMEOUT 5
+
+/* Device configuration state enum */
+enum cn10k_ml_dev_state {
+	/* Device probed and not configured */
+	ML_CN10K_DEV_STATE_PROBED = 0,
+
+	/* Device configured */
+	ML_CN10K_DEV_STATE_CONFIGURED,
+
+	/* Device started */
+	ML_CN10K_DEV_STATE_STARTED,
+
+	/* Device closed */
+	ML_CN10K_DEV_STATE_CLOSED
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* ML device ROC */
 	struct roc_ml roc;
+
+	/* Configuration state */
+	enum cn10k_ml_dev_state state;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index bad5ad4713..32d38569a3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -25,7 +25,67 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL || conf == NULL)
+		return -EINVAL;
+
+	/* Get CN10K device handle */
+	mldev = dev->data->dev_private;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %d\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state");
+		return -ENOTSUP;
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close");
+		return -ENOTSUP;
+	}
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	mldev = dev->data->dev_private;
+
+	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
 };
-- 
2.17.1
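
Note on the state handling above: configure is accepted from the
PROBED and CONFIGURED states and rejected with -ENOTSUP from STARTED
and CLOSED, and close moves the device to CLOSED unconditionally. A
minimal standalone sketch of that transition table follows. The names
dev_state, dev_configure and dev_close are hypothetical stand-ins for
illustration and are not the driver's symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of enum cn10k_ml_dev_state from the patch. */
enum dev_state { DEV_PROBED = 0, DEV_CONFIGURED, DEV_STARTED, DEV_CLOSED };

/* Allow (re)configuration only from PROBED or CONFIGURED. */
static int
dev_configure(enum dev_state *state)
{
	assert(state != NULL);

	switch (*state) {
	case DEV_PROBED:
	case DEV_CONFIGURED:
		*state = DEV_CONFIGURED;
		return 0;
	case DEV_STARTED:
	case DEV_CLOSED:
	default:
		/* State is left untouched on rejection. */
		return -ENOTSUP;
	}
}

/* Close is terminal: no later configure call succeeds. */
static void
dev_close(enum dev_state *state)
{
	assert(state != NULL);
	*state = DEV_CLOSED;
}
```

Writing the checks as a switch makes it harder to add a new state
without deciding how configure should treat it.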



* [PATCH v2 05/37] ml/cnxk: parse ML firmware path from device args
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 04/37] ml/cnxk: add support for configure and close Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 06/37] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled parsing of the ML firmware path from device args for cn10k.
When the fw_path argument is not provided, the path defaults to
"/lib/firmware/mlip-fw.bin". Added an internal structure for ML
firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 71 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 12 ++++++
 drivers/ml/cnxk/meson.build    |  2 +-
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fd45226add..117cac43aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -4,6 +4,8 @@
 
 #include <rte_common.h>
 #include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
@@ -13,9 +15,70 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+#define CN10K_ML_FW_PATH "fw_path"
+
+#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
+static int
+parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*(char **)extra_args = strdup(value);
+
+	if (!*(char **)extra_args)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+{
+	struct rte_kvargs *kvlist = NULL;
+	bool fw_path_set = false;
+	char *fw_path = NULL;
+	int ret = 0;
+
+	if (devargs == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(devargs->args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing devargs");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_PATH) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_PATH, &parse_string_arg, &fw_path);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s", CN10K_ML_FW_PATH);
+			ret = -EINVAL;
+			goto exit;
+		}
+		fw_path_set = true;
+	}
+
+check_args:
+	if (!fw_path_set)
+		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+	else
+		mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
 static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
@@ -49,6 +112,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
 		mldev->roc.pci_dev = pci_dev;
 
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		if (ret) {
+			plt_err("Failed to parse devargs ret = %d", ret);
+			goto pmd_destroy;
+		}
+
 		ret = roc_ml_dev_init(&mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
@@ -122,3 +191,5 @@ static struct rte_pci_driver cn10k_mldev_pmd = {
 RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bda7a5b3ff..7eac51cf09 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,15 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML firmware structure */
+struct cn10k_ml_fw {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Firmware file path */
+	const char *path;
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* ML device ROC */
@@ -50,6 +59,9 @@ struct cn10k_ml_dev {
 
 	/* Configuration state */
 	enum cn10k_ml_dev_state state;
+
+	/* ML Firmware */
+	struct cn10k_ml_fw fw;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index bf4ccde2c5..7c6fa5e906 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,7 +17,7 @@ headers = files(
         'cn10k_ml_ops.h',
 )
 
-deps += ['mldev', 'common_ml', 'common_cnxk']
+deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1
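
Note on the fallback behaviour of this patch: the firmware path comes
from the fw_path device argument when it is present and from the
built-in default otherwise. The sketch below shows the same rule on a
plain comma-separated "key=value" string. It deliberately does not use
rte_kvargs; fw_path_resolve is a hypothetical helper written only for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FW_PATH_KEY     "fw_path"
#define FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"

/*
 * Hypothetical helper: scan a comma-separated "key=value" list and copy
 * the value of fw_path into out. Falls back to the default when args is
 * NULL or the key is absent. Returns 0 on success, -1 if out is too small.
 */
static int
fw_path_resolve(const char *args, char *out, size_t out_len)
{
	const size_t key_len = strlen(FW_PATH_KEY);
	const char *value = FW_PATH_DEFAULT;
	size_t value_len = strlen(FW_PATH_DEFAULT);
	const char *p = args;

	assert(out != NULL);

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t tok_len = (end != NULL) ? (size_t)(end - p) : strlen(p);

		/* Match "fw_path=" at the start of this token. */
		if (tok_len > key_len && strncmp(p, FW_PATH_KEY, key_len) == 0 &&
		    p[key_len] == '=') {
			value = p + key_len + 1;
			value_len = tok_len - key_len - 1;
			break;
		}
		p = (end != NULL) ? end + 1 : NULL;
	}

	if (value_len + 1 > out_len)
		return -1;
	memcpy(out, value, value_len);
	out[value_len] = '\0';
	return 0;
}
```

With this helper, a NULL or unrelated argument string yields the default
path, and "fw_path=/tmp/ml.bin" yields "/tmp/ml.bin".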



* [PATCH v2 06/37] ml/cnxk: enable firmware load and device reset
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 05/37] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 07/37] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to load ML firmware on the cn10ka ROC model. The MLIP
device is reset during the dev_close driver operation, and the device
can't be reconfigured after a call to close. Job execution is disabled
after firmware load and is enabled in the device start state. Added an
internal request structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
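Note: the firmware load in this patch waits for completion by polling a
status word until a deadline computed from the command timeout. A
minimal standalone sketch of that pattern follows. It uses
clock_gettime() in place of the platform TSC helpers, and now_ns and
poll_job_status are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define POLL_JOB_START  0
#define POLL_JOB_FINISH 1

/* Monotonic time in nanoseconds. */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Poll *status until it reads POLL_JOB_FINISH or the deadline passes.
 * The status is checked at least once, even with a zero timeout.
 * Returns 0 on completion and -ETIME on timeout.
 */
static int
poll_job_status(const volatile uint64_t *status, uint64_t timeout_ns)
{
	const uint64_t deadline = now_ns() + timeout_ns;

	assert(status != NULL);

	do {
		if (*status == POLL_JOB_FINISH)
			return 0;
	} while (now_ns() < deadline);

	return -ETIME;
}
```

The do/while form guarantees one check before the deadline is compared,
so a job that is already finished is never reported as a timeout.
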
 drivers/ml/cnxk/cn10k_ml_dev.c | 327 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 156 ++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  21 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  14 ++
 4 files changed, 518 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 117cac43aa..f2b815aacc 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -12,6 +12,8 @@
 
 #include <roc_api.h>
 
+#include <eal_firmware.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
@@ -19,6 +21,15 @@
 
 #define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
 
+/* ML firmware macros */
+#define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
+#define FW_STACK_BUFFER_SIZE	 0x40000
+#define FW_DEBUG_BUFFER_SIZE	 (2 * 0x20000)
+#define FW_EXCEPTION_BUFFER_SIZE 0x400
+#define FW_LINKER_OFFSET	 0x80000
+#define FW_WAIT_CYCLES		 100
+#define FW_LOAD_FLAGS		 0x1
+
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
 
 /* Dummy operations for ML device */
@@ -175,6 +186,322 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 	return rte_ml_dev_pmd_destroy(dev);
 }
 
+static void
+cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
+{
+	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+	plt_ml_dbg("exception_state_size = %u bytes",
+		   fw->req->jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+}
+
+uint64_t
+cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
+{
+	PLT_SET_USED(fw);
+
+	return FW_LOAD_FLAGS;
+}
+
+static int
+cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
+{
+	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
+	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	uint32_t reg_val32;
+	uint64_t offset;
+	bool timeout;
+	int ret = 0;
+	uint8_t i;
+
+	mldev = fw->mldev;
+
+	/* Reset HEAD and TAIL debug pointer registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
+	memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
+
+	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
+	 * bridge.
+	 */
+	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
+		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
+		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
+		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+
+	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
+	 * bridges.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
+			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+	}
+
+	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
+	 * signal all ML transactions as non-secure.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
+			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+
+		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
+			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+	}
+
+	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
+	 * when there is no job in the command queue.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
+	 * keeping the job manager disabled.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (9) Wait at least 70 coprocessor clock cycles (the delay below is in microseconds). */
+	plt_delay_us(FW_WAIT_CYCLES);
+
+	/* (10) Write ML outbound addresses pointing to the firmware images written in step 1 to the
+	 * following registers: ML(0)_A35_0_RST_VECTOR_BASE_W(0..1) for core 0,
+	 * ML(0)_A35_1_RST_VECTOR_BASE_W(0..1) for core 1. The value written to each register is the
+	 * AXI outbound address divided by 4. Read after write.
+	 */
+	offset = PLT_PTR_ADD_U64_CAST(
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
+	 * MLIP components out of reset. The cores will execute firmware from the ML region as
+	 * written in step 1.
+	 */
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
+	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
+
+	/* (12) Wait for notification from firmware that ML is ready for job execution. */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
+	 * clock when there are no more jobs to process.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
+	 * activities.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
+			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+	}
+
+	return ret;
+}
+
+int
+cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_fw *fw;
+	void *fw_buffer = NULL;
+	uint64_t mz_size = 0;
+	uint64_t fw_size = 0;
+	int ret = 0;
+
+	fw = &mldev->fw;
+	fw->mldev = mldev;
+
+	/* Read firmware image to a buffer */
+	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+	if (ret < 0) {
+		plt_err("Can't read firmware data: %s\n", fw->path);
+		return ret;
+	}
+
+	/* Reserve memzone for firmware load completion and data */
+	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+		return -ENOMEM;
+	}
+	fw->req = mz->addr;
+
+	/* Reset firmware load completion structure */
+	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+
+	/* Reset device, if in active state */
+	if (roc_ml_mlip_is_enabled(&mldev->roc))
+		roc_ml_mlip_reset(&mldev->roc, true);
+
+	/* Load firmware */
+	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+	if (fw_buffer != NULL)
+		free(fw_buffer);
+	if (ret < 0)
+		cn10k_ml_fw_unload(mldev);
+
+	return ret;
+}
+
+void
+cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	uint64_t reg_val;
+
+	/* Disable and reset device */
+	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&mldev->roc, true);
+
+	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
+	if (mz != NULL)
+		plt_memzone_free(mz);
+}
+
 static struct rte_pci_id pci_id_ml_table[] = {
 	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
 	/* sentinel */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 7eac51cf09..30c2ea6471 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,9 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
+
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -28,6 +31,19 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* Poll mode job state */
+#define ML_CN10K_POLL_JOB_START	 0
+#define ML_CN10K_POLL_JOB_FINISH 1
+
+/* ML Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
+
 /* Device configuration state enum */
 enum cn10k_ml_dev_state {
 	/* Device probed and not configured */
@@ -43,6 +59,136 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML Firmware stats */
+struct cn10k_ml_fw_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
+
+	/* Firmware end cycle */
+	uint64_t fw_end;
+
+	/* Hardware start cycle */
+	uint64_t hw_start;
+
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* ML result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Firmware stats */
+	struct cn10k_ml_fw_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* ML Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
+
+		/* Batch execution */
+		uint64_t batch_run : 1;
+
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
+
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
+
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
+
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
+
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* ML Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
+
+	/* Exception state dump size */
+	uint32_t exception_state_size;
+};
+
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
+
+			/* Flags to control error handling */
+			uint64_t flags;
+
+			uint8_t rsvd[8];
+		} fw_load;
+	};
+};
+
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -50,6 +196,12 @@ struct cn10k_ml_fw {
 
 	/* Firmware file path */
 	const char *path;
+
+	/* Data buffer */
+	uint8_t *data;
+
+	/* FW load request structure */
+	struct cn10k_ml_req *req;
 };
 
 /* Device private data */
@@ -64,4 +216,8 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_fw fw;
 };
 
+uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
+int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
+void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 32d38569a3..11e1cdb7cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -30,6 +30,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	int ret;
 
 	if (dev == NULL || conf == NULL)
 		return -EINVAL;
@@ -51,6 +52,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(mldev);
+		if (ret != 0)
+			return ret;
 	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -77,6 +83,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload firmware */
+	cn10k_ml_fw_unload(mldev);
+
+	/* Clear scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+
+	/* Reset ML_MLR_BASE */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+
 	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index adb0035fd7..15d7478d78 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,6 +5,20 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include "cn10k_ml_dev.h"
+
+/* ML request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job result */
+	struct cn10k_ml_result result;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+} __rte_aligned(ROC_ALIGN);
+
 /* CN10K device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1
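
Note on the memzone sizing in this patch: the data region that follows
the request structure is sized as the firmware image plus the stack,
debug and exception buffers, and the image itself is copied to
FW_LINKER_OFFSET inside that region. The sketch below reproduces the
arithmetic with the constants from the patch and checks that the image
ends inside the region. The helper names are hypothetical, and the
patch does not state how the remaining buffers are laid out, so nothing
beyond the total size is assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define FW_STACK_BUFFER_SIZE     0x40000
#define FW_DEBUG_BUFFER_SIZE     (2 * 0x20000)
#define FW_EXCEPTION_BUFFER_SIZE 0x400
#define FW_LINKER_OFFSET         0x80000

#define FW_BUFFERS_SIZE \
	(FW_STACK_BUFFER_SIZE + FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE)

/* Size of the data region that follows the request structure. */
static uint64_t
fw_data_region_size(uint64_t fw_size)
{
	/* Guard against wrap-around for absurdly large sizes. */
	assert(fw_size <= UINT64_MAX - FW_BUFFERS_SIZE);
	return fw_size + FW_BUFFERS_SIZE;
}

/* Non-zero if an image copied to FW_LINKER_OFFSET ends inside the region. */
static int
fw_image_fits(uint64_t fw_size)
{
	return FW_LINKER_OFFSET + fw_size <= fw_data_region_size(fw_size);
}
```

Since the three buffers add up to 0x80400 bytes, an image placed at
offset 0x80000 always leaves 0x400 bytes of the region after its end.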



* [PATCH v2 07/37] ml/cnxk: enable support for simulator environment
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 06/37] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 08/37] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled device initialization and firmware load on the simulator
platform. On the simulator, the firmware load stage only launches a
firmware handshake request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
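Note: with this patch the memzone size depends on the environment. On
hardware and the emulator it holds the request structure, the firmware
image and the firmware buffers, while on the simulator it holds the
request structure only. A minimal sketch of that selection follows.
The enum ml_env and fw_memzone_size are hypothetical names for
illustration; the driver itself queries roc_env_is_hw(),
roc_env_is_emulator() and roc_env_is_asim().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical environment tag standing in for the roc_env_is_*() checks. */
enum ml_env { ML_ENV_HW, ML_ENV_EMULATOR, ML_ENV_ASIM };

/* Stack + debug + exception buffer sizes from the firmware load patch. */
#define FW_BUFFERS_SIZE (0x40000 + 2 * 0x20000 + 0x400)

/* Memzone size needed for a firmware load in the given environment. */
static size_t
fw_memzone_size(enum ml_env env, size_t req_size, size_t fw_size)
{
	assert(req_size > 0);

	switch (env) {
	case ML_ENV_HW:
	case ML_ENV_EMULATOR:
		/* Request structure, firmware image and firmware buffers. */
		return req_size + fw_size + FW_BUFFERS_SIZE;
	case ML_ENV_ASIM:
		/* Handshake only: no image is copied on the simulator. */
		return req_size;
	}

	/* Unknown environment: nothing to reserve. */
	return 0;
}
```

Returning 0 for an unknown environment makes that case visible to the
caller, which can then reject the load before reserving memory.
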
 drivers/ml/cnxk/cn10k_ml_dev.c | 119 +++++++++++++++++++++++++++++----
 1 file changed, 107 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index f2b815aacc..805b037593 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -213,6 +213,89 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	return FW_LOAD_FLAGS;
 }
 
+static int
+cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	bool timeout;
+	int ret = 0;
+
+	mldev = fw->mldev;
+
+	/* Reset HEAD and TAIL debug pointer registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = rte_eal_get_baseaddr();
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* Update FW load completion structure */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	return ret;
+}
+
 static int
 cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
@@ -447,16 +530,22 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	fw = &mldev->fw;
 	fw->mldev = mldev;
 
-	/* Read firmware image to a buffer */
-	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
-	if (ret < 0) {
-		plt_err("Can't read firmware data: %s\n", fw->path);
-		return ret;
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		/* Read firmware image to a buffer */
+		ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+		if (ret < 0) {
+			plt_err("Can't read firmware data: %s\n", fw->path);
+			return ret;
+		}
+
+		/* Reserve memzone for firmware load completion and data */
+		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	} else if (roc_env_is_asim()) {
+		/* Reserve memzone for firmware load completion */
+		mz_size = sizeof(struct cn10k_ml_req);
 	}
 
-	/* Reserve memzone for firmware load completion and data */
-	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
-		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
@@ -475,10 +564,16 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 		roc_ml_mlip_reset(&mldev->roc, true);
 
 	/* Load firmware */
-	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
-	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
-	if (fw_buffer != NULL)
-		free(fw_buffer);
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+	} else if (roc_env_is_asim()) {
+		fw->data = NULL;
+		ret = cn10k_ml_fw_load_asim(fw);
+	}
+
 	if (ret < 0)
 		cn10k_ml_fw_unload(mldev);
 
-- 
2.17.1



* [PATCH v2 08/37] ml/cnxk: enable support for device start and stop
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 07/37] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 09/37] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented ML driver functions to start and stop the ML device.
Start enables the device to accept inference requests; stop
disables it.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
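Note for readers: both ops in this patch reduce to a read-modify-write of
the enable bit in ML_CFG, followed by a driver state update. The sketch
below illustrates only that pattern. It assumes a plain uint64_t in place
of the register, and SKETCH_CFG_ENA is a hypothetical bit position, not
the real ROC_ML_CFG_ENA value; the sketch_* names are illustrative and do
not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enable bit; the real ROC_ML_CFG_ENA is defined in the ROC layer. */
#define SKETCH_CFG_ENA (UINT64_C(1) << 0)

/* Stand-in device: 'cfg' plays the role of ML_CFG, 'started' the driver state. */
struct sketch_dev {
	uint64_t cfg;
	int started;
};

/* Set only the enable bit, leaving every other configuration bit untouched. */
static void
sketch_dev_start(struct sketch_dev *d)
{
	uint64_t v;

	assert(d != NULL);
	v = d->cfg;          /* read */
	v |= SKETCH_CFG_ENA; /* modify */
	d->cfg = v;          /* write */
	d->started = 1;
}

/* Clear only the enable bit, so the device stops accepting new jobs. */
static void
sketch_dev_stop(struct sketch_dev *d)
{
	uint64_t v;

	assert(d != NULL);
	v = d->cfg;
	v &= ~SKETCH_CFG_ENA;
	d->cfg = v;
	d->started = 0;
}
```

Masking rather than writing a constant preserves whatever other ML_CFG
bits the firmware load stage left set.
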
 drivers/ml/cnxk/cn10k_ml_ops.c | 42 +++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11e1cdb7cd..3fea763caf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -104,9 +104,45 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
+static int
+cn10k_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
+	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
 };
-- 
2.17.1



* [PATCH v2 09/37] ml/cnxk: add support to create device queue-pairs
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 08/37] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 10/37] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to create and destroy device queue-pairs. Updated
the configure stage to allocate an array that stores queue-pair
handles. Added internal structures for queue-pairs, queues and ML
inference requests.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
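Note for readers: queue_pair_setup sizes the request queue one larger
than the requested descriptor count, unless the request already equals
the device maximum. The sketch below reproduces that sizing rule. The
head/tail helpers are an assumption about how such a ring is typically
used (one slot kept free so that head == tail can only mean "empty");
the enqueue and dequeue paths are not part of this patch, and all
sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Queue size to allocate for a requested descriptor count, capped at max_desc. */
static uint32_t
sketch_queue_size(uint32_t requested, uint32_t max_desc)
{
	assert(requested != 0 && requested <= max_desc);
	return (requested == max_desc) ? max_desc : requested + 1;
}

/* With one slot always left free, a queue of 'size' holds size - 1 requests. */
static uint32_t
sketch_queue_usable(uint32_t size)
{
	return size - 1;
}

/* Assumed convention: the queue is empty when head has caught up with tail. */
static int
sketch_queue_empty(uint64_t head, uint64_t tail)
{
	return head == tail;
}

/* Assumed convention: the queue is full when advancing head would hit tail. */
static int
sketch_queue_full(uint64_t head, uint64_t tail, uint32_t size)
{
	return ((head + 1) % size) == tail;
}
```

Under these assumptions a request for 8 descriptors yields a 9-slot
queue with 8 usable slots, while a request for max_desc yields max_desc
slots and max_desc - 1 usable ones.
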
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |  31 +++++
 2 files changed, 236 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3fea763caf..7c9c49ffda 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -8,6 +8,97 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cn10k_ml_qp_mem_%d:%d", dev_id, qp_id);
+}
+
+static int
+cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cn10k_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cn10k_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cn10k_ml_qp *
+cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cn10k_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -30,6 +121,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint32_t mz_size;
+	uint16_t qp_id;
 	int ret;
 
 	if (dev == NULL || conf == NULL)
@@ -68,21 +162,83 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -ENOTSUP;
 	}
 
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
+
+error:
+	if (dev->data->queue_pairs != NULL)
+		rte_free(dev->data->queue_pairs);
+
+	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint16_t qp_id;
 
 	if (dev == NULL)
 		return -EINVAL;
 
 	mldev = dev->data->dev_private;
 
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	if (dev->data->queue_pairs)
+		rte_free(dev->data->queue_pairs);
+
 	/* Unload firmware */
 	cn10k_ml_fw_unload(mldev);
 
@@ -140,9 +296,56 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* The number of usable descriptors is 1 less than the queue size being created, so the
+	 * queue is created 1 larger than the requested size, except when the requested size is
+	 * already equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 15d7478d78..455109f10f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,6 +5,10 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 
 /* ML request */
@@ -19,6 +23,33 @@ struct cn10k_ml_req {
 	volatile uint64_t status;
 } __rte_aligned(ROC_ALIGN);
 
+/* ML request queue */
+struct cn10k_ml_queue {
+	/* Array of requests */
+	struct cn10k_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Request wait cycles */
+	uint64_t wait_cycles;
+};
+
+/* ML queue-pair structure */
+struct cn10k_ml_qp {
+	/* Queue pair ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cn10k_ml_queue queue;
+};
+
 /* CN10K device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1



* [PATCH v2 10/37] ml/cnxk: add functions to load and unload models
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 09/37] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 11/37] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added cnxk driver implementations to load and unload ML models.
Enabled support in the configure stage to allocate the model
handles array. A model ID is assigned and per-model resources are
allocated during load; the resources are released during unload.
Added internal structures to handle ML models.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
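Note for readers: model load picks the first free slot in the model
handles array and uses its index as the model ID; unload refuses models
that are not in the LOADED state. The sketch below shows that slot
bookkeeping only. It replaces the memzone allocation with static
storage, and it range-checks model_id on unload, which is an addition of
the sketch rather than something this patch does. All sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MODELS 4

enum sketch_state { SKETCH_LOADED, SKETCH_STARTED };

struct sketch_model {
	int16_t id;
	enum sketch_state state;
};

struct sketch_dev {
	struct sketch_model *models[SKETCH_MAX_MODELS]; /* NULL means free slot */
	struct sketch_model storage[SKETCH_MAX_MODELS]; /* stands in for memzones */
	uint16_t nb_loaded;
};

/* Claim the first free slot; its index becomes the model ID. */
static int
sketch_model_load(struct sketch_dev *d, int16_t *model_id)
{
	uint16_t idx;

	assert(d != NULL && model_id != NULL);
	for (idx = 0; idx < SKETCH_MAX_MODELS; idx++) {
		if (d->models[idx] == NULL)
			break;
	}
	if (idx == SKETCH_MAX_MODELS)
		return -ENOMEM; /* no slots available */

	d->storage[idx].id = (int16_t)idx;
	d->storage[idx].state = SKETCH_LOADED;
	d->models[idx] = &d->storage[idx];
	d->nb_loaded++;
	*model_id = (int16_t)idx;
	return 0;
}

/* Release a slot, but only if the model is loaded and not in use. */
static int
sketch_model_unload(struct sketch_dev *d, int16_t model_id)
{
	assert(d != NULL);
	if (model_id < 0 || model_id >= SKETCH_MAX_MODELS || d->models[model_id] == NULL)
		return -EINVAL;
	if (d->models[model_id]->state != SKETCH_LOADED)
		return -EBUSY;

	d->models[model_id] = NULL;
	d->nb_loaded--;
	return 0;
}
```

Because load always takes the lowest free index, an ID released by
unload is the next one handed out.
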
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.c |   5 +
 drivers/ml/cnxk/cn10k_ml_model.h |  43 +++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 154 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   5 +
 drivers/ml/cnxk/meson.build      |   2 +
 6 files changed, 212 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 30c2ea6471..c231cb23ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -214,6 +214,9 @@ struct cn10k_ml_dev {
 
 	/* ML Firmware */
 	struct cn10k_ml_fw fw;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
new file mode 100644
index 0000000000..39ed707396
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_model.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
new file mode 100644
index 0000000000..f529374281
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_MODEL_H_
+#define _CN10K_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Model state */
+enum cn10k_ml_model_state {
+	ML_CN10K_MODEL_STATE_LOADED,
+	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
+	ML_CN10K_MODEL_STATE_STARTED,
+	ML_CN10K_MODEL_STATE_UNKNOWN,
+};
+
+/* ML Model Object */
+struct cn10k_ml_model {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Model name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model ID */
+	int16_t model_id;
+
+	/* Model lock, used to update model state */
+	plt_spinlock_t lock;
+
+	/* Model state */
+	enum cn10k_ml_model_state state;
+};
+
+#endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7c9c49ffda..30e7b0da35 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -6,8 +6,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+/* ML model macros */
+#define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -120,9 +124,11 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	int16_t model_id;
 	uint16_t qp_id;
 	int ret;
 
@@ -203,6 +209,48 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
 
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %d", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
@@ -211,14 +259,19 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (dev->data->queue_pairs != NULL)
 		rte_free(dev->data->queue_pairs);
 
+	if (dev->data->models != NULL)
+		rte_free(dev->data->models);
+
 	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	int16_t model_id;
 	uint16_t qp_id;
 
 	if (dev == NULL)
@@ -226,6 +279,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %d", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	if (dev->data->models)
+		rte_free(dev->data->models);
+
 	/* Destroy all queue pairs */
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
@@ -337,6 +405,88 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+int
+cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t mz_size;
+	uint16_t idx;
+	bool found;
+
+	PLT_SET_USED(params);
+
+	mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (idx = 0; idx < dev->data->nb_models; idx++) {
+		if (dev->data->models[idx] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Get MZ size */
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+
+	/* Allocate memzone for model object and model data */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->mldev = mldev;
+	model->model_id = idx;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	dev->data->models[idx] = model;
+	mldev->nb_models_loaded++;
+
+	*model_id = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	dev->data->models[model_id] = NULL;
+	mldev->nb_models_loaded--;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%d", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -348,4 +498,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 455109f10f..5caebde908 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -53,4 +53,9 @@ struct cn10k_ml_qp {
 /* CN10K device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
+/* Slow-path ops */
+int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
+			int16_t *model_id);
+int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 7c6fa5e906..1f1c923329 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -10,11 +10,13 @@ endif
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
+        'cn10k_ml_model.c',
 )
 
 headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
+        'cn10k_ml_model.h',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
-- 
2.17.1



* [PATCH v2 11/37] ml/cnxk: enable validity checks for model metadata
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 10/37] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 12/37] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added the model metadata structure and enabled metadata checks
during model load. Remapped cnxk IO types to RTE IO types.
Model metadata is stored and updated in the model structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
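Note for readers: two details of the metadata check are easy to verify
in isolation. The header is laid out as exactly 256 bytes with
header_crc32c as its last field, so the header CRC covers the first 252
bytes; and the version gate compares only the major and minor bytes
against MRVL_ML_MODEL_VERSION (2100). The sketch below uses a local copy
of the header layout from this patch. The sketch_* names are
hypothetical, and no CRC is computed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the metadata header layout, for illustration only. */
struct sketch_metadata_header {
	uint8_t magic[4];
	uint8_t version[4];
	uint32_t metadata_size;
	uint8_t uuid[128];
	uint32_t target_architecture;
	uint8_t reserved[104];
	uint32_t payload_crc32c;
	uint32_t header_crc32c;
};

/* Mirrors MRVL_ML_MODEL_VERSION from the patch. */
#define SKETCH_MIN_VERSION 2100

/* Number of leading header bytes covered by header_crc32c. */
static size_t
sketch_header_crc_span(void)
{
	return sizeof(struct sketch_metadata_header) - sizeof(uint32_t);
}

/* Only version[0] (major) and version[1] (minor) take part in the gate. */
static int
sketch_version_supported(const uint8_t version[4])
{
	assert(version != NULL);
	return version[0] * 1000 + version[1] * 100 >= SKETCH_MIN_VERSION;
}
```

With this rule, metadata 2.1.0.2 passes, 2.0.x.x is rejected, and any
3.x.x.x passes, since the last two version bytes are ignored.
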
 drivers/ml/cnxk/cn10k_ml_model.c | 196 +++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 312 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  14 +-
 drivers/ml/cnxk/meson.build      |   2 +-
 4 files changed, 522 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 39ed707396..6f803ce6a5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -2,4 +2,200 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_hash_crc.h>
+
+#include <ml_utils.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+
+static enum rte_ml_io_type
+cn10k_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case 1:
+		return RTE_ML_IO_TYPE_INT8;
+	case 2:
+		return RTE_ML_IO_TYPE_UINT8;
+	case 3:
+		return RTE_ML_IO_TYPE_INT16;
+	case 4:
+		return RTE_ML_IO_TYPE_UINT16;
+	case 5:
+		return RTE_ML_IO_TYPE_INT32;
+	case 6:
+		return RTE_ML_IO_TYPE_UINT32;
+	case 7:
+		return RTE_ML_IO_TYPE_FP16;
+	case 8:
+		return RTE_ML_IO_TYPE_FP32;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+int
+cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+	uint8_t version[4];
+	uint8_t i;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+
+	/* Header CRC check */
+	if (metadata->metadata_header.header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			buffer, sizeof(metadata->metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata->metadata_header.header_crc32c) {
+			plt_err("Invalid model, Header CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata->metadata_header.payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->metadata_header),
+					      size - sizeof(metadata->metadata_header), 0);
+
+		if (payload_crc32c != metadata->metadata_header.payload_crc32c) {
+			plt_err("Invalid model, Payload CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Model magic string */
+	if (strncmp((char *)metadata->metadata_header.magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid model, magic = %s", metadata->metadata_header.magic);
+		return -EINVAL;
+	}
+
+	/* Target architecture */
+	if (metadata->metadata_header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
+		plt_err("Model target architecture (%u) not supported",
+			metadata->metadata_header.target_architecture);
+		return -ENOTSUP;
+	}
+
+	/* Header version */
+	memcpy(version, metadata->metadata_header.version, 4 * sizeof(uint8_t));
+	if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
+		plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0],
+			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION / 1000) % 10,
+			(MRVL_ML_MODEL_VERSION / 100) % 10, (MRVL_ML_MODEL_VERSION / 10) % 10,
+			MRVL_ML_MODEL_VERSION % 10);
+		return -ENOTSUP;
+	}
+
+	/* Init section */
+	if (metadata->init_model.file_size == 0) {
+		plt_err("Invalid metadata, init_model.file_size = %u",
+			metadata->init_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Main section */
+	if (metadata->main_model.file_size == 0) {
+		plt_err("Invalid metadata, main_model.file_size = %u",
+			metadata->main_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Finish section */
+	if (metadata->finish_model.file_size == 0) {
+		plt_err("Invalid metadata, finish_model.file_size = %u",
+			metadata->finish_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Weights and Bias */
+	if (metadata->weights_bias.file_size == 0) {
+		plt_err("Invalid metadata, weights_bias.file_size = %u",
+			metadata->weights_bias.file_size);
+		return -EINVAL;
+	}
+
+	if (metadata->weights_bias.relocatable != 1) {
+		plt_err("Model not supported, non-relocatable weights and bias");
+		return -ENOTSUP;
+	}
+
+	/* Inputs */
+	for (i = 0; i < metadata->model.num_input; i++) {
+		if (ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input[i].input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : input_type = %u", i,
+				metadata->input[i].input_type);
+			return -EINVAL;
+		}
+
+		if (ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->input[i].model_input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : model_input_type = %u", i,
+				metadata->input[i].model_input_type);
+			return -EINVAL;
+		}
+
+		if (metadata->input[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable input: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	/* Outputs */
+	for (i = 0; i < metadata->model.num_output; i++) {
+		if (ml_io_type_size_get(cn10k_ml_io_type_map(metadata->output[i].output_type)) <=
+		    0) {
+			plt_err("Invalid metadata, output[%u] : output_type = %u", i,
+				metadata->output[i].output_type);
+			return -EINVAL;
+		}
+
+		if (ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].model_output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : model_output_type = %u", i,
+				metadata->output[i].model_output_type);
+			return -EINVAL;
+		}
+
+		if (metadata->output[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable output: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	return 0;
+}
+
+void
+cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
+{
+	uint8_t i;
+
+	for (i = 0; i < metadata->model.num_input; i++) {
+		metadata->input[i].input_type = cn10k_ml_io_type_map(metadata->input[i].input_type);
+		metadata->input[i].model_input_type =
+			cn10k_ml_io_type_map(metadata->input[i].model_input_type);
+
+		if (metadata->input[i].shape.w == 0)
+			metadata->input[i].shape.w = 1;
+
+		if (metadata->input[i].shape.x == 0)
+			metadata->input[i].shape.x = 1;
+
+		if (metadata->input[i].shape.y == 0)
+			metadata->input[i].shape.y = 1;
+
+		if (metadata->input[i].shape.z == 0)
+			metadata->input[i].shape.z = 1;
+	}
+
+	for (i = 0; i < metadata->model.num_output; i++) {
+		metadata->output[i].output_type =
+			cn10k_ml_io_type_map(metadata->output[i].output_type);
+		metadata->output[i].model_output_type =
+			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index f529374281..eb031c6fb2 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -22,6 +22,309 @@ enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_UNKNOWN,
 };
 
+/* Model Metadata : v 2.1.0.2 */
+#define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
+#define MRVL_ML_MODEL_TARGET_ARCH  128
+#define MRVL_ML_MODEL_VERSION	   2100
+#define MRVL_ML_MODEL_NAME_LEN	   64
+#define MRVL_ML_INPUT_NAME_LEN	   16
+#define MRVL_ML_OUTPUT_NAME_LEN	   16
+#define MRVL_ML_INPUT_OUTPUT_SIZE  8
+
+/* Model file metadata structure */
+struct cn10k_ml_model_metadata {
+	/* Header (256-byte) */
+	struct {
+		/* Magic string ('M', 'R', 'V', 'L') */
+		uint8_t magic[4];
+
+		/* Metadata version */
+		uint8_t version[4];
+
+		/* Metadata size */
+		uint32_t metadata_size;
+
+		/* Unique ID */
+		uint8_t uuid[128];
+
+		/* Model target architecture
+		 * 0 = Undefined
+		 * 1 = M1K
+		 * 128 = MLIP
+		 * 256 = Experimental
+		 */
+		uint32_t target_architecture;
+		uint8_t reserved[104];
+
+		/* CRC of data after metadata_header (i.e. after first 256 bytes) */
+		uint32_t payload_crc32c;
+
+		/* CRC of first 252 bytes of metadata_header, after payload_crc calculation */
+		uint32_t header_crc32c;
+	} metadata_header;
+
+	/* Model information (256-byte) */
+	struct {
+		/* Model name string */
+		uint8_t name[MRVL_ML_MODEL_NAME_LEN];
+
+		/* Model version info (xx.xx.xx.xx) */
+		uint8_t version[4];
+
+		/* Model code size (init + main + finish) */
+		uint32_t code_size;
+
+		/* Model data size (Weights and Bias) */
+		uint32_t data_size;
+
+		/* OCM start offset, set to ocm_wb_range_start */
+		uint32_t ocm_start;
+
+		/* OCM end offset, set to max OCM size */
+		uint32_t ocm_end;
+
+		/* Relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t ocm_relocatable;
+
+		/* Tile relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t tile_relocatable;
+
+		/* Start tile (Always 0) */
+		uint8_t tile_start;
+
+		/* End tile (num_tiles - 1) */
+		uint8_t tile_end;
+
+		/* Inference batch size */
+		uint8_t batch_size;
+
+		/* Number of input tensors (Max 8) */
+		uint8_t num_input;
+
+		/* Number of output tensors (Max 8) */
+		uint8_t num_output;
+		uint8_t reserved1;
+
+		/* Total input size in bytes */
+		uint32_t input_size;
+
+		/* Total output size in bytes */
+		uint32_t output_size;
+
+		/* Table size in bytes */
+		uint32_t table_size;
+
+		/* Number of layers in the network */
+		uint32_t num_layers;
+		uint32_t reserved2;
+
+		/* Floor of absolute OCM region */
+		uint64_t ocm_tmp_range_floor;
+
+		/* Relative OCM start address of WB data block */
+		uint64_t ocm_wb_range_start;
+
+		/* Relative OCM end address of WB data block */
+		uint64_t ocm_wb_range_end;
+
+		/* Relative DDR start address of WB data block */
+		uint64_t ddr_wb_range_start;
+
+		/* Relative DDR end address of WB data block */
+		uint64_t ddr_wb_range_end;
+
+		/* Relative DDR start address of all inputs */
+		uint64_t ddr_input_range_start;
+
+		/* Relative DDR end address of all inputs */
+		uint64_t ddr_input_range_end;
+
+		/* Relative DDR start address of all outputs */
+		uint64_t ddr_output_range_start;
+
+		/* Relative DDR end address of all outputs */
+		uint64_t ddr_output_range_end;
+
+		/* Compiler version */
+		uint8_t compiler_version[8];
+
+		/* CDK version */
+		uint8_t cdk_version[4];
+
+		/* Lower batch optimization support
+		 * 0 - No,
+		 * 1 - Yes
+		 */
+		uint8_t supports_lower_batch_size_optimization;
+		uint8_t reserved[59];
+	} model;
+
+	/* Init section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} init_model;
+
+	/* Main section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} main_model;
+
+	/* Finish section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} finish_model;
+
+	uint8_t reserved1[512]; /* End of 2k bytes */
+
+	/* Weights and Biases (64-byte) */
+	struct {
+		/* Memory offset, Set to ddr_wb_range_start */
+		uint64_t mem_offset;
+		uint32_t file_offset;
+		uint32_t file_size;
+
+		/* Relocatable flag for WB
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+		uint8_t reserved[47];
+	} weights_bias;
+
+	/* Input (512-byte, 64-byte per input) provisioned for 8 inputs */
+	struct {
+		/* DDR offset (in ocm absolute addresses for input) */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Input quantization
+		 * 1 = Requires quantization
+		 * 2 = Pre-quantized
+		 */
+		uint8_t quantize;
+
+		/* Type of incoming input
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t input_type;
+
+		/* Type of input required by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_input_type;
+
+		/* float_32 qscale value
+		 * quantized = non-quantized * qscale
+		 */
+		float qscale;
+
+		/* Input shape */
+		struct {
+			/* Input format
+			 * 1 = NCHW
+			 * 2 = NHWC
+			 */
+			uint8_t format;
+			uint8_t reserved[3];
+			uint32_t w;
+			uint32_t x;
+			uint32_t y;
+			uint32_t z;
+		} shape;
+		uint8_t reserved[4];
+
+		/* Name of input */
+		uint8_t input_name[MRVL_ML_INPUT_NAME_LEN];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output (512-byte, 64-byte per output) provisioned for 8 outputs */
+	struct {
+		/* DDR offset in ocm absolute addresses for output */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Output dequantization
+		 * 1 = De-quantization required
+		 * 2 = De-quantization not required
+		 */
+		uint8_t dequantize;
+
+		/* Type of outgoing output
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t output_type;
+
+		/* Type of output produced by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_output_type;
+
+		/* float_32 dscale value
+		 * dequantized = quantized * dscale
+		 */
+		float dscale;
+
+		/* Number of items in the output */
+		uint32_t size;
+		uint8_t reserved[20];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+		uint8_t output_name[MRVL_ML_OUTPUT_NAME_LEN];
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	uint8_t reserved2[1792];
+
+	/* Model data */
+	struct {
+		uint8_t reserved1[4068];
+
+		/* Beta: xx.xx.xx.xx,
+		 * Later: YYYYMM.xx.xx
+		 */
+		uint8_t compiler_version[8];
+
+		/* M1K CDK version (xx.xx.xx.xx) */
+		uint8_t m1k_cdk_version[4];
+	} data;
+
+	/* Hidden 16 bytes of magic code */
+	uint8_t reserved3[16];
+};
+
 /* ML Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -33,6 +336,12 @@ struct cn10k_ml_model {
 	/* Model ID */
 	int16_t model_id;
 
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Model metadata */
+	struct cn10k_ml_model_metadata metadata;
+
 	/* Model lock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -40,4 +349,7 @@ struct cn10k_ml_model {
 	enum cn10k_ml_model_state state;
 };
 
+int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
+void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 30e7b0da35..171428794e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -416,8 +416,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int ret;
 
-	PLT_SET_USED(params);
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
 	mldev = dev->data->dev_private;
 
@@ -450,6 +453,15 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->mldev = mldev;
 	model->model_id = idx;
 
+	memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->metadata);
+
+	/* Enable support for batch_size of 256 */
+	if (model->metadata.model.batch_size == 0)
+		model->batch_size = 256;
+	else
+		model->batch_size = model->metadata.model.batch_size;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 1f1c923329..b7567d04a2 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -19,7 +19,7 @@ headers = files(
         'cn10k_ml_model.h',
 )
 
-deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
+deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs', 'hash']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1
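
The layout comments in the metadata structure above (a 256-byte header whose
trailing CRC covers the first 252 bytes) can be checked at compile time. The
following is a minimal sketch, not the driver's code: struct
demo_metadata_header and demo_header_magic_ok() are hypothetical names, and the
function only illustrates a size and magic-string check, leaving out the
version and CRC checks that the patch performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-alone mirror of the metadata_header fields in the patch. */
struct demo_metadata_header {
	uint8_t magic[4];
	uint8_t version[4];
	uint32_t metadata_size;
	uint8_t uuid[128];
	uint32_t target_architecture;
	uint8_t reserved[104];
	uint32_t payload_crc32c;
	uint32_t header_crc32c;
};

/* The header is documented as 256 bytes; header_crc32c follows the first 252. */
static_assert(sizeof(struct demo_metadata_header) == 256, "header must be 256 bytes");
static_assert(offsetof(struct demo_metadata_header, header_crc32c) == 252,
	      "header CRC must start at byte 252");

/* Return 1 if the buffer can hold a header and starts with "MRVL", else 0. */
int
demo_header_magic_ok(const uint8_t *buffer, size_t size)
{
	if (buffer == NULL || size < sizeof(struct demo_metadata_header))
		return 0;

	return memcmp(buffer, "MRVL", 4) == 0;
}
```

A caller would run a check of this kind before copying the buffer into the
model object, which is the order cn10k_ml_model_load() follows in the diff.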


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v2 12/37] ml/cnxk: add internal structures for derived info
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 11/37] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 13/37] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added internal structures to handle derived address fields
and enabled support to compute DMA addresses for model start.
Enabled updating internal model fields.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 88 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 80 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 18 ++++++-
 3 files changed, 185 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 6f803ce6a5..72b52fce8d 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -199,3 +199,91 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
 	}
 }
+
+void
+cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+	size_t model_data_size;
+	uint8_t *dma_addr_load;
+	uint8_t *dma_addr_run;
+	uint8_t i;
+	int fpos;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+
+	/* Base address */
+	addr->base_dma_addr_load = base_dma_addr;
+	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
+
+	/* Init Section */
+	dma_addr_load = addr->base_dma_addr_load;
+	dma_addr_run = addr->base_dma_addr_run;
+	fpos = sizeof(struct cn10k_ml_model_metadata);
+	addr->init_load_addr = dma_addr_load;
+	addr->init_run_addr = dma_addr_run;
+	memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
+
+	/* Main Section */
+	dma_addr_load += metadata->init_model.file_size;
+	dma_addr_run += metadata->init_model.file_size;
+	fpos += metadata->init_model.file_size;
+	addr->main_load_addr = dma_addr_load;
+	addr->main_run_addr = dma_addr_run;
+	memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
+
+	/* Finish Section */
+	dma_addr_load += metadata->main_model.file_size;
+	dma_addr_run += metadata->main_model.file_size;
+	fpos += metadata->main_model.file_size;
+	addr->finish_load_addr = dma_addr_load;
+	addr->finish_run_addr = dma_addr_run;
+	memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
+
+	/* Weights & Bias Section */
+	dma_addr_load += metadata->finish_model.file_size;
+	fpos += metadata->finish_model.file_size;
+	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
+	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
+	memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+
+	/* Inputs */
+	addr->total_input_sz_d = 0;
+	addr->total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		addr->input[i].nb_elements =
+			model->metadata.input[i].shape.w * model->metadata.input[i].shape.x *
+			model->metadata.input[i].shape.y * model->metadata.input[i].shape.z;
+		addr->input[i].sz_d = addr->input[i].nb_elements *
+				      ml_io_type_size_get(metadata->input[i].input_type);
+		addr->input[i].sz_q = addr->input[i].nb_elements *
+				      ml_io_type_size_get(metadata->input[i].model_input_type);
+		addr->total_input_sz_d += addr->input[i].sz_d;
+		addr->total_input_sz_q += addr->input[i].sz_q;
+
+		plt_ml_dbg("model_id = %d, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+			   model->model_id, i, metadata->input[i].shape.w,
+			   metadata->input[i].shape.x, metadata->input[i].shape.y,
+			   metadata->input[i].shape.z, addr->input[i].sz_d, addr->input[i].sz_q);
+	}
+
+	/* Outputs */
+	addr->total_output_sz_q = 0;
+	addr->total_output_sz_d = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		addr->output[i].nb_elements = metadata->output[i].size;
+		addr->output[i].sz_d = addr->output[i].nb_elements *
+				       ml_io_type_size_get(metadata->output[i].output_type);
+		addr->output[i].sz_q = addr->output[i].nb_elements *
+				       ml_io_type_size_get(metadata->output[i].model_output_type);
+		addr->total_output_sz_q += addr->output[i].sz_q;
+		addr->total_output_sz_d += addr->output[i].sz_d;
+
+		plt_ml_dbg("model_id = %d, output[%u] - sz_d = %u, sz_q = %u", model->model_id, i,
+			   addr->output[i].sz_d, addr->output[i].sz_q);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index eb031c6fb2..02a119cdd8 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -325,6 +325,81 @@ struct cn10k_ml_model_metadata {
 	uint8_t reserved3[16];
 };
 
+/* Model address structure */
+struct cn10k_ml_model_addr {
+	/* Base DMA address for load */
+	void *base_dma_addr_load;
+
+	/* Base DMA address for run */
+	void *base_dma_addr_run;
+
+	/* Init section load address */
+	void *init_load_addr;
+
+	/* Init section run address */
+	void *init_run_addr;
+
+	/* Main section load address */
+	void *main_load_addr;
+
+	/* Main section run address */
+	void *main_run_addr;
+
+	/* Finish section load address */
+	void *finish_load_addr;
+
+	/* Finish section run address */
+	void *finish_run_addr;
+
+	/* Weights and Bias base address */
+	void *wb_base_addr;
+
+	/* Weights and bias load address */
+	void *wb_load_addr;
+
+	/* Start tile */
+	uint8_t tile_start;
+
+	/* End tile */
+	uint8_t tile_end;
+
+	/* Input address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized input size */
+		uint32_t sz_d;
+
+		/* Quantized input size */
+		uint32_t sz_q;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized output size */
+		uint32_t sz_d;
+
+		/* Quantized output size */
+		uint32_t sz_q;
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
 /* ML Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -342,6 +417,9 @@ struct cn10k_ml_model {
 	/* Model metadata */
 	struct cn10k_ml_model_metadata metadata;
 
+	/* Model address structure */
+	struct cn10k_ml_model_addr addr;
+
 	/* Model lock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -351,5 +429,7 @@ struct cn10k_ml_model {
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+				uint8_t *base_dma_addr);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 171428794e..6bf365d185 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -408,11 +408,14 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
+	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_data_size;
+	uint8_t *base_dma_addr;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -439,7 +442,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get MZ size */
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+	metadata = (struct cn10k_ml_model_metadata *)params->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+		  2 * model_data_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -462,6 +470,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	else
 		model->batch_size = model->metadata.model.batch_size;
 
+	/* Set DMA base address */
+	base_dma_addr = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-- 
2.17.1
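
The address arithmetic in this patch can be summarised as offsets from the DMA
base: the four sections are packed back to back in the load region, the run
region starts right after them, and the memzone holds the aligned model object
plus two aligned copies of the model data. The sketch below is an illustration
only. All demo_* names are hypothetical, the alignment is passed as a parameter
because the value of ML_CN10K_ALIGN_SIZE is not shown here, and the rounding
helper is a generic stand-in for PLT_ALIGN_CEIL.

```c
#include <assert.h>
#include <stdint.h>

/* Section sizes as read from the model metadata (hypothetical container). */
struct demo_sections {
	uint32_t init_size;
	uint32_t main_size;
	uint32_t finish_size;
	uint32_t wb_size;
};

/* Offsets relative to the DMA base address, plus the total memzone size. */
struct demo_layout {
	uint64_t init_off;
	uint64_t main_off;
	uint64_t finish_off;
	uint64_t wb_off;
	uint64_t run_base_off;
	uint64_t mz_size;
};

/* Round value up to the next multiple of align (generic stand-in). */
uint64_t
demo_align_ceil(uint64_t value, uint64_t align)
{
	assert(align != 0);
	return (value + align - 1) / align * align;
}

struct demo_layout
demo_layout_compute(const struct demo_sections *s, uint64_t obj_size, uint64_t align)
{
	struct demo_layout l;
	uint64_t data_size;

	data_size = (uint64_t)s->init_size + s->main_size + s->finish_size + s->wb_size;

	/* Load region: init, main, finish and WB sections packed in order. */
	l.init_off = 0;
	l.main_off = l.init_off + s->init_size;
	l.finish_off = l.main_off + s->main_size;
	l.wb_off = l.finish_off + s->finish_size;

	/* Run region starts immediately after the load region. */
	l.run_base_off = data_size;

	/* Aligned model object followed by two aligned copies of the data. */
	l.mz_size = demo_align_ceil(obj_size, align) + 2 * demo_align_ceil(data_size, align);

	return l;
}
```

For example, with section sizes of 100, 200, 50 and 1000 bytes, a 300-byte
object and a 128-byte alignment, the WB section starts at offset 350, the run
region at 1350, and the memzone needs 384 + 2 * 1408 = 3200 bytes.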


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v2 13/37] ml/cnxk: add internal structures for tiles and OCM
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 12/37] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 14/37] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added internal structures to handle tile and OCM information and the
OCM-to-model memory mapping. Initialized the fields to platform-specific
defaults and computed the OCM / tile requirements for the model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  5 ++
 drivers/ml/cnxk/cn10k_ml_model.c | 53 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  6 +++
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  5 ++
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 79 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 28 +++++++++++
 drivers/ml/cnxk/meson.build      |  2 +
 7 files changed, 178 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c231cb23ed..6b91c9aae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -7,6 +7,8 @@
 
 #include <roc_api.h>
 
+#include "cn10k_ml_ocm.h"
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -215,6 +217,9 @@ struct cn10k_ml_dev {
 	/* ML Firmware */
 	struct cn10k_ml_fw fw;
 
+	/* ML OCM info */
+	struct cn10k_ml_ocm ocm;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 72b52fce8d..11b52af68c 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -8,6 +8,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+#include "cn10k_ml_ocm.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -287,3 +288,55 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 			   addr->output[i].sz_d, addr->output[i].sz_q);
 	}
 }
+
+int
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
+			       uint16_t *wb_pages, uint16_t *scratch_pages)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_ocm *ocm;
+	uint64_t scratch_size;
+	uint64_t wb_size;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	ocm = &mldev->ocm;
+
+	/* Assume wb_size is zero for non-relocatable models */
+	if (metadata->model.ocm_relocatable)
+		wb_size = metadata->model.ocm_wb_range_end - metadata->model.ocm_wb_range_start + 1;
+	else
+		wb_size = 0;
+
+	if (wb_size % ocm->page_size)
+		*wb_pages = wb_size / ocm->page_size + 1;
+	else
+		*wb_pages = wb_size / ocm->page_size;
+	plt_ml_dbg("model_id = %d, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+		   *wb_pages);
+
+	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
+	if (metadata->model.ocm_tmp_range_floor % ocm->page_size)
+		*scratch_pages = scratch_size / ocm->page_size + 1;
+	else
+		*scratch_pages = scratch_size / ocm->page_size;
+	plt_ml_dbg("model_id = %d, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+		   scratch_size, *scratch_pages);
+
+	/* Check if the model can be loaded on OCM */
+	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+		plt_err("Cannot create the model, OCM relocatable = %u",
+			metadata->model.ocm_relocatable);
+		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
+			ML_CN10K_OCM_NUMPAGES);
+		return -ENOMEM;
+	}
+
+	/* Update scratch_pages to block the full tile for OCM non-relocatable model. This would
+	 * prevent the library from allocating the remaining space on the tile to other models.
+	 */
+	if (!metadata->model.ocm_relocatable)
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 02a119cdd8..913849feb0 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -10,6 +10,7 @@
 #include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ocm.h"
 
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
@@ -420,6 +421,9 @@ struct cn10k_ml_model {
 	/* Model address structure */
 	struct cn10k_ml_model_addr addr;
 
+	/* Tile and memory information object */
+	struct cn10k_ml_ocm_model_map model_mem_map;
+
 	/* Model lock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -431,5 +435,7 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
+				   uint16_t *wb_pages, uint16_t *scratch_pages);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
new file mode 100644
index 0000000000..b1c62f2963
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_ocm.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
new file mode 100644
index 0000000000..57c2eee344
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OCM_H_
+#define _CN10K_ML_OCM_H_
+
+#include <rte_mldev.h>
+
+/* Page size in bytes. */
+#define ML_CN10K_OCM_PAGESIZE 0x4000
+
+/* Number of OCM tiles. */
+#define ML_CN10K_OCM_NUMTILES 0x8
+
+/* OCM in bytes, per tile. */
+#define ML_CN10K_OCM_TILESIZE 0x100000
+
+/* OCM pages, per tile. */
+#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
+
+/* Maximum OCM mask words, per tile, 8 bit words. */
+#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
+
+/* ML OCM and Tile information structure */
+struct cn10k_ml_ocm_tile_info {
+	/* Mask of used / allotted pages on tile's OCM */
+	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+
+	/* Last page in the tile's OCM used for weights and bias, default = -1 */
+	int last_wb_page;
+
+	/* Number of pages used for scratch memory on the tile's OCM */
+	uint16_t scratch_pages;
+};
+
+/* ML Model OCM map structure */
+struct cn10k_ml_ocm_model_map {
+	/* Status of OCM reservation */
+	bool ocm_reserved;
+
+	/* Mask of OCM tiles for the model */
+	uint64_t tilemask;
+
+	/* Start page for the model load, default = -1 */
+	int wb_page_start;
+
+	/* Number of pages required for weights and bias */
+	uint16_t wb_pages;
+
+	/* Number of pages required for scratch memory */
+	uint16_t scratch_pages;
+};
+
+/* OCM state structure */
+struct cn10k_ml_ocm {
+	/* OCM spinlock, used to update OCM state */
+	rte_spinlock_t lock;
+
+	/* Number of OCM tiles */
+	uint8_t num_tiles;
+
+	/* OCM size per tile */
+	uint64_t size_per_tile;
+
+	/* Size of OCM page */
+	uint64_t page_size;
+
+	/* Number of OCM pages */
+	uint16_t num_pages;
+
+	/* Words per OCM mask */
+	uint16_t mask_words;
+
+	/* OCM memory info and status */
+	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+};
+
+#endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6bf365d185..63c6ae4862 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -126,8 +126,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	uint16_t tile_id;
 	int16_t model_id;
 	uint16_t qp_id;
 	int ret;
@@ -250,6 +252,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
+	ocm = &mldev->ocm;
+	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
+	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
+	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
+	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+
+	rte_spinlock_init(&ocm->lock);
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -416,6 +430,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	const struct plt_memzone *mz;
 	size_t model_data_size;
 	uint8_t *base_dma_addr;
+	uint16_t scratch_pages;
+	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -441,6 +457,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 		return -ENOMEM;
 	}
 
+	/* Get WB and scratch pages, check if model can be loaded. */
+	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	if (ret < 0)
+		return ret;
+
 	/* Get MZ size */
 	metadata = (struct cn10k_ml_model_metadata *)params->addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
@@ -478,6 +499,13 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Copy data from load to run. run address to be used by MLIP */
 	memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
 
+	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
+	model->model_mem_map.ocm_reserved = false;
+	model->model_mem_map.tilemask = 0;
+	model->model_mem_map.wb_page_start = -1;
+	model->model_mem_map.wb_pages = wb_pages;
+	model->model_mem_map.scratch_pages = scratch_pages;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index b7567d04a2..32cb0dc0a2 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -11,12 +11,14 @@ sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
+        'cn10k_ml_ocm.c',
 )
 
 headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
+        'cn10k_ml_ocm.h',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs', 'hash']
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v2 14/37] ml/cnxk: add structures for slow and fast path JDs
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 13/37] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 15/37] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added JD structures for load, unload and run jobs. Initialized the
job command and allocated memory for request structures for slow
path jobs.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   | 99 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  4 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 19 +++++-
 drivers/ml/cnxk/cn10k_ml_ops.h   |  4 ++
 4 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 6b91c9aae6..17411e5fe1 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -188,6 +188,105 @@ struct cn10k_ml_jd {
 
 			uint8_t rsvd[8];
 		} fw_load;
+
+		struct cn10k_ml_jd_section_model_start {
+			/* Source model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_src_ddr_addr;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range end */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
 	};
 };
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 913849feb0..64160032c1 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+#include "cn10k_ml_ops.h"
 
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
@@ -429,6 +430,9 @@ struct cn10k_ml_model {
 
 	/* Model state */
 	enum cn10k_ml_model_state state;
+
+	/* Model slow-path operations request pointer */
+	struct cn10k_ml_req *req;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 63c6ae4862..6c26f450a5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -12,6 +12,10 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML Job descriptor flags */
+#define ML_FLAGS_POLL_COMPL BIT(0)
+#define ML_FLAGS_SSO_COMPL  BIT(1)
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -65,6 +69,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	struct cn10k_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
+	uint64_t i;
 
 	/* Allocate queue pair */
 	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
@@ -95,6 +100,12 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 
+	/* Initialize job command */
+	for (i = 0; i < qp->nb_desc; i++) {
+		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+	}
+
 	return qp;
 
 qp_free:
@@ -468,7 +479,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size;
+		  2 * model_data_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -506,6 +518,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set slow-path request address */
+	model->req = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+				  2 * model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5caebde908..35962f7985 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OPS_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include <roc_api.h>
 
@@ -21,6 +22,9 @@ struct cn10k_ml_req {
 
 	/* Status field for poll mode requests */
 	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
 } __rte_aligned(ROC_ALIGN);
 
 /* ML request queue */
-- 
2.17.1


* [PATCH v2 15/37] ml/cnxk: find OCM mask and page slots for a model
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 14/37] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 16/37] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to compute OCM tilemask and page start for a
model. The computed tilemask and page start are used during
model start to copy model weights and bias to OCM. OCM slot
for a model is allocated from the tiles with maximum amount
of free memory.
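
For readers who want to check the slot search by hand, here is a minimal
sketch of a first-fit search over a multi-word page mask. It is a simplified
bit-by-bit variant, not the sliding search_mask used in this patch, and the
names first_free_slot, page_is_used and example_mask are illustrative only.
The example mask matches the one used in the code comments of this patch
([10111000] [01001001], word 1 then word 0).

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 8

/* Word 0 = 01001001b, word 1 = 10111000b. */
static const uint8_t example_mask[2] = {0x49, 0xB8};

/* Return 1 if the page bit is set (page in use), 0 otherwise. */
static int
page_is_used(const uint8_t *mask, int page)
{
	return (mask[page / WORD_BITS] >> (page % WORD_BITS)) & 1;
}

/* Return the lowest index >= start_pos at which slot_sz contiguous
 * clear bits begin, or -1 when no such slot exists.
 */
static int
first_free_slot(const uint8_t *mask, int nwords, int slot_sz, int start_pos)
{
	int nbits = nwords * WORD_BITS;
	int run = 0; /* length of the current run of clear bits */
	int i;

	assert(slot_sz > 0 && start_pos >= 0);
	for (i = start_pos; i < nbits; i++) {
		run = page_is_used(mask, i) ? 0 : run + 1;
		if (run == slot_sz)
			return i - slot_sz + 1;
	}

	return -1;
}
```

With example_mask and start_pos = 0, a slot of size 4 starts at index 7 and a
slot of size 2 starts at index 1; no slot of size 5 exists.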

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 330 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   5 +
 2 files changed, 335 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index b1c62f2963..a465848558 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -2,4 +2,334 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+
+#include "roc_api.h"
+
+/* OCM macros */
+#define BYTE_LEN	  8
+#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
+#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+
+/* Left shift multi-word mask by 1 bit.
+ *
+ * For example, given a mask of two uint8_t words
+ * Input:  [00110101] [00110111]
+ * Output: [01101010] [01101110]
+ */
+static void
+lshift_mask(uint8_t *mask, int nwords)
+{
+	int i;
+	int word_sz;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	for (i = nwords - 1; i >= 0; i--) {
+		mask[i] = mask[i] << 1;
+		if (i != 0)
+			mask[i] = mask[i] | (mask[i - 1] >> (word_sz - 1));
+	}
+}
+
+/* Get the index of the first unused slot in a multi-word mask (base_mask). Only slots at or after
+ * start_pos are considered. An unused slot is a sequence of slot_sz contiguous unset bits.
+ *
+ * The function creates a search_mask with slot_sz bits set and uses a sliding-window approach to
+ * find the first available slot. search_mask slides left from start_pos to the end of the mask.
+ *
+ * For example, given the multi-word mask
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When start = 0,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 3 is 7.
+ * Index of the first unused slot of size 2 is 1.
+ * Index of the first unused slot of size 1 is 1.
+ *
+ * When start = 2,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 2 is 4.
+ * Index of the first unused slot of size 1 is 2.
+ *
+ * When unable to find a valid slot, return -1
+ * When slot_sz is zero, return word_sz * nwords (one past the last valid index)
+ */
+static int
+slot_index_lowest(uint8_t *base_mask, int nwords, int slot_sz, int start_pos)
+{
+	uint8_t *search_mask;
+	int word_sz;
+	int end_pos;
+	int min_idx;
+	int max_idx;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	min_idx = 0;
+	max_idx = word_sz * nwords;
+	idx = min_idx - 1;
+
+	if (slot_sz == 0)
+		return max_idx;
+
+	/* Create a mask with slot_sz bits set */
+	search_mask = plt_zmalloc(nwords * sizeof(uint8_t), 0);
+	if (search_mask == NULL)
+		goto error;
+
+	for (i = 0; i < nwords; i++) {
+		if (i < slot_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > slot_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (slot_sz % word_sz)) - 1;
+	}
+
+	/* Shift search mask by start_pos bits */
+	for (i = 0; i < start_pos; i++)
+		lshift_mask(search_mask, nwords);
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - slot_sz + 1;
+	for (j = start_pos; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+
+		lshift_mask(search_mask, nwords);
+	}
+
+found:
+	plt_free(search_mask);
+
+error:
+	return idx;
+}
+
+/* Find the largest possible unused slot, with a minimum size of search_sz in a multi-word mask. The
+ * function returns the start index of the slot and the size of the identified slot (slot_sz).
+ *
+ * For example, in multi-word mask
+ *
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When search_sz > 4, return value = -1, slot_sz = 0
+ * When search_sz <= 4, return value = 7, slot_sz = 4
+ */
+static int
+slot_index_largest(uint8_t *base_mask, int nwords, int search_sz, int *slot_sz)
+{
+	uint8_t *search_mask;
+	int mask_sz;
+	int word_sz;
+	int end_pos;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	mask_sz = nwords * word_sz;
+	idx = -1;
+
+	/* Create a mask with mask_sz bits set */
+	search_mask = plt_zmalloc(mask_sz, 0);
+	if (search_mask == NULL)
+		goto error;
+
+start:
+	for (i = 0; i < nwords; i++) {
+		if (i < mask_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > mask_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (mask_sz % word_sz)) - 1;
+	}
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - mask_sz + 1;
+	for (j = 0; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+		lshift_mask(search_mask, nwords);
+	}
+
+	mask_sz--;
+	if (mask_sz >= search_sz)
+		goto start;
+	else
+		mask_sz = 0;
+
+found:
+	plt_free(search_mask);
+	if (search_sz == 0)
+		idx = word_sz * nwords;
+
+error:
+	if (slot_sz)
+		*slot_sz = mask_sz;
+
+	return idx;
+}
+
+/* Count number of bits in a tilemask. Assumes that all set bits are contiguous. */
+int
+cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
+{
+	uint8_t count;
+
+	PLT_ASSERT(tilemask != 0);
+
+	*start = __builtin_ctzl(tilemask);
+	*end = 64 - __builtin_clzl(tilemask) - 1;
+	count = *end - *start + 1;
+
+	PLT_ASSERT(count == __builtin_popcountl(tilemask));
+	return count;
+}
+
+/* Find the tiles and wb_page_start to load the model on given 'num_tiles' tiles with the specified
+ * scratch & wb pages and OCM allocation mode.
+ */
+int
+cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			   uint16_t scratch_pages, uint64_t *tilemask)
+{
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
+	uint16_t used_scratch_pages_max;
+	uint16_t scratch_page_start;
+	int used_last_wb_page_max;
+	uint16_t scratch_page_end;
+	uint8_t search_start_tile;
+	uint8_t search_end_tile;
+	int wb_page_start_curr;
+	int max_slot_sz_curr;
+	uint8_t tile_start;
+	int ocm_alloc_mode;
+	int wb_page_start;
+	uint16_t tile_id;
+	uint16_t word_id;
+	uint8_t tile_idx;
+	int max_slot_sz;
+	int start_tile;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
+		plt_err("Invalid num_tiles = %u (> ML_CN10K_OCM_NUMTILES)", num_tiles);
+		return -1;
+	}
+
+	memset(tilemask, 0, sizeof(uint64_t));
+	wb_page_start = -1;
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	start_tile = -1;
+	max_slot_sz_curr = 0;
+	max_slot_sz = 0;
+	tile_idx = 0;
+	ocm_alloc_mode = 2;
+
+	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
+		plt_err("Invalid start_tile, %d", start_tile);
+		return -1;
+	}
+
+	if (start_tile < 0) {
+		search_start_tile = 0;
+		search_end_tile = ocm->num_tiles - num_tiles;
+	} else {
+		search_start_tile = start_tile;
+		search_end_tile = start_tile;
+	}
+
+	tile_start = search_start_tile;
+start_search:
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		used_scratch_pages_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, used_scratch_pages_max);
+		used_last_wb_page_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
+	}
+
+	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
+	}
+
+	if (used_scratch_pages_max < scratch_pages) { /* Check for extra scratch pages */
+		if (ocm->num_pages - used_last_wb_page_max - 1 >=
+		    scratch_pages) { /* Pages available */
+			scratch_page_start = ocm->num_pages - scratch_pages;
+			scratch_page_end = ocm->num_pages - 1;
+			for (page_id = scratch_page_start; page_id <= scratch_page_end;
+			     page_id++) { /* Mark the extra scratch pages as used */
+				local_ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					SET_BIT(local_ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						page_id % OCM_MAP_WORD_SIZE);
+			}
+		} else { /* Pages not available, check for next set of tiles */
+			goto next_search;
+		}
+	}
+
+	if (ocm_alloc_mode == 1) {
+		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
+		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
+			tile_idx = tile_start;
+			goto found;
+		}
+	} else if (ocm_alloc_mode == 2) {
+		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
+							&max_slot_sz_curr);
+		if (max_slot_sz_curr > max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			max_slot_sz = max_slot_sz_curr;
+			tile_idx = tile_start;
+		} else if (max_slot_sz_curr == max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			if (wb_page_start == ocm->num_pages) {
+				tile_idx = tile_start;
+				goto found;
+			}
+		}
+	}
+
+next_search:
+	tile_start = tile_start + num_tiles;
+	if (tile_start <= search_end_tile)
+		goto start_search;
+
+found:
+	if (wb_page_start != -1)
+		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
+
+	return wb_page_start;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 57c2eee344..2b7166bbca 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OCM_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 /* Page size in bytes. */
 #define ML_CN10K_OCM_PAGESIZE 0x4000
@@ -76,4 +77,8 @@ struct cn10k_ml_ocm {
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
 };
 
+int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
+int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			       uint16_t scratch_pages, uint64_t *tilemask);
+
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1


* [PATCH v2 16/37] ml/cnxk: add support to reserve and free OCM pages
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 15/37] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 17/37] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to reserve and free OCM pages for a model. OCM
pages are reserved upon completion of model start and are
released after model stop.
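
The bookkeeping behind reserve and free comes down to setting and clearing a
range of page bits in a per-tile mask. A minimal sketch, assuming 8-bit mask
words as in this driver; pages_reserve and pages_free are illustrative names,
not functions from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 8

/* Mark pages first..last (inclusive) as used by setting their bits.
 * An empty range (last < first) is a no-op, which covers a model
 * that needs zero WB pages.
 */
static void
pages_reserve(uint8_t *mask, int first, int last)
{
	int page;

	assert(first >= 0);
	for (page = first; page <= last; page++)
		mask[page / WORD_BITS] |= (uint8_t)(1u << (page % WORD_BITS));
}

/* Mark pages first..last (inclusive) as free by clearing their bits. */
static void
pages_free(uint8_t *mask, int first, int last)
{
	int page;

	assert(first >= 0);
	for (page = first; page <= last; page++)
		mask[page / WORD_BITS] &= (uint8_t)~(1u << (page % WORD_BITS));
}
```

Reserving pages 3..9 in an empty two-word mask gives 0xF8 in word 0 and 0x03
in word 1; freeing the same range restores the empty mask.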

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 131 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ocm.h |   3 +
 2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index a465848558..ddc0936cec 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -5,14 +5,17 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "roc_api.h"
 
 /* OCM macros */
-#define BYTE_LEN	  8
-#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
-#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+#define BYTE_LEN	   8
+#define OCM_MAP_WORD_SIZE  (sizeof(uint8_t) * BYTE_LEN)
+#define IS_BIT_SET(num, n) ((num) & (1 << (n)))
+#define SET_BIT(num, n)	   ((num) | (1 << (n)))
+#define CLEAR_BIT(num, n)  ((num) & ~(1 << (n)))
 
 /* Left shift multi-word mask by 1 bit.
  *
@@ -333,3 +336,125 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 
 	return wb_page_start;
 }
+
+void
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
+			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_page_start;
+	int scratch_page_end;
+	int wb_page_end;
+	int tile_start;
+	int tile_end;
+	int tile_id;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Get first set bit, tile_start */
+	tile_start = 0;
+	tile_end = 0;
+	cn10k_ml_ocm_tilecount(tilemask, &tile_start, &tile_end);
+	wb_page_end = wb_page_start + wb_pages - 1;
+	scratch_page_start = ocm->num_pages - scratch_pages;
+	scratch_page_end = ocm->num_pages - 1;
+
+	/* Update tile_ocm_info */
+	for (tile_id = tile_start; tile_id <= tile_end; tile_id++) {
+		/* Scratch pages */
+		for (page_id = scratch_page_start; page_id <= scratch_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		ocm->tile_ocm_info[tile_id].scratch_pages =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, scratch_pages);
+
+		/* WB pages */
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		if (wb_pages != 0)
+			ocm->tile_ocm_info[tile_id].last_wb_page =
+				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
+	}
+
+	model->addr.tile_start = tile_start;
+	model->addr.tile_end = tile_end;
+
+	plt_ml_dbg("model_id = %d, tilemask = 0x%016lx", model_id, tilemask);
+	plt_ml_dbg("model_id = %d, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
+		   wb_page_end);
+	plt_ml_dbg("model_id = %d, scratch_page_start = %d, scratch_page_end = %d", model_id,
+		   scratch_page_start, scratch_page_end);
+}
+
+void
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_resize_pages;
+	int wb_page_start;
+	int wb_page_end;
+	int prev_start;
+	int curr_start;
+	int tile_id;
+	int page_id;
+	int16_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Update OCM info for WB memory */
+	wb_page_start = model->model_mem_map.wb_page_start;
+	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
+	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+				CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+						  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+					  page_id % OCM_MAP_WORD_SIZE);
+		}
+
+		/* Update last_wb_page size */
+		if (wb_page_end == ocm->tile_ocm_info[tile_id].last_wb_page)
+			ocm->tile_ocm_info[tile_id].last_wb_page = wb_page_start - 1;
+
+		/* Update scratch page size and clear extra bits */
+		scratch_resize_pages = 0;
+		/* Get max scratch pages required, excluding the current model */
+		for (i = 0; i < dev->data->nb_models; i++) {
+			struct cn10k_ml_model *model = dev->data->models[i];
+
+			if ((i != model_id) && (model != NULL)) {
+				if (IS_BIT_SET(model->model_mem_map.tilemask, tile_id))
+					scratch_resize_pages =
+						PLT_MAX((int)model->model_mem_map.scratch_pages,
+							scratch_resize_pages);
+			}
+		}
+
+		/* Clear extra scratch pages */
+		if (scratch_resize_pages < ocm->tile_ocm_info[tile_id].scratch_pages) {
+			prev_start = ocm->num_pages - ocm->tile_ocm_info[tile_id].scratch_pages;
+			curr_start = ocm->num_pages - scratch_resize_pages;
+			for (page_id = prev_start; page_id < curr_start; page_id++) {
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+							  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						  page_id % OCM_MAP_WORD_SIZE);
+			}
+			ocm->tile_ocm_info[tile_id].scratch_pages = scratch_resize_pages;
+		}
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 2b7166bbca..7c6b1432c5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -80,5 +80,8 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
+				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1


* [PATCH v2 17/37] ml/cnxk: enable support to start an ML model
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 16/37] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 18/37] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented model start driver function. A model start job
is checked for completion in synchronous mode. Tilemask and
OCM slot are calculated before starting the model. Model start
is enqueued through scratch registers. OCM pages are reserved
before the job is enqueued and are freed if model start fails.
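
To make the tilemask handling concrete, the sketch below shows how a
contiguous tile range maps to a mask and back. It assumes GCC or Clang
builtins and uses the "ll" variants so the result does not depend on the
width of long. tile_mask and tile_range are illustrative names that only
approximate what GENMASK_ULL and cn10k_ml_ocm_tilecount do in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with bits start..end (inclusive) set. */
static uint64_t
tile_mask(int start, int end)
{
	assert(0 <= start && start <= end && end < 64);
	return (~UINT64_C(0) >> (63 - end)) & (~UINT64_C(0) << start);
}

/* Recover the first and last set bit of a non-zero mask and return the
 * number of tiles, assuming all set bits are contiguous.
 */
static int
tile_range(uint64_t tilemask, int *start, int *end)
{
	assert(tilemask != 0);
	*start = __builtin_ctzll(tilemask);
	*end = 63 - __builtin_clzll(tilemask);
	return *end - *start + 1;
}
```

For tiles 2..5 the mask is 0x3C, and tile_range on that mask reports
start = 2, end = 5 and a count of 4.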

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 208 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |   4 +
 3 files changed, 215 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 17411e5fe1..5096a26c40 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -33,6 +33,9 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* ML slow-path job flags */
+#define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
+
 /* Poll mode job state */
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6c26f450a5..b74092e605 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -114,6 +114,64 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = model->model_id;
+	req->jd.hdr.job_type = job_type;
+	req->jd.hdr.fp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+
+	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
+		if (!model->metadata.model.ocm_relocatable)
+			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+		else
+			req->jd.hdr.sp_flags = 0x0;
+		req->jd.model_start.model_src_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_load_addr));
+		req->jd.model_start.model_dst_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+		req->jd.model_start.model_init_offset = 0x0;
+		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->jd.model_start.model_finish_offset =
+			metadata->init_model.file_size + metadata->main_model.file_size;
+		req->jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
+						      metadata->main_model.file_size +
+						      metadata->finish_model.file_size;
+		req->jd.model_start.num_layers = metadata->model.num_layers;
+		req->jd.model_start.num_gather_entries = 0;
+		req->jd.model_start.num_scatter_entries = 0;
+		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->jd.model_start.batch_size = model->batch_size;
+		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
+		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
+		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
+			&mldev->roc,
+			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
+		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
+		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
+		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
+		req->jd.model_start.output.s.ddr_range_start =
+			metadata->model.ddr_output_range_start;
+		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -560,6 +618,155 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+int
+cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	uint8_t num_tiles;
+	uint64_t tilemask;
+	int wb_page_start;
+	int tile_start;
+	int tile_end;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				plt_ml_dbg("Model already started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (!model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			wb_page_start = cn10k_ml_ocm_tilemask_find(
+				dev, num_tiles, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages, &tilemask);
+
+			if (wb_page_start == -1) {
+				plt_err("Free pages not available on OCM tiles");
+				plt_err("Failed to load model = 0x%016lx, name = %s",
+					PLT_U64_CAST(model), model->metadata.model.name);
+
+				plt_spinlock_unlock(&ocm->lock);
+				return -ENOMEM;
+			}
+
+			model->model_mem_map.tilemask = tilemask;
+			model->model_mem_map.wb_page_start = wb_page_start;
+
+			cn10k_ml_ocm_reserve_pages(
+				dev, model->model_id, model->model_mem_map.tilemask,
+				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages);
+			model->model_mem_map.ocm_reserved = true;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	/* Update JD */
+	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->jd.model_start.ocm_wb_base_address =
+		model->model_mem_map.wb_page_start * ocm->page_size;
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else { /* Reset scratch registers */
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (ret == 0)
+				model->state = ML_CN10K_MODEL_STATE_STARTED;
+			else
+				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
+		while (model->model_mem_map.ocm_reserved) {
+			if (plt_spinlock_trylock(&ocm->lock) != 0) {
+				cn10k_ml_ocm_free_pages(dev, model->model_id);
+				model->model_mem_map.ocm_reserved = false;
+				model->model_mem_map.tilemask = 0x0;
+				plt_spinlock_unlock(&ocm->lock);
+			}
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -575,4 +782,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 35962f7985..3fe3872fd1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -25,6 +25,9 @@ struct cn10k_ml_req {
 
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
+
+	/* Request timeout cycle */
+	uint64_t timeout;
 } __rte_aligned(ROC_ALIGN);
 
 /* ML request queue */
@@ -61,5 +64,6 @@ extern struct rte_ml_dev_ops cn10k_ml_ops;
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			int16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
+int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v2 18/37] ml/cnxk: enable support to stop an ML model
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 17/37] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 19/37] ml/cnxk: enable support to get model information Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented the model stop driver function. A model stop job is
enqueued through the scratch registers and polled synchronously
for completion. OCM pages reserved for the model are released
as part of the stop.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
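The enqueue-and-poll flow described in the commit message can be
sketched in isolation as below. This is a minimal illustration under
stated assumptions, not the driver code: `struct sim_dev` and the
`sim_*` helpers are hypothetical stand-ins for the ROC scratch-register
helpers and `plt_tsc_cycles()`. One deliberate difference from the
patch: the sketch arms the deadline once, before the loop, so a job
that can never be enqueued also times out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated device (hypothetical): every clock read advances time by one cycle. */
struct sim_dev {
	uint64_t now;     /* current cycle count */
	uint64_t done_at; /* cycle at which the queued job completes */
	bool busy;        /* a job is currently queued */
};

static uint64_t
sim_cycles(struct sim_dev *d)
{
	return d->now++;
}

static bool
sim_enqueue(struct sim_dev *d)
{
	if (d->busy)
		return false;
	d->busy = true;
	return true;
}

static bool
sim_dequeue(struct sim_dev *d)
{
	if (!d->busy || d->now < d->done_at)
		return false;
	d->busy = false;
	return true;
}

/* Enqueue one job and poll for it; 0 on completion, -ETIME on timeout. */
static int
poll_job_sync(struct sim_dev *d, uint64_t timeout_cycles)
{
	bool enqueued = false;
	bool dequeued = false;
	uint64_t deadline;

	assert(d != NULL);

	/* Armed once, so a failing enqueue cannot postpone the deadline. */
	deadline = sim_cycles(d) + timeout_cycles;

	do {
		if (!enqueued)
			enqueued = sim_enqueue(d);

		if (enqueued && !dequeued)
			dequeued = sim_dequeue(d);

		if (dequeued)
			return 0;
	} while (sim_cycles(d) < deadline);

	return -ETIME;
}
```

A caller would map the two return values the same way the patch does:
0 leads to the state update, -ETIME to a scratch queue reset.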
 drivers/ml/cnxk/cn10k_ml_ops.c | 115 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |   1 +
 2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b74092e605..a0b0fc7e1f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -295,10 +295,14 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		/* Re-configure */
 		void **models;
 
-		/* Unload all models */
+		/* Stop and unload all models */
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %d", model_id);
+				}
 				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %d", model_id);
@@ -362,10 +366,14 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
-	/* Unload all models */
+	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %d", model_id);
+			}
 			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %d", model_id);
@@ -767,6 +775,108 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				plt_ml_dbg("Model not started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			cn10k_ml_ocm_free_pages(dev, model->model_id);
+			model->model_mem_map.ocm_reserved = false;
+			model->model_mem_map.tilemask = 0x0;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0x0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else {
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -783,4 +893,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3fe3872fd1..5e7e42ee88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -65,5 +65,6 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 			int16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
+int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v2 19/37] ml/cnxk: enable support to get model information
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 18/37] ml/cnxk: enable support to stop an ML model Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 20/37] ml/cnxk: enable support to update model params Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added a driver function to get model information. Added an
internal function to populate the model info buffer at load time.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
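The model info buffer holds a header followed by the input and output
descriptor arrays, in that order. A minimal sketch of that layout is
below. It assumes simplified, hypothetical `struct io_info` and
`struct model_info` types in place of `rte_ml_io_info` and
`rte_ml_model_info`, and uses `calloc()` where the driver carves the
region out of the model memzone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the mldev info structures. */
struct io_info {
	char name[32];
	uint32_t nb_elements;
};

struct model_info {
	char name[64];
	uint32_t nb_inputs;
	struct io_info *input_info;
	uint32_t nb_outputs;
	struct io_info *output_info;
};

/* Bytes needed for the header plus both descriptor arrays. */
static size_t
model_info_size(uint32_t nb_inputs, uint32_t nb_outputs)
{
	return sizeof(struct model_info) +
	       (size_t)nb_inputs * sizeof(struct io_info) +
	       (size_t)nb_outputs * sizeof(struct io_info);
}

/* Carve header, inputs and outputs out of one zeroed allocation. */
static struct model_info *
model_info_alloc(const char *name, uint32_t nb_inputs, uint32_t nb_outputs)
{
	struct model_info *info;

	assert(name != NULL);

	info = calloc(1, model_info_size(nb_inputs, nb_outputs));
	if (info == NULL)
		return NULL;

	snprintf(info->name, sizeof(info->name), "%s", name);
	info->nb_inputs = nb_inputs;
	info->input_info = (struct io_info *)(info + 1); /* right after header */
	info->nb_outputs = nb_outputs;
	info->output_info = info->input_info + nb_inputs; /* right after inputs */

	return info;
}
```

Because the arrays live in the same allocation as the header, a single
`free()` of the returned pointer releases everything.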
 drivers/ml/cnxk/cn10k_ml_model.c | 54 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  9 ++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 37 ++++++++++++++++++++--
 3 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 11b52af68c..19595656ae 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -340,3 +340,57 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uin
 
 	return 0;
 }
+
+void
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+{
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output =
+		PLT_PTR_ADD(input, model->metadata.model.num_input * sizeof(struct rte_ml_io_info));
+
+	/* Set model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+	memcpy(info->name, model->metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", model->metadata.model.version[0],
+		 model->metadata.model.version[1], model->metadata.model.version[2],
+		 model->metadata.model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = dev->data->dev_id;
+	info->batch_size = model->batch_size;
+	info->nb_inputs = model->metadata.model.num_input;
+	info->input_info = input;
+	info->nb_outputs = model->metadata.model.num_output;
+	info->output_info = output;
+	info->wb_size = model->metadata.weights_bias.file_size;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		memcpy(input[i].name, model->metadata.input[i].input_name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].dtype = model->metadata.input[i].input_type;
+		input[i].qtype = model->metadata.input[i].model_input_type;
+		input[i].shape.format = model->metadata.input[i].shape.format;
+		input[i].shape.w = model->metadata.input[i].shape.w;
+		input[i].shape.x = model->metadata.input[i].shape.x;
+		input[i].shape.y = model->metadata.input[i].shape.y;
+		input[i].shape.z = model->metadata.input[i].shape.z;
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		memcpy(output[i].name, model->metadata.output[i].output_name,
+		       MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].dtype = model->metadata.output[i].output_type;
+		output[i].qtype = model->metadata.output[i].model_output_type;
+		output[i].shape.format = RTE_ML_IO_FORMAT_1D;
+		output[i].shape.w = model->metadata.output[i].size;
+		output[i].shape.x = 1;
+		output[i].shape.y = 1;
+		output[i].shape.z = 1;
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 64160032c1..2372ac9b72 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -425,6 +425,14 @@ struct cn10k_ml_model {
 	/* Tile and memory information object */
 	struct cn10k_ml_ocm_model_map model_mem_map;
 
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
 	/* Model lock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -441,5 +449,6 @@ void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
 				   uint16_t *wb_pages, uint16_t *scratch_pages);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a0b0fc7e1f..f26cfcfd06 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -506,6 +506,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_data_size;
+	size_t model_info_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
 	uint16_t wb_pages;
@@ -544,8 +545,13 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
+			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size +
+		  2 * model_data_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
@@ -559,6 +565,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model = mz->addr;
 	model->mldev = mldev;
 	model->model_id = idx;
+	model->info = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+				  2 * model_data_size);
 
 	memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->metadata);
@@ -587,7 +596,10 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Set slow-path request address and state */
 	model->req = PLT_PTR_ADD(
 		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-				  2 * model_data_size);
+				  2 * model_data_size + model_info_size);
+
+	/* Set model info */
+	cn10k_ml_model_info_set(dev, model);
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
@@ -877,6 +889,26 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 	return ret;
 }
 
+static int
+cn10k_ml_model_info_get(struct rte_ml_dev *dev, int16_t model_id,
+			struct rte_ml_model_info *model_info)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
+	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -894,4 +926,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
 };
-- 
2.17.1



* [PATCH v2 20/37] ml/cnxk: enable support to update model params
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 19/37] ml/cnxk: enable support to get model information Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 21/37] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added a cnxk driver function to update model params (weights
and bias) after a model is loaded. Updating model params does
not require reloading the model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
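The update flow can be sketched as a state check followed by two
copies: new weights and bias into the load region, then the whole load
region into the run region. The sketch below is an illustration under
assumptions, not the driver code: `struct model_mem`, its fields and
the `MODEL_*` states are hypothetical stand-ins for the cn10k model
structure and `ML_CN10K_MODEL_STATE_*`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model states, mirroring the ones used by the driver. */
enum model_state {
	MODEL_LOADED,
	MODEL_JOB_ACTIVE,
	MODEL_STARTED,
	MODEL_UNKNOWN,
};

/* Hypothetical view of the two copies of the model data. */
struct model_mem {
	enum model_state state;
	uint8_t *load_base; /* staging copy, updated first */
	uint8_t *run_base;  /* copy assumed to be read by the hardware */
	size_t total_size;  /* bytes in each copy */
	size_t wb_offset;   /* offset of weights and bias in a copy */
	size_t wb_size;     /* bytes of weights and bias */
};

/* Replace weights and bias; allowed only while the model is loaded. */
static int
model_params_update(struct model_mem *m, const void *wb)
{
	assert(m != NULL && wb != NULL);

	if (m->state == MODEL_UNKNOWN)
		return -1;
	if (m->state != MODEL_LOADED)
		return -EBUSY;

	/* Update the staging copy, then refresh the run copy from it. */
	memcpy(m->load_base + m->wb_offset, wb, m->wb_size);
	memcpy(m->run_base, m->load_base, m->total_size);

	return 0;
}
```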
 drivers/ml/cnxk/cn10k_ml_ops.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f26cfcfd06..bc50e1b8cb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -909,6 +909,36 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, int16_t model_id,
 	return 0;
 }
 
+static int
+cn10k_ml_model_params_update(struct rte_ml_dev *dev, int16_t model_id, void *buffer)
+{
+	struct cn10k_ml_model *model;
+	size_t size;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+		return -1;
+	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+		return -EBUSY;
+
+	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
+	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+
+	/* Update model weights & bias */
+	memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -927,4 +957,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
 };
-- 
2.17.1



* [PATCH v2 21/37] ml/cnxk: add support to get IO buffer sizes
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 20/37] ml/cnxk: enable support to update model params Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 22/37] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added driver functions to get input and output buffer sizes
for a given number of batches. These functions compute the
buffer size based on the specific requirements of the device.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
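The size computation rounds the requested number of batches up to a
whole number of inference runs and multiplies by the per-run size. A
minimal sketch follows; `div_ceil_u64` and `io_buffer_size` are
hypothetical names, and `div_ceil_u64` is assumed to behave like
`PLT_DIV_CEIL` (ceiling division).

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division for unsigned values. */
static uint64_t
div_ceil_u64(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return (x + y - 1) / y;
}

/*
 * Bytes needed for nb_batches when the model consumes batch_size
 * batches per run and each run needs size_per_run bytes.
 */
static uint64_t
io_buffer_size(uint64_t size_per_run, uint32_t nb_batches, uint32_t batch_size)
{
	return size_per_run * div_ceil_u64(nb_batches, batch_size);
}
```

For example, with 1024 bytes per run and a model batch size of 4, a
request for 5 batches needs two runs, that is 2048 bytes.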
 drivers/ml/cnxk/cn10k_ml_ops.c | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index bc50e1b8cb..c96f17ebd8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -939,6 +939,54 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, int16_t model_id, void *buf
 	return 0;
 }
 
+static int
+cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t nb_batches,
+			   uint64_t *input_qsize, uint64_t *input_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (input_qsize != NULL)
+		*input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (input_dsize != NULL)
+		*input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t nb_batches,
+			    uint64_t *output_qsize, uint64_t *output_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (output_qsize != NULL)
+		*output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (output_dsize != NULL)
+		*output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -958,4 +1006,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_input_size_get = cn10k_ml_io_input_size_get,
+	.io_output_size_get = cn10k_ml_io_output_size_get,
 };
-- 
2.17.1



* [PATCH v2 22/37] ml/cnxk: enable quantization and dequantization
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 21/37] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 23/37] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented driver functions to quantize / dequantize input
and output data. Support is enabled for multiple batches.
Quantization / dequantization use the type conversion functions
defined in ML common code.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
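For readers unfamiliar with scale-based quantization, the sketch below
shows one plausible float32 to int8 conversion and its inverse. It is
an assumption-laden illustration, not the ML common code: the
`sketch_*` functions are hypothetical, they only mirror the argument
order of the `ml_*` helpers called in this patch, and the rounding
(half away from zero) and saturation behaviour are choices made for
the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize: q = saturate(round(scale * x)). */
static int
sketch_float32_to_int8(float scale, uint64_t nb_elements, const void *input, void *output)
{
	const float *in = input;
	int8_t *out = output;
	uint64_t i;

	if (in == NULL || out == NULL)
		return -EINVAL;

	for (i = 0; i < nb_elements; i++) {
		float v = scale * in[i];

		if (!(v > (float)INT8_MIN)) /* also catches NaN */
			out[i] = INT8_MIN;
		else if (v >= (float)INT8_MAX)
			out[i] = INT8_MAX;
		else /* round half away from zero, then truncate */
			out[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
	}

	return 0;
}

/* Dequantize: x = scale * q. */
static int
sketch_int8_to_float32(float scale, uint64_t nb_elements, const void *input, void *output)
{
	const int8_t *in = input;
	float *out = output;
	uint64_t i;

	if (in == NULL || out == NULL)
		return -EINVAL;

	for (i = 0; i < nb_elements; i++)
		out[i] = scale * (float)in[i];

	return 0;
}
```

In this sketch the dequantization scale would normally be the
reciprocal of the quantization scale; the driver takes both values
(`qscale`, `dscale`) from the model metadata.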
 drivers/ml/cnxk/cn10k_ml_ops.c | 150 +++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c96f17ebd8..9868d2a598 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <ml_utils.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
@@ -987,6 +989,152 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t n
 	return 0;
 }
 
+static int
+cn10k_ml_io_quantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_batches, void *dbuffer,
+		     void *qbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		if (model->metadata.input[i].input_type ==
+		    model->metadata.input[i].model_input_type) {
+			memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+		} else {
+			switch (model->metadata.input[i].model_input_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = ml_float32_to_int8(model->metadata.input[i].qscale,
+							 model->addr.input[i].nb_elements,
+							 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = ml_float32_to_uint8(model->metadata.input[i].qscale,
+							  model->addr.input[i].nb_elements,
+							  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = ml_float32_to_int16(model->metadata.input[i].qscale,
+							  model->addr.input[i].nb_elements,
+							  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = ml_float32_to_uint16(model->metadata.input[i].qscale,
+							   model->addr.input[i].nb_elements,
+							   lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = ml_float32_to_float16(model->addr.input[i].nb_elements,
+							    lcl_dbuffer, lcl_qbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_input_type[%u] : %u", i,
+					model->metadata.input[i].model_input_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_dbuffer += model->addr.input[i].sz_d;
+		lcl_qbuffer += model->addr.input[i].sz_q;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_dequantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_batches, void *qbuffer,
+		       void *dbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		if (model->metadata.output[i].output_type ==
+		    model->metadata.output[i].model_output_type) {
+			memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+		} else {
+			switch (model->metadata.output[i].model_output_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = ml_int8_to_float32(model->metadata.output[i].dscale,
+							 model->addr.output[i].nb_elements,
+							 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = ml_uint8_to_float32(model->metadata.output[i].dscale,
+							  model->addr.output[i].nb_elements,
+							  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = ml_int16_to_float32(model->metadata.output[i].dscale,
+							  model->addr.output[i].nb_elements,
+							  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = ml_uint16_to_float32(model->metadata.output[i].dscale,
+							   model->addr.output[i].nb_elements,
+							   lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = ml_float16_to_float32(model->addr.output[i].nb_elements,
+							    lcl_qbuffer, lcl_dbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_output_type[%u] : %u", i,
+					model->metadata.output[i].model_output_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_qbuffer += model->addr.output[i].sz_q;
+		lcl_dbuffer += model->addr.output[i].sz_d;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -1010,4 +1158,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* I/O ops */
 	.io_input_size_get = cn10k_ml_io_input_size_get,
 	.io_output_size_get = cn10k_ml_io_output_size_get,
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
 };
-- 
2.17.1



* [PATCH v2 23/37] ml/cnxk: enable support to dump device debug info
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 22/37] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 24/37] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to dump device debug information. Debug info on
cn10k device includes model state info, OCM usage info, firmware
debug and exception buffer.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
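The OCM usage dump renders each tile's page mask as a hex string and
counts the pages in use. A bounds-checked sketch of both steps is
below. `mask_to_hex` and `mask_popcount` are hypothetical helpers
written for this note; they assume byte-sized mask words, as the
"%02X" format in the patch suggests, and take the buffer length
explicitly so the terminator is always accounted for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write "0x" followed by two hex digits per word, most significant
 * word first. Returns the string length, or -1 if buf is too small
 * for the digits plus the terminating NUL.
 */
static int
mask_to_hex(const uint8_t *mask, size_t nwords, char *buf, size_t buflen)
{
	size_t need = 2 + 2 * nwords + 1; /* prefix + nibbles + NUL */
	size_t pos;
	size_t w;

	assert(mask != NULL && buf != NULL);

	if (buflen < need)
		return -1;

	pos = (size_t)snprintf(buf, buflen, "0x");
	for (w = nwords; w > 0; w--)
		pos += (size_t)snprintf(buf + pos, buflen - pos, "%02X",
					(unsigned int)mask[w - 1]);

	return (int)pos;
}

/* Count set bits across all words, i.e. pages marked in the mask. */
static unsigned int
mask_popcount(const uint8_t *mask, size_t nwords)
{
	unsigned int count = 0;
	size_t w;

	assert(mask != NULL);

	for (w = 0; w < nwords; w++) {
		uint8_t v = mask[w];

		while (v != 0) {
			count += v & 1u;
			v >>= 1;
		}
	}

	return count;
}
```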
 drivers/ml/cnxk/cn10k_ml_ocm.c |  51 +++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 189 +++++++++++++++++++++++++++++++++
 3 files changed, 241 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index ddc0936cec..348df9468a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -458,3 +458,54 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id)
 		}
 	}
 }
+
+static void
+cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t nwords, char *str)
+{
+	char *p = str;
+	int word;
+
+	/* add prefix 0x */
+	*p++ = '0';
+	*p++ = 'x';
+
+	/* build one word at a time */
+	for (word = nwords - 1; word >= 0; word--) {
+		sprintf(p, "%02X", tile_info->ocm_mask[word]);
+		p += 2;
+	}
+
+	/* terminate */
+	*p++ = 0;
+}
+
+void
+cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+{
+	char str[ML_CN10K_OCM_NUMPAGES / 4 + 2]; /* nibbles + prefix '0x' */
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	uint8_t tile_id;
+	uint8_t word_id;
+	int wb_pages;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	fprintf(fp, "OCM State:\n");
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
+
+		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
+		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+			wb_pages +=
+				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+
+		fprintf(fp,
+			"tile = %2u, scratch_pages = %4u,"
+			" wb_pages = %4d, last_wb_page = %4d,"
+			" pagemask = %s\n",
+			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
+			ocm->tile_ocm_info[tile_id].last_wb_page, str);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 7c6b1432c5..887c8bf6c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,5 +83,6 @@ int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16
 void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
 				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id);
+void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9868d2a598..ae90d32480 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -14,10 +14,25 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  90
+
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+static void
+print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -116,6 +131,102 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_model_print(struct rte_ml_dev *dev, int16_t model_id, FILE *fp)
+{
+
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Print debug info */
+	print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
+		model->metadata.model.version[1], model->metadata.model.version[2],
+		model->metadata.model.version[3]);
+	if (strlen(model->name) != 0)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", model->model_id);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+
+	/* Print model state */
+	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
+			1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s  %14s\n", "input", "input_name", "input_type",
+		"model_input_type", "quantize", "format");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.input[i].input_name);
+		ml_io_type_to_str(model->metadata.input[i].input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		ml_io_type_to_str(model->metadata.input[i].model_input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.input[i].quantize == 1 ? "Yes" : "No"));
+		ml_io_format_to_str(model->metadata.input[i].shape.format, str, STR_LEN);
+		fprintf(fp, "%*s", 16, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
+		"model_output_type", "dequantize");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.output[i].output_name);
+		ml_io_type_to_str(model->metadata.output[i].output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		ml_io_type_to_str(model->metadata.output[i].model_output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.output[i].dequantize == 1 ? "Yes" : "No"));
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
+
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -498,6 +609,83 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_fw *fw;
+
+	uint32_t head_loc;
+	uint32_t tail_loc;
+	uint32_t bufsize;
+	char *head_ptr;
+	int model_id;
+	int core_id;
+
+	if (roc_env_is_asim())
+		return 0;
+
+	mldev = dev->data->dev_private;
+	fw = &mldev->fw;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			cn10k_ml_model_print(dev, model_id, fp);
+			fprintf(fp, "\n");
+		}
+	}
+
+	/* Dump ocm state */
+	cn10k_ml_ocm_print(dev, fp);
+
+	/* Dump debug buffer */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		if (core_id == 0) {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		} else {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		}
+		if (head_loc < tail_loc) {
+			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
+		} else if (head_loc >= tail_loc + 1) {
+			fprintf(fp, "%.*s\n", bufsize - head_loc, &head_ptr[head_loc]);
+			fprintf(fp, "%.*s\n", tail_loc, &head_ptr[0]);
+		}
+	}
+
+	/* Dump exception info */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		if ((core_id == 0) &&
+		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		} else if ((core_id == 1) &&
+			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		}
+	}
+
+	return 0;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
@@ -1142,6 +1330,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_close = cn10k_ml_dev_close,
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1



* [PATCH v2 24/37] ml/cnxk: add driver support for device selftest
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 23/37] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 25/37] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support for device selftest. Device selftest includes
checking the status of firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
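
The selftest follows a submit-then-poll pattern: the driver writes a start
marker into a status word, hands the job to firmware, then spins until the
status reads "finished" or a deadline passes. The following is a minimal
standalone sketch of the polling half only, not the driver code. It assumes a
POSIX monotonic clock in place of plt_tsc_cycles(), and the names
poll_until_done(), now_ns(), JOB_START and JOB_FINISH are hypothetical
stand-ins rather than driver symbols.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical markers standing in for ML_CN10K_POLL_JOB_START / _FINISH. */
#define JOB_START  UINT64_C(0x1)
#define JOB_FINISH UINT64_C(0x2)

/* Monotonic time in nanoseconds; stands in for plt_tsc_cycles(). */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until *status reads JOB_FINISH or timeout_ns has elapsed.
 * Returns 0 on completion and -ETIME on timeout. The status word is
 * checked at least once, even with a zero timeout.
 */
static int
poll_until_done(const volatile uint64_t *status, uint64_t timeout_ns)
{
	uint64_t deadline;

	assert(status != NULL);
	deadline = now_ns() + timeout_ns;

	do {
		if (*status == JOB_FINISH)
			return 0;
	} while (now_ns() < deadline);

	return -ETIME;
}
```

A caller would set the status word to JOB_START, submit the job, and then
call poll_until_done() with the command timeout. A zero return only means the
job completed; the result's error code still has to be checked separately,
as the selftest does.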

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ae90d32480..9cf3bb4a9f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -686,6 +686,62 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	uint64_t timeout_cycle;
+	bool timeout;
+	int ret;
+
+	mldev = dev->data->dev_private;
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+					 ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("Could not allocate reserved memzone");
+		return -ENOMEM;
+	}
+	req = mz->addr;
+
+	/* Prepare load completion structure */
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	/* Enqueue FW handshake / load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware handshake / load status, clean-up and exit */
+	ret = 0;
+	if (timeout) {
+		ret = -ETIME;
+	} else {
+		if (req->result.error_code != 0)
+			ret = -1;
+	}
+
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
@@ -1331,6 +1387,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1



* [PATCH v2 25/37] ml/cnxk: enqueue a burst of inference requests
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 24/37] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 26/37] ml/cnxk: dequeue " Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled driver support to enqueue a burst of inference requests
to the ML device. Enqueue uses the internal ML request structure to
queue inferences; job completion is detected through polling.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 96 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  7 +++
 2 files changed, 103 insertions(+)
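
The queue helpers in this patch use the common ring convention in which
head == tail means "empty", so one slot is always left unused and at most
nb_desc - 1 requests can be outstanding. The sketch below restates that
arithmetic as standalone functions so it can be checked in isolation. The
function names are shortened, hypothetical variants of the driver's helpers,
and index_next() returns the new index instead of updating it in place.

```c
#include <assert.h>
#include <stdint.h>

/* Requests enqueued but not yet dequeued. */
static uint64_t
pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	assert(nb_desc > 0 && head < nb_desc && tail < nb_desc);
	return (nb_desc + head - tail) % nb_desc;
}

/* Slots available for enqueue; one slot is always kept unused. */
static uint64_t
free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	return nb_desc - pending_count(head, tail, nb_desc) - 1;
}

/* Next index after 'index', wrapping around at nb_desc. */
static uint64_t
index_next(uint64_t index, uint64_t nb_desc)
{
	assert(nb_desc > 0);
	return (index + 1) % nb_desc;
}
```

With nb_desc = 8, an empty ring (head = tail = 0) has 7 free slots, and a
ring with head = 7, tail = 0 is full with 0 free slots. Adding nb_desc before
subtracting keeps the unsigned difference non-negative when head has wrapped
past tail.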

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9cf3bb4a9f..6f2d1adac8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -285,6 +285,28 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	}
 }
 
+static __rte_always_inline void
+cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+				struct rte_ml_op *op)
+{
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = op->model_id;
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->jd.hdr.sp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.model_run.input_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr));
+	req->jd.model_run.output_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr));
+	req->jd.model_run.num_batches = op->nb_batches;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -450,6 +472,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -1379,6 +1403,78 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_bat
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t count;
+	uint64_t head;
+	bool enqueued;
+
+	mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	req = &queue->reqs[head];
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	if (unlikely(!enqueued))
+		goto jcmdq_full;
+
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5e7e42ee88..e3f61beeab 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -28,6 +28,9 @@ struct cn10k_ml_req {
 
 	/* Request timeout cycle */
 	uint64_t timeout;
+
+	/* ML op */
+	struct rte_ml_op *op;
 } __rte_aligned(ROC_ALIGN);
 
 /* ML request queue */
@@ -67,4 +70,8 @@ int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 
+/* Fast-path ops */
+__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
+
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v2 26/37] ml/cnxk: dequeue a burst of inference requests
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 25/37] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 27/37] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled driver support to dequeue inference requests from the
internal queue. Dequeue checks for request completion by
polling the status field of the job request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 61 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 ++
 2 files changed, 63 insertions(+)
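
Dequeue inspects only the request at the tail and stops at the first one that
has not finished, so completed requests are returned in submission order even
if a later request finishes first. The sketch below models that behaviour on
a toy queue. The struct, the marker values and the function name are all
hypothetical, and slot indices are returned in place of rte_ml_op pointers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NB_SLOTS 8

/* Hypothetical markers standing in for ML_CN10K_POLL_JOB_START / _FINISH. */
#define JOB_START  UINT64_C(0x1)
#define JOB_FINISH UINT64_C(0x2)

struct toy_queue {
	uint64_t status[TOY_NB_SLOTS]; /* one status word per slot */
	uint64_t head;                 /* next slot to enqueue into */
	uint64_t tail;                 /* oldest outstanding slot */
	uint64_t nb_desc;              /* ring size, <= TOY_NB_SLOTS */
};

/*
 * Collect up to nb_ops finished slots, oldest first, into 'slots'.
 * Returns the number of slots collected and advances q->tail past them.
 */
static uint16_t
toy_dequeue_burst(struct toy_queue *q, uint64_t *slots, uint16_t nb_ops)
{
	uint64_t pending;
	uint64_t tail;
	uint16_t count = 0;

	assert(q->nb_desc > 0 && q->nb_desc <= TOY_NB_SLOTS);

	tail = q->tail;
	pending = (q->nb_desc + q->head - tail) % q->nb_desc;
	if (nb_ops > pending)
		nb_ops = (uint16_t)pending;

	/* Stop at the first unfinished slot, even if later ones are done. */
	while (count < nb_ops && q->status[tail] == JOB_FINISH) {
		slots[count++] = tail;
		tail = (tail + 1) % q->nb_desc;
	}

	q->tail = tail;
	return count;
}
```

With four outstanding requests of which the third is still running, a burst
call returns the first two and leaves the tail on the third; the fourth is
returned only after the third completes.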

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6f2d1adac8..83ec064c82 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -473,6 +473,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -1421,6 +1422,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
 }
 
+static __rte_always_inline void
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
+		       struct rte_ml_op *op)
+{
+	PLT_SET_USED(dev);
+	PLT_SET_USED(qp_id);
+
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0))
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+	else
+		op->status = RTE_ML_OP_STATUS_ERROR;
+
+	op->user_ptr = result->user_ptr;
+}
+
 __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
@@ -1475,6 +1493,49 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot uint16_t
+cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+	req = &queue->reqs[tail];
+	status = plt_read64(&req->status);
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
+		goto empty_or_active;
+
+	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	ops[count] = req->op;
+
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index e3f61beeab..3c5342dcc7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -73,5 +73,7 @@ int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v2 27/37] ml/cnxk: add internal function for sync mode run
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 26/37] ml/cnxk: dequeue " Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 28/37] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added internal function to execute ML inference requests
in synchronous mode. Sync mode inference execution is used
to launch inference requests without using a queue-pair.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 53 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  1 +
 2 files changed, 54 insertions(+)
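
Sync mode has two phases that share a single deadline: retry the submit until
the command queue accepts it, then poll for completion. A timeout in the
first phase is reported as -EBUSY and in the second as -ETIME. The sketch
below models this control flow against a toy device with a fake tick counter
so that it runs deterministically. struct toy_dev and all function names are
hypothetical and do not correspond to ROC or driver APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device: time is a counter that advances each time it is read. */
struct toy_dev {
	uint64_t now;       /* fake tick counter */
	uint64_t accept_at; /* tick from which a submit is accepted */
	uint64_t finish_at; /* tick from which the job reads as finished */
};

static uint64_t
toy_ticks(struct toy_dev *d)
{
	return d->now++;
}

/*
 * Submit and wait, using one deadline for both phases.
 * Returns 0 on completion, -EBUSY if the submit was never accepted,
 * and -ETIME if the job was accepted but did not finish in time.
 */
static int
toy_run_sync(struct toy_dev *d, uint64_t budget)
{
	uint64_t deadline;
	bool accepted = false;

	assert(d != NULL);
	deadline = toy_ticks(d) + budget;

	/* Phase 1: retry the submit until accepted or the deadline passes. */
	do {
		if (d->now >= d->accept_at) {
			accepted = true;
			break;
		}
	} while (toy_ticks(d) < deadline);

	if (!accepted)
		return -EBUSY;

	/* Phase 2: poll for completion against the same deadline. */
	do {
		if (d->now >= d->finish_at)
			return 0;
	} while (toy_ticks(d) < deadline);

	return -ETIME;
}
```

Because the deadline is computed once, time spent retrying the submit reduces
the time left to wait for completion.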

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 83ec064c82..e7ee0774f2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1536,6 +1536,59 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	bool timeout;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[op->model_id];
+	req = model->req;
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+
+	timeout = true;
+	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	do {
+		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+			req->op = op;
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout) {
+		ret = -EBUSY;
+		goto error_enqueue;
+	}
+
+	timeout = true;
+	do {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout)
+		ret = -ETIME;
+	else
+		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+
+error_enqueue:
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3c5342dcc7..c23e484b69 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,5 +75,6 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v2 28/37] ml/cnxk: enable support for firmware error codes
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 27/37] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 29/37] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support for error handling. Added error types and subtypes
supported by ML firmware. Enabled support to get device-specific
error code and message for a completed ML request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
v2:
* Fixed typos

 drivers/ml/cnxk/cn10k_ml_dev.c |   4 +-
 drivers/ml/cnxk/cn10k_ml_dev.h |  50 +++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.c | 117 ++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_ops.h |   2 +
 4 files changed, 160 insertions(+), 13 deletions(-)
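
The error code introduced here overlays a 4-bit error type and a 60-bit
sub-type on one 64-bit word, so a single comparison of the word against zero
covers both fields. Bit-field layout is implementation-defined in C, so the
sketch below uses explicit shifts and masks in place of the union. It assumes
the type occupies the low 4 bits, which is how GCC lays out the first
bit-field on little-endian targets. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ETYPE_BITS 4
#define ETYPE_MASK ((UINT64_C(1) << ETYPE_BITS) - 1)
#define STYPE_BITS 60

/* Assumption: etype sits in bits [3:0], stype in bits [63:4]. */
static uint64_t
error_code_pack(uint64_t etype, uint64_t stype)
{
	assert(etype <= ETYPE_MASK);
	assert(stype < (UINT64_C(1) << STYPE_BITS));
	return (stype << ETYPE_BITS) | etype;
}

/* Extract the 4-bit error type. */
static uint64_t
error_code_etype(uint64_t code)
{
	return code & ETYPE_MASK;
}

/* Extract the 60-bit error sub-type. */
static uint64_t
error_code_stype(uint64_t code)
{
	return code >> ETYPE_BITS;
}
```

Under that layout assumption, a driver error (type 0x5) with the firmware
exception sub-type (0x2) packs to 0x25, and a word of 0 decodes to "no error"
for both fields.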

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 805b037593..779734d6cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -261,7 +261,7 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} while (plt_tsc_cycles() < timeout_cycle);

 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -452,7 +452,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} while (plt_tsc_cycles() < timeout_cycle);

 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 5096a26c40..f292078920 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -64,6 +64,54 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };

+/* ML error types enumeration */
+enum cn10k_ml_error_etype {
+	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
+	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
+	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
+	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
+	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
+	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
+};
+
+/* ML firmware non-fatal error sub-type */
+enum cn10k_ml_error_stype_fw_nf {
+	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
+	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during unload */
+	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
+	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
+	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
+	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
+	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
+	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
+	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
+	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+};
+
+/* ML driver error sub-type */
+enum cn10k_ml_error_stype_driver {
+	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
+	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+};
+
+/* ML error structure */
+union cn10k_ml_error_code {
+	struct {
+		/* Error type */
+		uint64_t etype : 4;
+
+		/* Error sub-type */
+		uint64_t stype : 60;
+	} s;
+
+	/* WORD 0 */
+	uint64_t u64;
+};
+
 /* ML Firmware stats */
 struct cn10k_ml_fw_stats {
 	/* Firmware start cycle */
@@ -82,7 +130,7 @@ struct cn10k_ml_fw_stats {
 /* ML result structure */
 struct cn10k_ml_result {
 	/* Job error code */
-	uint64_t error_code;
+	union cn10k_ml_error_code error_code;

 	/* Firmware stats */
 	struct cn10k_ml_fw_stats stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7ee0774f2..d9eea21e12 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,49 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)

+/* Error message length */
+#define ERRMSG_LEN 32
+
+/* Error type database */
+static const struct cn10k_ml_etype_db {
+	enum cn10k_ml_error_etype etype;
+	char name[ERRMSG_LEN];
+} ml_etype_db[] = {
+	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
+
+/* Firmware non-fatal error subtype database */
+static const struct cn10k_ml_stype_db_hw_nf {
+	enum cn10k_ml_error_stype_fw_nf stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_hw_nf[] = {
+	{ML_FW_ERR_NOERR, "NO ERROR"},
+	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+};
+
+/* Driver error subtype database */
+static const struct cn10k_ml_stype_db_driver {
+	enum cn10k_ml_error_stype_driver stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_driver[] = {
+	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+};
+
 static void
 print_line(FILE *fp, int len)
 {
@@ -474,6 +517,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c

 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
+	dev->op_error_get = cn10k_ml_op_error_get;

 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -758,7 +802,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code != 0)
+		if (req->result.error_code.u64 != 0)
 			ret = -1;
 	}

@@ -940,7 +984,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);

-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;

 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1021,7 +1065,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)

 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0)
+			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1083,7 +1127,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;

 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1138,7 +1182,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)

 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0x0)
+			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1429,12 +1473,30 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 	PLT_SET_USED(dev);
 	PLT_SET_USED(qp_id);

-	op->impl_opaque = result->error_code;
+	struct cn10k_ml_dev *mldev;

-	if (likely(result->error_code == 0))
+	if (likely(result->error_code.u64 == 0)) {
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
-	else
+	} else {
+		/* Handle driver error */
+		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+			mldev = dev->data->dev_private;
+
+			/* Check for exception */
+			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
+			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+			else
+				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+		}
+
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_ERROR;
+	}

 	op->user_ptr = result->user_ptr;
 }
@@ -1471,6 +1533,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);

 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;

 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1518,8 +1581,12 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 dequeue_req:
 	req = &queue->reqs[tail];
 	status = plt_read64(&req->status);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
-		goto empty_or_active;
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+	}

 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
@@ -1536,6 +1603,35 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }

+__rte_hot int
+cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
+{
+	union cn10k_ml_error_code *error_code;
+	char msg[RTE_ML_STR_MAX];
+
+	PLT_SET_USED(dev);
+
+	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
+
+	/* Copy error message */
+	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
+
+	/* Copy sub error message */
+	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+	}
+
+	if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+	}
+
+	plt_strlcpy(error->message, msg, sizeof(error->message));
+
+	return 0;
+}
+
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
@@ -1552,6 +1648,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);

 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;

 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index c23e484b69..5f00cb2a60 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,6 +75,8 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
+				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);

 #endif /* _CN10K_ML_OPS_H_ */
--
2.17.1



* [PATCH v2 29/37] ml/cnxk: add support to get and reset device stats
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 28/37] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 30/37] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enable support to get and reset ML device stats. Device stats
include the number of requests enqueued and dequeued, and the
enqueue and dequeue error counts.
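
The aggregation described above can be sketched as follows. This is a
minimal illustration, not the driver code: struct qp_stats and
stats_aggregate() are hypothetical names, and zeroing the output before
summing is an assumption of the sketch (the patch itself adds into the
caller's structure).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-queue-pair counters in this patch. */
struct qp_stats {
	uint64_t enqueued_count;
	uint64_t dequeued_count;
	uint64_t enqueue_err_count;
	uint64_t dequeue_err_count;
};

/* Sum the counters of nb_qp queue pairs into *out.
 * Assumption of this sketch: *out is zeroed here, so the caller does
 * not have to pre-clear it.
 */
static void
stats_aggregate(const struct qp_stats *qps, uint16_t nb_qp, struct qp_stats *out)
{
	uint16_t i;

	assert(out != NULL);
	memset(out, 0, sizeof(*out));

	for (i = 0; i < nb_qp; i++) {
		out->enqueued_count += qps[i].enqueued_count;
		out->dequeued_count += qps[i].dequeued_count;
		out->enqueue_err_count += qps[i].enqueue_err_count;
		out->dequeue_err_count += qps[i].dequeue_err_count;
	}
}
```

Keeping the counters per queue pair lets each queue update its own
counters in the fast path; the device-level view is only computed when
stats are requested.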

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 55 ++++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index d9eea21e12..732d0a63ba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -159,6 +159,10 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -678,6 +682,38 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -1470,15 +1506,23 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	PLT_SET_USED(dev);
-	PLT_SET_USED(qp_id);
-
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
 
 	if (likely(result->error_code.u64 == 0)) {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeued_count++;
+		}
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeue_err_count++;
+		}
+
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
 			mldev = dev->data->dev_private;
@@ -1552,6 +1596,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 jcmdq_full:
 	queue->head = head;
+	qp->stats.enqueued_count += count;
 
 	return count;
 }
@@ -1700,6 +1745,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5f00cb2a60..4c38f1938a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -58,6 +58,9 @@ struct cn10k_ml_qp {
 
 	/* Request queue */
 	struct cn10k_ml_queue queue;
+
+	/* Queue pair statistics */
+	struct rte_ml_dev_stats stats;
 };
 
 /* CN10K device ops */
-- 
2.17.1



* [PATCH v2 30/37] ml/cnxk: add support to handle extended dev stats
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 29/37] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:17   ` [PATCH v2 31/37] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add support for ML device extended stats (xstats): get the xstats
names, get the xstats values, and reset xstats. Supported xstats
are the average, minimum and maximum hardware and firmware
latency of each loaded model.
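
A minimal sketch of how average, minimum and maximum latency can be
tracked with a running total is shown below. The names (struct
lat_stats and the lat_stats_*() helpers) are hypothetical, and the
sketch is simplified: it keeps one sample count per accumulator,
whereas the patch keeps separate reset counters for hardware and
firmware latency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical latency accumulator, loosely modelled on the patch. */
struct lat_stats {
	uint64_t total; /* sum of all recorded latencies */
	uint64_t min;   /* smallest recorded latency */
	uint64_t max;   /* largest recorded latency */
	uint64_t count; /* number of recorded latencies */
};

static void
lat_stats_reset(struct lat_stats *s)
{
	assert(s != NULL);
	s->total = 0;
	s->min = UINT64_MAX; /* any real sample will be smaller */
	s->max = 0;
	s->count = 0;
}

static void
lat_stats_update(struct lat_stats *s, uint64_t latency)
{
	assert(s != NULL);
	s->total += latency;
	if (latency < s->min)
		s->min = latency;
	if (latency > s->max)
		s->max = latency;
	s->count++;
}

/* Average latency; returns 0 when nothing was recorded, which avoids
 * a division by zero.
 */
static uint64_t
lat_stats_avg(const struct lat_stats *s)
{
	return s->count != 0 ? s->total / s->count : 0;
}

/* Minimum latency; reports 0 instead of UINT64_MAX when empty. */
static uint64_t
lat_stats_min(const struct lat_stats *s)
{
	return s->count != 0 ? s->min : 0;
}
```

Storing the total and the sample count, rather than a running average,
keeps the per-inference update to a few integer operations and defers
the division to the time the stat is read.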

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.h |  57 +++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 356 ++++++++++++++++++++++++++++++-
 3 files changed, 415 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f292078920..fadca2a9f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -372,6 +372,9 @@ struct cn10k_ml_dev {
 
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
+
+	/* xstats status */
+	bool xstats_enabled;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 2372ac9b72..9d8068a173 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -402,6 +402,57 @@ struct cn10k_ml_model_addr {
 	uint32_t total_output_sz_d;
 };
 
+/* Extended stats types enum */
+enum cn10k_ml_model_xstats_type {
+	/* Average hardware latency */
+	avg_hw_latency = 0,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+};
+
+/* Model fast-path stats */
+struct cn10k_ml_model_stats {
+	/* Total hardware latency, sum of all inferences */
+	uint64_t hw_latency_tot;
+
+	/* Minimum hardware latency */
+	uint64_t hw_latency_min;
+
+	/* Maximum hardware latency */
+	uint64_t hw_latency_max;
+
+	/* Total firmware latency, sum of all inferences */
+	uint64_t fw_latency_tot;
+
+	/* Minimum firmware latency */
+	uint64_t fw_latency_min;
+
+	/* Maximum firmware latency */
+	uint64_t fw_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t hw_reset_count;
+
+	/* Firmware stats reset index */
+	uint64_t fw_reset_count;
+};
+
 /* ML Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -441,6 +492,12 @@ struct cn10k_ml_model {
 
 	/* Model slow-path operations request pointer */
 	struct cn10k_ml_req *req;
+
+	/* Model stats for burst ops */
+	struct cn10k_ml_model_stats *burst_stats;
+
+	/* Model stats for sync ops */
+	struct cn10k_ml_model_stats *sync_stats;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 732d0a63ba..eeea98a4d5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -354,6 +354,134 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
+#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value += model->burst_stats[qp_id].str##_latency_tot;                      \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		value = (count != 0) ? value / count : 0;                                          \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+static uint64_t
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
+			 enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint64_t count = 0;
+	uint64_t value;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+	if (model == NULL)
+		return 0;
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case min_hw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case max_hw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case avg_fw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case min_fw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case max_fw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
+			model->burst_stats[qp_id].str##_reset_count =                              \
+				model->burst_stats[qp_id].dequeued_count;                          \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+	} while (0)
+
+static void
+cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
+			   enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -519,6 +647,13 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	/* Check firmware stats */
+	if ((mldev->fw.req->jd.fw_load.cap.s.hw_stats) &&
+	    (mldev->fw.req->jd.fw_load.cap.s.fw_stats))
+		mldev->xstats_enabled = true;
+	else
+		mldev->xstats_enabled = false;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -714,6 +849,170 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+/* Model xstats names */
+struct rte_ml_dev_xstats_map cn10k_ml_model_xstats_table[] = {
+	{avg_hw_latency, "Avg-HW-Latency"}, {min_hw_latency, "Min-HW-Latency"},
+	{max_hw_latency, "Max-HW-Latency"}, {avg_fw_latency, "Avg-FW-Latency"},
+	{min_fw_latency, "Min-FW-Latency"}, {max_fw_latency, "Max-FW-Latency"},
+};
+
+static int
+cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_map *xstats_map,
+			      uint32_t size)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	if (xstats_map == NULL)
+		return PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+
+	/* Model xstats names */
+	count = 0;
+	cn10k_ml_dev_info_get(dev, &dev_info);
+
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		xstats_map[count].id = id;
+		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+
+		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+
+		count++;
+		if (count == size)
+			break;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				uint64_t *value)
+{
+	struct rte_ml_dev_xstats_map *xstats_map;
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+	uint32_t num_xstats;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	num_xstats = PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+	xstats_map = rte_zmalloc("cn10k_ml_xstats_map",
+				 sizeof(struct rte_ml_dev_xstats_map) * num_xstats, 0);
+	cn10k_ml_dev_xstats_names_get(dev, xstats_map, num_xstats);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		if (strncmp(name, xstats_map[id].name, strlen(name)) == 0) {
+			*stat_id = id;
+			rte_free(xstats_map);
+			break;
+		}
+	}
+
+	if (id == PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models)
+		return -EINVAL;
+
+	model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+	type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+	*value = cn10k_ml_model_xstat_get(dev, model_id, type);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint64_t *values,
+			uint16_t nb_ids)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	count = 0;
+	for (i = 0; i < nb_ids; i++) {
+		model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+		values[i] = cn10k_ml_model_xstat_get(dev, model_id, type);
+		count++;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint16_t nb_ids)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (stat_ids == NULL) {
+		for (i = 0; i < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; i++) {
+			model_id = i / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = i % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	} else {
+		for (i = 0; i < nb_ids; i++) {
+			model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	}
+
+	return 0;
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -856,6 +1155,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_stats_size;
 	size_t model_data_size;
 	size_t model_info_size;
 	uint8_t *base_dma_addr;
@@ -864,6 +1164,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int qp_id;
 	int ret;
 
 	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
@@ -900,10 +1201,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -952,6 +1255,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Set model info */
 	cn10k_ml_model_info_set(dev, model);
 
+	/* Reset burst and sync stats */
+	model->burst_stats = PLT_PTR_ADD(
+		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
+		model->burst_stats[qp_id].hw_latency_tot = 0;
+		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].hw_latency_max = 0;
+		model->burst_stats[qp_id].fw_latency_tot = 0;
+		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].fw_latency_max = 0;
+		model->burst_stats[qp_id].hw_reset_count = 0;
+		model->burst_stats[qp_id].fw_reset_count = 0;
+		model->burst_stats[qp_id].dequeued_count = 0;
+	}
+	model->sync_stats =
+		PLT_PTR_ADD(model->burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
@@ -1506,15 +1827,44 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
+	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	uint64_t hw_latency;
+	uint64_t fw_latency;
 
 	if (likely(result->error_code.u64 == 0)) {
+		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
+			stats = &model->burst_stats[qp_id];
+		} else {
+			stats = model->sync_stats;
+		}
+
+		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
+			stats->hw_latency_min = UINT64_MAX;
+			stats->hw_latency_max = 0;
 		}
 
+		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
+			stats->fw_latency_min = UINT64_MAX;
+			stats->fw_latency_max = 0;
+		}
+
+		hw_latency = result->stats.hw_end - result->stats.hw_start;
+		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
+
+		stats->hw_latency_tot += hw_latency;
+		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
+		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
+		stats->fw_latency_tot += fw_latency;
+		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
+		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
+		stats->dequeued_count++;
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
@@ -1748,6 +2098,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
 	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
-- 
2.17.1



* [PATCH v2 31/37] ml/cnxk: enable support to get xstats in cycles
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 30/37] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
@ 2022-12-08 20:17   ` Srikanth Yalavarthi
  2022-12-08 20:18   ` [PATCH v2 32/37] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enable support to retrieve xstats in either cycles or ns.
Access to sclk is enabled only if an RVU device is probed
during initialization. The driver returns xstats in
nanoseconds only when an RVU device is probed; otherwise it
falls back to cycles.
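
The conversion can be sketched as below, assuming the clock frequency
is reported in MHz and that a frequency of 0 means the clock is not
accessible. cycles_to_ns() and latency_unit() are hypothetical helper
names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds for a clock given in MHz.
 * cycles / MHz yields microseconds, so scale by 1000 first to get ns.
 * A frequency of 0 is taken to mean "clock unknown"; the value is then
 * returned unchanged, i.e. in cycles.
 */
static uint64_t
cycles_to_ns(uint64_t cycles, uint16_t freq_mhz)
{
	if (freq_mhz == 0)
		return cycles;

	return (cycles * 1000ULL) / freq_mhz;
}

/* Unit suffix matching the value returned by cycles_to_ns(). */
static const char *
latency_unit(uint16_t freq_mhz)
{
	return freq_mhz == 0 ? "cycles" : "ns";
}

/* Small usage example. */
static void
cycles_to_ns_example(void)
{
	assert(cycles_to_ns(2000, 1000) == 2000); /* 1 GHz: 1 cycle = 1 ns */
	assert(cycles_to_ns(1500, 500) == 3000);  /* 500 MHz: 1 cycle = 2 ns */
	assert(cycles_to_ns(42, 0) == 42);        /* no clock: stay in cycles */
}
```

Deriving the unit suffix from the same frequency value keeps the xstat
name and the reported number consistent with each other.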

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index eeea98a4d5..5d29a55e66 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -394,6 +394,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 			 enum cn10k_ml_model_xstats_type type)
 {
 	struct cn10k_ml_model *model;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
 	uint64_t value;
 	uint32_t qp_id;
@@ -425,6 +427,10 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 		value = 0;
 	}
 
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
 	return value;
 }
 
@@ -863,6 +869,8 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
 	uint32_t model_id;
 	uint32_t count;
 	uint32_t type;
@@ -878,6 +886,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	/* Model xstats names */
 	count = 0;
 	cn10k_ml_dev_info_get(dev, &dev_info);
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 
 	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
 		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
@@ -889,8 +898,14 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 		xstats_map[count].id = id;
 		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
 
-		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
-			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+		if (sclk_freq == 0)
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
+		else
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-ns",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
 
 		count++;
 		if (count == size)
-- 
2.17.1



* [PATCH v2 32/37] ml/cnxk: add support to report DPE FW warnings
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2022-12-08 20:17   ` [PATCH v2 31/37] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
@ 2022-12-08 20:18   ` Srikanth Yalavarthi
  2022-12-08 20:18   ` [PATCH v2 33/37] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:18 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device arguments "enable_dpe_warnings" and
"report_dpe_warnings" to enable and report DPE warnings from the
ML firmware. The firmware load flags are configured from these
device arguments.

Default values:
	enable_dpe_warnings = 1
	report_dpe_warnings = 0

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
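Note: the mapping from the two device arguments to the firmware load
flags can be sketched standalone as below. This is a minimal sketch,
not the driver code: the bit positions follow the FW_*_BITMASK macros
in this patch, while the ex_ prefixed names and the struct are
hypothetical stand-ins.

```c
#include <stdint.h>

/* Bit positions follow FW_ENABLE_DPE_WARNING_BITMASK and
 * FW_REPORT_DPE_WARNING_BITMASK from the patch.
 */
#define EX_FW_ENABLE_DPE_WARNING (UINT64_C(1) << 0)
#define EX_FW_REPORT_DPE_WARNING (UINT64_C(1) << 1)

/* Hypothetical stand-in for the two fields added to struct cn10k_ml_fw. */
struct ex_fw_cfg {
	int enable_dpe_warnings;
	int report_dpe_warnings;
};

/* Build the firmware flag word from the parsed device arguments. */
static uint64_t
ex_fw_flags_get(const struct ex_fw_cfg *cfg)
{
	uint64_t flags = 0;

	if (cfg->enable_dpe_warnings)
		flags |= EX_FW_ENABLE_DPE_WARNING;
	if (cfg->report_dpe_warnings)
		flags |= EX_FW_REPORT_DPE_WARNING;

	return flags;
}
```

With the default values (enable = 1, report = 0) the sketch yields 0x1,
the same value as the FW_LOAD_FLAGS constant removed by this patch.
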
 drivers/ml/cnxk/cn10k_ml_dev.c | 94 +++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_dev.h |  6 +++
 2 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 779734d6cd..0b345b3d4e 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -17,9 +17,13 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-#define CN10K_ML_FW_PATH "fw_path"
+#define CN10K_ML_FW_PATH		"fw_path"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 
-#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -28,9 +32,13 @@
 #define FW_EXCEPTION_BUFFER_SIZE 0x400
 #define FW_LINKER_OFFSET	 0x80000
 #define FW_WAIT_CYCLES		 100
-#define FW_LOAD_FLAGS		 0x1
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+/* Firmware flags */
+#define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
+#define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -49,9 +57,25 @@ parse_string_arg(const char *key __rte_unused, const char *value, void *extra_ar
 	return 0;
 }
 
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be non-negative.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int
 cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
 {
+	bool enable_dpe_warnings_set = false;
+	bool report_dpe_warnings_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -76,6 +100,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		fw_path_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		enable_dpe_warnings_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_REPORT_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		report_dpe_warnings_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -83,6 +131,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		mldev->fw.path = fw_path;
 	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
 
+	if (!enable_dpe_warnings_set) {
+		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+				mldev->fw.enable_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+
+	if (!report_dpe_warnings_set) {
+		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+				mldev->fw.report_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -208,9 +280,15 @@ cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 uint64_t
 cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 {
-	PLT_SET_USED(fw);
+	uint64_t flags = 0x0;
+
+	if (fw->enable_dpe_warnings)
+		flags = flags | FW_ENABLE_DPE_WARNING_BITMASK;
+
+	if (fw->report_dpe_warnings)
+		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	return FW_LOAD_FLAGS;
+	return flags;
 }
 
 static int
@@ -614,4 +692,6 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index fadca2a9f5..52c8bd1af7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -349,6 +349,12 @@ struct cn10k_ml_fw {
 	/* Firmware file path */
 	const char *path;
 
+	/* Enable DPE warnings */
+	int enable_dpe_warnings;
+
+	/* Report DPE warnings */
+	int report_dpe_warnings;
+
 	/* Data buffer */
 	uint8_t *data;
 
-- 
2.17.1



* [PATCH v2 33/37] ml/cnxk: add support to enable model data caching
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2022-12-08 20:18   ` [PATCH v2 32/37] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
@ 2022-12-08 20:18   ` Srikanth Yalavarthi
  2022-12-08 20:18   ` [PATCH v2 34/37] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:18 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "cache_model_data" to enable model data
caching. When enabled, an inference request is executed with
dummy data in synchronous mode during the model start stage.
This run caches the model weights and bias in memory and
improves inference throughput.

cache_model_data = 1, enable (default)
cache_model_data = 0, disable

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
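Note: the dummy I/O buffer used for the warm-up run is a single zeroed
allocation split into an input region followed by an output region.
A minimal sketch of that layout is given below, assuming plain calloc()
in place of the memzone reservation used by the driver. All ex_ names
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical holder for one allocation split into two regions. */
struct ex_dummy_io {
	uint8_t *input;	 /* start of the allocation, isize bytes */
	uint8_t *output; /* input + isize, osize bytes */
};

/* Allocate isize + osize zeroed bytes; 0 on success, -1 on failure. */
static int
ex_dummy_io_alloc(struct ex_dummy_io *io, size_t isize, size_t osize)
{
	/* Guard the size addition against overflow. */
	if (isize > SIZE_MAX - osize)
		return -1;

	io->input = calloc(1, isize + osize);
	if (io->input == NULL)
		return -1;
	io->output = io->input + isize;

	return 0;
}

/* Release the buffer once the warm-up inference has completed. */
static void
ex_dummy_io_free(struct ex_dummy_io *io)
{
	free(io->input);
	io->input = NULL;
	io->output = NULL;
}
```

The buffer only needs to live for the duration of the synchronous
warm-up request, which is why it is freed right after the run.
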
 drivers/ml/cnxk/cn10k_ml_dev.c | 33 ++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c | 50 ++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 0b345b3d4e..b844a42677 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -20,10 +20,12 @@
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
+#define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -38,7 +40,8 @@
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -76,6 +79,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
+	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -124,6 +128,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		report_dpe_warnings_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -155,6 +171,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
 
+	if (!cache_model_data_set) {
+		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
+				mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -694,4 +722,5 @@ RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
 RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
 			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 52c8bd1af7..24e4823196 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -381,6 +381,9 @@ struct cn10k_ml_dev {
 
 	/* xstats status */
 	bool xstats_enabled;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 5d29a55e66..a44d77df76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -488,6 +488,49 @@ cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
 	}
 }
 
+static int
+cn10k_ml_cache_model_data(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct rte_ml_op op;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t isize = 0;
+	uint64_t osize = 0;
+	int ret = 0;
+
+	model = dev->data->models[model_id];
+
+	/* Create input and output buffers. */
+	rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL);
+	rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL);
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%d", "ml_dummy_io", model_id);
+	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+	memset(mz->addr, 0, isize + osize);
+
+	op.model_id = model_id;
+	op.nb_batches = model->batch_size;
+	op.mempool = NULL;
+
+	op.input.addr = mz->addr;
+	op.input.length = isize;
+	op.input.next = NULL;
+
+	op.output.addr = PLT_PTR_ADD(op.input.addr, isize);
+	op.output.length = osize;
+	op.output.next = NULL;
+
+	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_inference_sync(dev, &op);
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -1471,6 +1514,13 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 		}
 	}
 
+	if (ret < 0) { /* Call stop to update model and FW state, ignore error */
+		rte_ml_model_stop(dev->data->dev_id, model_id);
+	} else {
+		if (mldev->cache_model_data && roc_model_is_cn10ka())
+			ret = cn10k_ml_cache_model_data(dev, model_id);
+	}
+
 	return ret;
 }
 
-- 
2.17.1



* [PATCH v2 34/37] ml/cnxk: add support to select OCM allocation mode
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2022-12-08 20:18   ` [PATCH v2 33/37] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
@ 2022-12-08 20:18   ` Srikanth Yalavarthi
  2022-12-08 20:18   ` [PATCH v2 35/37] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:18 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "ocm_alloc_mode" to select the OCM
allocation method used during model start. The driver supports
two modes, with "lowest" as the default.

ocm_alloc_mode:
lowest:  Allocate from first available free slot / lowest
         tile ID in OCM (default)
largest: Allocate from a slot with maximum free memory

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
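Note: the difference between the two modes can be sketched on a
simplified page map. This is an illustration only: the driver helpers
operate on packed OCM mask words, whereas the sketch below uses one
byte per page for readability. The ex_ function names and signatures
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model: used[i] != 0 means page i is occupied.
 * "lowest": return the start of the first free run of at least
 * `need` pages, or -1 if there is none.
 */
static int
ex_slot_lowest(const uint8_t *used, size_t npages, size_t need)
{
	size_t run = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		run = used[i] ? 0 : run + 1;
		if (need != 0 && run == need)
			return (int)(i + 1 - need);
	}

	return -1;
}

/*
 * "largest": return the start of the longest free run if it can hold
 * `need` pages, else -1. The run length is reported through *slot_sz.
 */
static int
ex_slot_largest(const uint8_t *used, size_t npages, size_t need, size_t *slot_sz)
{
	size_t best_start = 0;
	size_t best_len = 0;
	size_t run = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		run = used[i] ? 0 : run + 1;
		if (run > best_len) {
			best_len = run;
			best_start = i + 1 - run;
		}
	}

	*slot_sz = best_len;
	return (need != 0 && best_len >= need) ? (int)best_start : -1;
}
```

For a map of {1,0,0,1,0,0,0,0,1,0} and a request of 2 pages, "lowest"
picks page 1 while "largest" picks page 4, the start of the 4-page
free run.
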
 drivers/ml/cnxk/cn10k_ml_dev.c | 45 +++++++++++++++++++++++++++++-----
 drivers/ml/cnxk/cn10k_ml_ocm.c |  6 ++---
 drivers/ml/cnxk/cn10k_ml_ocm.h |  3 +++
 3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index b844a42677..a5fce18ec1 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -21,11 +21,13 @@
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
+#define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
+#define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -39,9 +41,12 @@
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+static const char *const valid_args[] = {CN10K_ML_FW_PATH,
+					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
+					 CN10K_ML_DEV_CACHE_MODEL_DATA,
+					 CN10K_ML_OCM_ALLOC_MODE,
+					 NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -81,6 +86,8 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool report_dpe_warnings_set = false;
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
+	bool ocm_alloc_mode_set = false;
+	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
 	int ret = 0;
@@ -140,6 +147,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		cache_model_data_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_ALLOC_MODE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_ALLOC_MODE, &parse_string_arg,
+					 &ocm_alloc_mode);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_ALLOC_MODE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_alloc_mode_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -183,6 +201,20 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
 
+	if (!ocm_alloc_mode_set) {
+		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+	} else {
+		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
+		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_OCM_ALLOC_MODE,
+				ocm_alloc_mode);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->ocm.alloc_mode = ocm_alloc_mode;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -720,7 +752,8 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 348df9468a..b74af2cae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -230,7 +230,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
-	int ocm_alloc_mode;
 	int wb_page_start;
 	uint16_t tile_id;
 	uint16_t word_id;
@@ -255,7 +254,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	max_slot_sz_curr = 0;
 	max_slot_sz = 0;
 	tile_idx = 0;
-	ocm_alloc_mode = 2;
 
 	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
 		plt_err("Invalid start_tile, %d", start_tile);
@@ -303,13 +301,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		}
 	}
 
-	if (ocm_alloc_mode == 1) {
+	if (strcmp(ocm->alloc_mode, "lowest") == 0) {
 		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
 		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
 			tile_idx = tile_start;
 			goto found;
 		}
-	} else if (ocm_alloc_mode == 2) {
+	} else if (strcmp(ocm->alloc_mode, "largest") == 0) {
 		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
 							&max_slot_sz_curr);
 		if (max_slot_sz_curr > max_slot_sz) {
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 887c8bf6c0..65f0e0f650 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -58,6 +58,9 @@ struct cn10k_ml_ocm {
 	/* OCM spinlock, used to update OCM state */
 	rte_spinlock_t lock;
 
+	/* OCM allocation mode */
+	const char *alloc_mode;
+
 	/* Number of OCM tiles */
 	uint8_t num_tiles;
 
-- 
2.17.1



* [PATCH v2 35/37] ml/cnxk: add support to use lock during jcmd enq
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2022-12-08 20:18   ` [PATCH v2 34/37] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
@ 2022-12-08 20:18   ` Srikanth Yalavarthi
  2022-12-08 20:18   ` [PATCH v2 36/37] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:18 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "hw_queue_lock" to select the JCMDQ enqueue
ROC function to be used in the fast path.

hw_queue_lock:

0: Disable, use the lock-free version of the JCMDQ enqueue ROC
	function for job queuing. To avoid race conditions when
	queuing requests to hardware, disabling hw_queue_lock
	restricts the number of queue-pairs supported by the cnxk
	driver to 1.

1: Enable (default), use the spinlock version of the JCMDQ enqueue
	ROC function for job queuing. Enabling the spinlock version
	removes the single queue-pair restriction, so multiple
	queue-pairs can be created.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
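Note: the handler selection described above can be sketched as a
function pointer chosen once at configure time. This is a minimal
sketch under stated assumptions: the queue struct is a toy stand-in
for the hardware JCMDQ, the spinlock is a C11 atomic_flag rather than
the ROC implementation, and all ex_ names are hypothetical. The
queue-pair limits (16 and 1) are taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_QP_SL 16 /* spinlock variant, as in the patch */
#define EX_MAX_QP_LF 1	/* lock-free variant, as in the patch */

/* Hypothetical stand-in for the hardware job command queue. */
struct ex_jcmdq {
	atomic_flag lock;
	uint32_t depth;
	uint32_t count;
};

typedef bool (*ex_enqueue_t)(struct ex_jcmdq *q);

/* No locking: safe only with a single producer. */
static bool
ex_enqueue_lf(struct ex_jcmdq *q)
{
	if (q->count == q->depth)
		return false;
	q->count++;
	return true;
}

/* Serialise producers with a spinlock around the same update. */
static bool
ex_enqueue_sl(struct ex_jcmdq *q)
{
	bool ok;

	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
	ok = ex_enqueue_lf(q);
	atomic_flag_clear_explicit(&q->lock, memory_order_release);

	return ok;
}

/* Pick the enqueue handler once, based on the device argument. */
static ex_enqueue_t
ex_enqueue_select(int hw_queue_lock)
{
	return hw_queue_lock ? ex_enqueue_sl : ex_enqueue_lf;
}

/* Queue-pair limit advertised for the chosen mode. */
static uint16_t
ex_max_queue_pairs(int hw_queue_lock)
{
	return hw_queue_lock ? EX_MAX_QP_SL : EX_MAX_QP_LF;
}
```

Selecting the handler once keeps the per-request fast path free of a
branch on hw_queue_lock, at the cost of an indirect call.
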
 drivers/ml/cnxk/cn10k_ml_dev.c | 31 ++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_dev.h | 13 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.c | 20 +++++++++++++++++---
 3 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index a5fce18ec1..33709dae6f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -22,12 +22,14 @@
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -46,6 +48,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
+					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -87,6 +90,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
+	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -158,6 +162,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		ocm_alloc_mode_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
+					 &mldev->hw_queue_lock);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_HW_QUEUE_LOCK);
+			ret = -EINVAL;
+			goto exit;
+		}
+		hw_queue_lock_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -215,6 +231,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
 
+	if (!hw_queue_lock_set) {
+		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+	} else {
+		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
+				mldev->hw_queue_lock);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -756,4 +784,5 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 24e4823196..4b65efecc5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -21,8 +21,11 @@
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
 
-/* Maximum number of Queue-Pairs per device */
-#define ML_CN10K_MAX_QP_PER_DEVICE 1
+/* Maximum number of Queue-Pairs per device, spinlock version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
+
+/* Maximum number of Queue-Pairs per device, lock-free version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_LF 1
 
 /* Maximum number of descriptors per queue-pair */
 #define ML_CN10K_MAX_DESC_PER_QP 1024
@@ -384,6 +387,12 @@ struct cn10k_ml_dev {
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
+
+	/* Use spinlock version of ROC enqueue */
+	int hw_queue_lock;
+
+	/* JCMD enqueue function handler */
+	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a44d77df76..f787455a7f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -534,13 +534,21 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, int16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
+	struct cn10k_ml_dev *mldev;
+
 	if (dev_info == NULL)
 		return -EINVAL;
 
+	mldev = dev->data->dev_private;
+
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	if (mldev->hw_queue_lock)
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
+	else
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
+
 	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
@@ -703,6 +711,12 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->xstats_enabled = false;
 
+	/* Set JCMDQ enqueue function */
+	if (mldev->hw_queue_lock == 1)
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	else
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -1996,7 +2010,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
-	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2117,7 +2131,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
-- 
2.17.1



* [PATCH v2 36/37] ml/cnxk: add support to select poll memory region
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (34 preceding siblings ...)
  2022-12-08 20:18   ` [PATCH v2 35/37] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
@ 2022-12-08 20:18   ` Srikanth Yalavarthi
  2022-12-08 20:18   ` [PATCH v2 37/37] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:18 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "poll_mem" to select the memory
region to be used for polling in fast-path requests.

Implemented support to use scratch registers for polling.
The available pool of scratch registers is mapped one-to-one
with the internal request queue.

poll_mem:
ddr:      Use DDR memory location for polling (default)
register: Use scratch registers for polling

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
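Note: the "poll_mem" handling can be sketched as a one-time mapping of
the argument string to an enum, followed by setting or clearing the
DDR polling bit in the firmware flags. This is a minimal sketch, not
the driver code: the bit position follows FW_USE_DDR_POLL_ADDR_FP in
this patch, while the enum and the ex_ functions are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* BIT(2), as FW_USE_DDR_POLL_ADDR_FP in the patch. */
#define EX_FW_USE_DDR_POLL_ADDR_FP (UINT64_C(1) << 2)

enum ex_poll_mem {
	EX_POLL_MEM_INVALID = -1,
	EX_POLL_MEM_DDR,
	EX_POLL_MEM_REGISTER,
};

/* Map the devarg string to an enum; NULL selects the default (ddr). */
static enum ex_poll_mem
ex_poll_mem_parse(const char *value)
{
	if (value == NULL || strcmp(value, "ddr") == 0)
		return EX_POLL_MEM_DDR;
	if (strcmp(value, "register") == 0)
		return EX_POLL_MEM_REGISTER;

	return EX_POLL_MEM_INVALID;
}

/* Set the DDR polling bit for ddr mode, clear it otherwise. */
static uint64_t
ex_poll_mem_flags(uint64_t flags, enum ex_poll_mem mode)
{
	if (mode == EX_POLL_MEM_DDR)
		return flags | EX_FW_USE_DDR_POLL_ADDR_FP;

	return flags & ~EX_FW_USE_DDR_POLL_ADDR_FP;
}
```

Parsing to an enum once would avoid repeated string comparisons when
the mode is consulted later; the patch itself keeps the string and
uses strcmp().
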
 drivers/ml/cnxk/cn10k_ml_dev.c |  47 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  24 +++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 124 +++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |   9 +++
 4 files changed, 192 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 33709dae6f..153a0bdf4c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,6 +23,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
+#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -30,6 +31,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
+#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -42,6 +44,7 @@
 /* Firmware flags */
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+#define FW_USE_DDR_POLL_ADDR_FP	      BIT(2)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
@@ -49,6 +52,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
+					 CN10K_ML_FW_POLL_MEM,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -92,7 +96,9 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
+	bool poll_mem_set = false;
 	bool fw_path_set = false;
+	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 
@@ -174,6 +180,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
+					 &poll_mem);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
+			ret = -EINVAL;
+			goto exit;
+		}
+		poll_mem_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -243,6 +260,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
+	if (!poll_mem_set) {
+		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
+	} else {
+		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->fw.poll_mem = poll_mem;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -376,6 +405,11 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
+	if (strcmp(fw->poll_mem, "ddr") == 0)
+		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
+	else if (strcmp(fw->poll_mem, "register") == 0)
+		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+
 	return flags;
 }
 
@@ -780,9 +814,10 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
-			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4b65efecc5..092a023144 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,18 @@
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
 
+/* Memory barrier macros */
+#if defined(RTE_ARCH_ARM)
+#define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
+#define dsb_st ({ asm volatile("dsb st" : : : "memory"); })
+#else
+#define dmb_st
+#define dsb_st
+#endif
+
+struct cn10k_ml_req;
+struct cn10k_ml_qp;
+
 /* ML Job types */
 enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
@@ -358,6 +370,9 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
+	/* Memory to be used for polling in fast-path requests */
+	const char *poll_mem;
+
 	/* Data buffer */
 	uint8_t *data;
 
@@ -393,6 +408,15 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+	/* Poll handling function pointers */
+	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
+	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+
+	/* Memory barrier function pointers to handle synchronization */
+	void (*set_enq_barrier)(void);
+	void (*set_deq_barrier)(void);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f787455a7f..b73ce8c97a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,11 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Scratch register range for poll mode requests */
+#define ML_POLL_REGISTER_SYNC  1023
+#define ML_POLL_REGISTER_START 1024
+#define ML_POLL_REGISTER_END   2047
+
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -76,6 +81,80 @@ print_line(FILE *fp, int len)
 	fprintf(fp, "\n");
 }
 
+static inline void
+cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	PLT_SET_USED(qp);
+	PLT_SET_USED(idx);
+
+	req->compl_W1 = PLT_U64_CAST(&req->status);
+}
+
+static inline void
+cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	return plt_read64(req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	return roc_ml_reg_read64(roc_ml, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
+{
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		req->compl_W1 = PLT_U64_CAST(&req->status);
+	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
+}
+
+static inline void
+cn10k_ml_enq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_deq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_enq_barrier_register(void)
+{
+	dmb_st;
+}
+
+static inline void
+cn10k_ml_deq_barrier_register(void)
+{
+	dsb_st;
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -163,6 +242,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
+	qp->block_size =
+		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
+	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -341,7 +423,7 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	mldev = dev->data->dev_private;
 
 	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
@@ -549,7 +631,11 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
+	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
 
@@ -717,6 +803,26 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
+	/* Set polling function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
+	}
+
+	/* Set barrier function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
+	}
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -2003,13 +2109,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
+	mldev->set_poll_addr(qp, req, head);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
+	mldev->set_enq_barrier();
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2035,6 +2143,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		       uint16_t nb_ops)
 {
 	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2042,6 +2151,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
+	mldev = dev->data->dev_private;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2054,7 +2164,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = plt_read64(&req->status);
+	status = mldev->get_poll_ptr(&mldev->roc, req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2062,6 +2172,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
+	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2119,13 +2230,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
+	cn10k_ml_set_sync_addr(mldev, req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2145,7 +2257,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4c38f1938a..f09c67f186 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -26,6 +26,9 @@ struct cn10k_ml_req {
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
 
+	/* Job completion W1 */
+	uint64_t compl_W1;
+
 	/* Request timeout cycle */
 	uint64_t timeout;
 
@@ -61,6 +64,12 @@ struct cn10k_ml_qp {
 
 	/* Queue pair statistics */
 	struct rte_ml_dev_stats stats;
+
+	/* Register block start for polling */
+	uint32_t block_start;
+
+	/* Register block size for polling */
+	uint32_t block_size;
 };
 
 /* CN10K device ops */
-- 
2.17.1



* [PATCH v2 37/37] ml/cnxk: add user guide for marvell cnxk ml driver
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (35 preceding siblings ...)
  2022-12-08 20:18   ` [PATCH v2 36/37] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
@ 2022-12-08 20:18   ` Srikanth Yalavarthi
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
  37 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-08 20:18 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added user guide for the Marvell cnxk ML driver for the Marvell Octeon
cnxk SoC family. Added details about device initialization,
debug options and runtime device args supported by the driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
v2:
* Fixed typos

 MAINTAINERS                 |   1 +
 doc/guides/index.rst        |   1 +
 doc/guides/mldevs/cnxk.rst  | 238 ++++++++++++++++++++++++++++++++++++
 doc/guides/mldevs/index.rst |  14 +++
 4 files changed, 254 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ba4c97e802..537acb8c84 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1443,6 +1443,7 @@ M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
 F: drivers/ml/cnxk/
+F: doc/guides/mldevs/cnxk.rst


 Packet processing
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 5eb5bd9c9a..0bd729530a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -26,6 +26,7 @@ DPDK documentation
    eventdevs/index
    rawdevs/index
    mempool/index
+   mldevs/index
    platform/index
    contributing/index
    rel_notes/index
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
new file mode 100644
index 0000000000..da40336299
--- /dev/null
+++ b/doc/guides/mldevs/cnxk.rst
@@ -0,0 +1,238 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Marvell cnxk Machine Learning Poll Mode Driver
+==============================================
+
+The cnxk ML poll mode driver provides support for offloading Machine
+Learning inference operations to Machine Learning accelerator units
+on the **Marvell OCTEON cnxk** SoC family.
+
+The cnxk ML PMD code is organized into multiple files with all file names
+starting with cn10k, providing support for CN106XX and CN106XXS.
+
+More information about OCTEON cnxk SoCs may be obtained from `<https://www.marvell.com>`_
+
+Supported OCTEON cnxk SoCs
+--------------------------
+
+- CN106XX
+- CN106XXS
+
+Features
+--------
+
+The OCTEON cnxk ML PMD provides support for the following set of operations:
+
+Slow-path device and ML model handling:
+
+* ``Device probing, configuration and close``
+* ``Device start / stop``
+* ``Model loading and unloading``
+* ``Model start / stop``
+* ``Data quantization and dequantization``
+
+Fast-path Inference:
+
+* ``Inference execution``
+* ``Error handling``
+
+
+Installation
+------------
+
+The OCTEON cnxk ML PMD may be compiled natively on an OCTEON cnxk platform
+or cross-compiled on an x86 platform.
+
+Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
+application.
+
+
+Initialization
+--------------
+
+``CN10K Initialization``
+
+List the ML PF devices available on cn10k platform:
+
+.. code-block:: console
+
+    lspci -d:a092
+
+``a092`` is the ML device PF id. You should see output similar to:
+
+.. code-block:: console
+
+    0000:00:10.0 System peripheral: Cavium, Inc. Device a092
+
+Bind the ML PF device to the vfio-pci driver:
+
+.. code-block:: console
+
+    cd <dpdk directory>
+    ./usertools/dpdk-devbind.py -u 0000:00:10.0
+    ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
+
+Runtime Config Options
+----------------------
+
+- ``Firmware file path`` (default ``/lib/firmware/mlip-fw.bin``)
+
+   Path to the firmware binary to be loaded during device configuration.
+   The ``fw_path`` ``devargs`` parameter can be used by the user to load
+   ML firmware from a custom path.
+
+   For example::
+
+      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
+
+   With the above configuration, the driver loads the firmware from the path
+   "/home/user/ml_fw.bin".
+
+- ``Enable DPE warnings`` (default ``1``)
+
+   ML firmware can be configured during load to handle the DPE errors reported
+   by the ML inference engine. When enabled, firmware masks the DPE non-fatal
+   hardware errors as warnings. The ``enable_dpe_warnings`` ``devargs`` parameter
+   is used for this configuration.
+
+   For example::
+
+      -a 0000:00:10.0,enable_dpe_warnings=0
+
+   With the above configuration, DPE non-fatal errors reported by HW are
+   considered as errors.
+
+
+- ``Model data caching`` (default ``1``)
+
+   Enable caching model data on ML ACC cores. Enabling this option executes a
+   dummy inference request in synchronous mode during model start stage. Caching
+   of model data improves the inferencing throughput / latency for the model.
+   The parameter ``cache_model_data`` ``devargs`` is used to enable data caching.
+
+   For example::
+
+      -a 0000:00:10.0,cache_model_data=0
+
+   With the above configuration, model data caching is disabled.
+
+
+- ``OCM allocation mode`` (default ``lowest``)
+
+   Option to specify the method to be used while allocating OCM memory for a
+   model during model start. Two modes are supported by the driver. The
+   parameter ``ocm_alloc_mode`` ``devargs`` is used to select the OCM
+   allocation mode.
+
+   ``lowest`` - Allocate OCM for the model from first available free slot. Search
+   for the free slot is done starting from the lowest tile ID and lowest page ID.
+   ``largest`` - Allocate OCM for the model from the slot with largest amount of
+   free space.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_alloc_mode=lowest
+
+   With the above configuration, OCM allocation for the model would be done from
+   the first available free slot / from the lowest possible tile ID.
+
+
+- ``Enable hardware queue lock`` (default ``0``)
+
+   Option to select the job request enqueue function used to queue the requests
+   to the hardware queue. The parameter ``hw_queue_lock`` ``devargs`` is used to select
+   the enqueue function.
+
+   ``0`` - Disable (default), use lock free version of hardware enqueue function
+   for job queuing in enqueue burst operation. To avoid race condition in request
+   queuing to hardware, disabling hw_queue_lock restricts the number of queue-pairs
+   supported by cnxk driver to 1.
+   ``1`` - Enable, use spin-lock version of hardware enqueue function for job queuing.
+   Enabling spinlock version would disable restrictions on the number of queue-pairs
+   that can be supported by the driver.
+
+   For example::
+
+      -a 0000:00:10.0,hw_queue_lock=1
+
+   With the above configuration, spinlock version of hardware enqueue function is used
+   in the fast path enqueue burst operation.
+
+
+- ``Polling memory location`` (default ``ddr``)
+
+   The ML cnxk driver provides the option to select the memory location used
+   for polling to check inference request completion. The driver supports using
+   either the DDR address space (``ddr``) or ML registers (``register``) as
+   polling locations. The parameter ``poll_mem`` ``devargs`` is used to specify
+   the poll location.
+
+   For example::
+
+      -a 0000:00:10.0,poll_mem="register"
+
+   With the above configuration, ML cnxk driver is configured to use ML registers
+   for polling in fastpath requests.
+
+
+Debugging Options
+-----------------
+
+.. _table_octeon_cnxk_ml_debug_options:
+
+.. table:: OCTEON cnxk ML PMD debug options
+
+    +---+------------+-------------------------------------------------------+
+    | # | Component  | EAL log command                                       |
+    +===+============+=======================================================+
+    | 1 | ML         | --log-level='pmd\.ml\.cnxk,8'                         |
+    +---+------------+-------------------------------------------------------+
+
+
+Extended stats
+--------------
+
+Marvell cnxk ML PMD supports reporting the inference latencies through extended
+stats. The PMD supports the below list of 6 extended stats types for each model.
+The total number of extended stats is equal to 6 x number of models loaded.
+
+.. _table_octeon_cnxk_ml_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD xstats names
+
+    +---+---------------------+----------------------------------------------+
+    | # | Type                | Description                                  |
+    +===+=====================+==============================================+
+    | 1 | Avg-HW-Latency      | Average hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 2 | Min-HW-Latency      | Minimum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 3 | Max-HW-Latency      | Maximum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 4 | Avg-FW-Latency      | Average firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 5 | Min-FW-Latency      | Minimum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 6 | Max-FW-Latency      | Maximum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+
+Latency values reported by the PMD through xstats are expressed either in
+cycles or in nanoseconds. The unit of the latency is determined during DPDK
+initialization and depends on the availability of SCLK. Latencies are
+reported in nanoseconds when the SCLK is available and in cycles otherwise.
+The application needs to initialize at least one RVU for the clock to be available.
+
+xstats names are dynamically generated by the PMD and would have the format
+"Model-<model_id>-Type-<units>".
+
+For example::
+   Model-1-Avg-FW-Latency-ns
+
+The above xstat name would report average firmware latency in nanoseconds for
+the model with model ID 1.
+
+The number of xstats made available by the PMD changes dynamically. The number
+increases when a model is loaded and decreases when a model is unloaded.
+The application needs to update the xstats map after a model is either loaded
+or unloaded.
diff --git a/doc/guides/mldevs/index.rst b/doc/guides/mldevs/index.rst
new file mode 100644
index 0000000000..f201e54175
--- /dev/null
+++ b/doc/guides/mldevs/index.rst
@@ -0,0 +1,14 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Machine Learning Device Driver
+==============================
+
+The following is a list of ML device PMDs, which can be used from an
+application through the ML device API.
+
+.. toctree::
+    :maxdepth: 2
+    :numbered:
+
+    cnxk
--
2.17.1
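
Note on the xstats naming described in the guide above: names follow the
pattern "Model-<model_id>-Type-<units>", for example
Model-1-Avg-FW-Latency-ns. A minimal sketch of how an application might
build such a name to look up a value is shown below. The helper name
xstat_name_format is hypothetical. The "ns" suffix comes from the guide's
own example; the exact suffix the PMD uses for cycles is not shown in this
patch, so any other suffix passed in would be an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats "Model-<model_id>-<type>-<units>" into buf.
 * Returns 0 on success and -1 if the name was truncated or formatting failed.
 */
static int
xstat_name_format(char *buf, size_t size, unsigned int model_id, const char *type,
		  const char *units)
{
	int n;

	assert(buf != NULL && type != NULL && units != NULL);
	n = snprintf(buf, size, "Model-%u-%s-%s", model_id, type, units);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Since the set of xstats grows and shrinks as models are loaded and unloaded,
an application using such names would need to rebuild its name-to-id map
after every load or unload, as the guide points out.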



* [PATCH v3 00/38] Implementation of ML CNXK driver
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (36 preceding siblings ...)
  2022-12-08 20:18   ` [PATCH v2 37/37] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
@ 2022-12-20 19:26   ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 01/38] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
                       ` (38 more replies)
  37 siblings, 39 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  Cc: dev, sshankarnara, jerinj, aprabhu, Srikanth Yalavarthi

Marvell ML CNXK Driver
----------------------

This patch series implements common Machine Learning (ML) ROC code
and driver for Marvell Octeon 10 (cnxk) platform. ML inferencing is
supported on cnxk platform through an integrated ML inferencing
processor. The current driver supports programming the ML hardware
engine through offload mode.

All APIs proposed in the DPDK ML device specification are supported on
the cnxk platform.

v3:
* Skip installation of internal headers
* Update internal comments and code cleanup

v2:
* Typo and formatting fixes

Srikanth Yalavarthi (38):
  common/cnxk: add ML headers and ROC code for cnxk
  ml/cnxk: add skeleton for ML cnxk driver
  ml/cnxk: enable probe and remove of ML device
  ml/cnxk: add driver support to get device info
  ml/cnxk: add support for configure and close
  ml/cnxk: parse ML firmware path from device args
  ml/cnxk: enable firmware load and device reset
  ml/cnxk: enable support for simulator environment
  ml/cnxk: enable support for device start and stop
  ml/cnxk: add support to create device queue-pairs
  ml/cnxk: add functions to load and unload models
  ml/cnxk: enable validity checks for model metadata
  ml/cnxk: add internal structures for derived info
  ml/cnxk: add internal structures for tiles and OCM
  ml/cnxk: add structures for slow and fast path JDs
  ml/cnxk: find OCM mask and page slots for a model
  ml/cnxk: add support to reserve and free OCM pages
  ml/cnxk: enable support to start an ML model
  ml/cnxk: enable support to stop an ML models
  ml/cnxk: enable support to get model information
  ml/cnxk: enable support to update model params
  ml/cnxk: add support to get IO buffer sizes
  ml/cnxk: enable quantization and dequantization
  ml/cnxk: enable support to dump device debug info
  ml/cnxk: add driver support for device selftest
  ml/cnxk: enqueue a burst of inference requests
  ml/cnxk: dequeue a burst of inference requests
  ml/cnxk: add internal function for sync mode run
  ml/cnxk: enable support for firmware error codes
  ml/cnxk: add support to get and reset device stats
  ml/cnxk: add support to handle extended dev stats
  ml/cnxk: enable support to get xstats in cycles
  ml/cnxk: add support to report DPE FW warnings
  ml/cnxk: add support to enable model data caching
  ml/cnxk: add support to select OCM allocation mode
  ml/cnxk: add support to use lock during jcmd enq
  ml/cnxk: add support to select poll memory region
  ml/cnxk: add user guide for marvell cnxk ml driver

 MAINTAINERS                         |    7 +
 doc/guides/index.rst                |    1 +
 doc/guides/mldevs/cnxk.rst          |  238 +++
 doc/guides/mldevs/index.rst         |   14 +
 drivers/common/cnxk/hw/ml.h         |  170 ++
 drivers/common/cnxk/meson.build     |    1 +
 drivers/common/cnxk/roc_api.h       |    4 +
 drivers/common/cnxk/roc_constants.h |    2 +
 drivers/common/cnxk/roc_dev_priv.h  |    1 +
 drivers/common/cnxk/roc_ml.c        |  626 ++++++++
 drivers/common/cnxk/roc_ml.h        |  152 ++
 drivers/common/cnxk/roc_ml_priv.h   |   24 +
 drivers/common/cnxk/roc_platform.c  |    1 +
 drivers/common/cnxk/roc_platform.h  |    2 +
 drivers/common/cnxk/roc_priv.h      |    3 +
 drivers/common/cnxk/version.map     |   29 +
 drivers/meson.build                 |    1 +
 drivers/ml/cnxk/cn10k_ml_dev.c      |  823 ++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h      |  426 +++++
 drivers/ml/cnxk/cn10k_ml_model.c    |  397 +++++
 drivers/ml/cnxk/cn10k_ml_model.h    |  511 ++++++
 drivers/ml/cnxk/cn10k_ml_ocm.c      |  509 ++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h      |   91 ++
 drivers/ml/cnxk/cn10k_ml_ops.c      | 2306 +++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h      |   94 ++
 drivers/ml/cnxk/meson.build         |   32 +
 drivers/ml/meson.build              |    8 +
 27 files changed, 6473 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst
 create mode 100644 drivers/common/cnxk/hw/ml.h
 create mode 100644 drivers/common/cnxk/roc_ml.c
 create mode 100644 drivers/common/cnxk/roc_ml.h
 create mode 100644 drivers/common/cnxk/roc_ml_priv.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

--
2.17.1



* [PATCH v3 01/38] common/cnxk: add ML headers and ROC code for cnxk
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 02/38] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
                       ` (37 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao
  Cc: dev, sshankarnara, jerinj, aprabhu

Added ML cnxk headers for register, structure definitions and
ROC layer. Implemented ROC functions, registered logtype for
ML module with the name pmd.ml.cnxk and defined ML hardware ID.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: series-26198 ("implementation of ML common code")

 MAINTAINERS                         |   4 +
 drivers/common/cnxk/hw/ml.h         | 170 ++++++++
 drivers/common/cnxk/meson.build     |   1 +
 drivers/common/cnxk/roc_api.h       |   4 +
 drivers/common/cnxk/roc_constants.h |   2 +
 drivers/common/cnxk/roc_dev_priv.h  |   1 +
 drivers/common/cnxk/roc_ml.c        | 626 ++++++++++++++++++++++++++++
 drivers/common/cnxk/roc_ml.h        | 152 +++++++
 drivers/common/cnxk/roc_ml_priv.h   |  24 ++
 drivers/common/cnxk/roc_platform.c  |   1 +
 drivers/common/cnxk/roc_platform.h  |   2 +
 drivers/common/cnxk/roc_priv.h      |   3 +
 drivers/common/cnxk/version.map     |  29 ++
 13 files changed, 1019 insertions(+)
 create mode 100644 drivers/common/cnxk/hw/ml.h
 create mode 100644 drivers/common/cnxk/roc_ml.c
 create mode 100644 drivers/common/cnxk/roc_ml.h
 create mode 100644 drivers/common/cnxk/roc_ml_priv.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6412209bff..8cdb3e215d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1438,6 +1438,10 @@ ML common code
 M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/ml/

+Marvell ML CNXK
+M: Srikanth Yalavarthi <syalavarthi@marvell.com>
+F: drivers/common/cnxk/hw/ml.h
+F: drivers/common/cnxk/roc_ml*

 Packet processing
 -----------------
diff --git a/drivers/common/cnxk/hw/ml.h b/drivers/common/cnxk/hw/ml.h
new file mode 100644
index 0000000000..3ead42b807
--- /dev/null
+++ b/drivers/common/cnxk/hw/ml.h
@@ -0,0 +1,170 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef __ML_HW_H__
+#define __ML_HW_H__
+
+#include <stdint.h>
+
+/* Constants */
+#define ML_ANBX_NR 0x3
+
+/* Base offsets */
+#define ML_MLAB_BLK_OFFSET 0x20000000 /* CNF10KB */
+#define ML_AXI_START_ADDR  0x800000000
+
+/* MLW register offsets / ML_PF_BAR0 */
+#define ML_CFG			 0x10000
+#define ML_MLR_BASE		 0x10008
+#define ML_AXI_BRIDGE_CTRL(a)	 (0x10020 | (uint64_t)(a) << 3)
+#define ML_JOB_MGR_CTRL		 0x10060
+#define ML_CORE_INT_LO		 0x10140
+#define ML_CORE_INT_HI		 0x10160
+#define ML_JCMDQ_IN(a)		 (0x11000 | (uint64_t)(a) << 3) /* CN10KA */
+#define ML_JCMDQ_STATUS		 0x11010			/* CN10KA */
+#define ML_STGX_STATUS(a)	 (0x11020 | (uint64_t)(a) << 3) /* CNF10KB */
+#define ML_STG_CONTROL		 0x11100			/* CNF10KB */
+#define ML_PNB_CMD_TYPE		 0x113a0			/* CNF10KB */
+#define ML_SCRATCH(a)		 (0x14000 | (uint64_t)(a) << 3)
+#define ML_ANBX_BACKP_DISABLE(a) (0x18000 | (uint64_t)(a) << 12) /* CN10KA */
+#define ML_ANBX_NCBI_P_OVR(a)	 (0x18010 | (uint64_t)(a) << 12) /* CN10KA */
+#define ML_ANBX_NCBI_NP_OVR(a)	 (0x18020 | (uint64_t)(a) << 12) /* CN10KA */
+
+/* MLIP configuration register offsets / ML_PF_BAR0 */
+#define ML_SW_RST_CTRL		      0x12084000
+#define ML_A35_0_RST_VECTOR_BASE_W(a) (0x12084014 + (a) * (0x04))
+#define ML_A35_1_RST_VECTOR_BASE_W(a) (0x1208401c + (a) * (0x04))
+
+/* MLW scratch register offsets */
+#define ML_SCRATCH_WORK_PTR	      (ML_SCRATCH(0))
+#define ML_SCRATCH_FW_CTRL	      (ML_SCRATCH(1))
+#define ML_SCRATCH_DBG_BUFFER_HEAD_C0 (ML_SCRATCH(2))
+#define ML_SCRATCH_DBG_BUFFER_TAIL_C0 (ML_SCRATCH(3))
+#define ML_SCRATCH_DBG_BUFFER_HEAD_C1 (ML_SCRATCH(4))
+#define ML_SCRATCH_DBG_BUFFER_TAIL_C1 (ML_SCRATCH(5))
+#define ML_SCRATCH_EXCEPTION_SP_C0    (ML_SCRATCH(6))
+#define ML_SCRATCH_EXCEPTION_SP_C1    (ML_SCRATCH(7))
+
+/* ML job completion structure */
+struct ml_jce_s {
+	/* WORD 0 */
+	union ml_jce_w0 {
+		struct {
+			uint64_t rsvd_0_3 : 4;
+
+			/* Reserved for future architecture */
+			uint64_t ggrp_h : 2;
+
+			/* Tag type */
+			uint64_t ttype : 2;
+
+			/* Physical function number */
+			uint64_t pf_func : 16;
+
+			/* Unused [7] + Guest Group [6:0] */
+			uint64_t ggrp : 8;
+
+			/* Tag */
+			uint64_t tag : 32;
+		} s;
+		uint64_t u64;
+	} w0;
+
+	/* WORD 1 */
+	union ml_jce_w1 {
+		struct {
+			/* Work queue pointer */
+			uint64_t wqp : 53;
+			uint64_t rsvd_53_63 : 11;
+
+		} s;
+		uint64_t u64;
+	} w1;
+};
+
+/* ML job command structure */
+struct ml_job_cmd_s {
+	/* WORD 0 */
+	union ml_job_cmd_w0 {
+		struct {
+			uint64_t rsvd_0_63;
+		} s;
+		uint64_t u64;
+	} w0;
+
+	/* WORD 1 */
+	union ml_job_cmd_w1 {
+		struct {
+			/* Job pointer */
+			uint64_t jobptr : 53;
+			uint64_t rsvd_53_63 : 11;
+		} s;
+		uint64_t u64;
+	} w1;
+};
+
+/* ML A35 0 RST vector base structure */
+union ml_a35_0_rst_vector_base_s {
+	struct {
+		/* Base address */
+		uint64_t addr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+
+	struct {
+		/* WORD 0 */
+		uint32_t w0;
+
+		/* WORD 1 */
+		uint32_t w1;
+	} w;
+
+	uint64_t u64;
+};
+
+/* ML A35 1 RST vector base structure */
+union ml_a35_1_rst_vector_base_s {
+	struct {
+		/* Base address */
+		uint64_t addr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+
+	struct {
+		/* WORD 0 */
+		uint32_t w0;
+
+		/* WORD 1 */
+		uint32_t w1;
+	} w;
+
+	uint64_t u64;
+};
+
+/* Work pointer scratch register */
+union ml_scratch_work_ptr_s {
+	struct {
+		/* Work pointer */
+		uint64_t work_ptr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+	uint64_t u64;
+};
+
+/* Firmware control scratch register */
+union ml_scratch_fw_ctrl_s {
+	struct {
+		uint64_t rsvd_0_15 : 16;
+
+		/* Valid job bit */
+		uint64_t valid : 1;
+
+		/* Done status bit */
+		uint64_t done : 1;
+		uint64_t rsvd_18_63 : 46;
+	} s;
+	uint64_t u64;
+};
+
+#endif /* __ML_HW_H__ */
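
Note on the indexed offset macros in hw/ml.h: ML_SCRATCH(a) ORs the base
with a << 3 (8-byte stride), while the ML_ANBX_*(a) macros use a << 12
(4 KiB stride). The following is a minimal standalone sketch of that
arithmetic, not part of the patch. The three macros are copied from the
header; the two helper functions are hypothetical, and the bounds check
assumes ML_ANBX_NR is the number of ANB instances, which the header does
not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from hw/ml.h in this patch. */
#define ML_ANBX_NR		 0x3
#define ML_SCRATCH(a)		 (0x14000 | (uint64_t)(a) << 3)
#define ML_ANBX_BACKP_DISABLE(a) (0x18000 | (uint64_t)(a) << 12)

/* Hypothetical helper (not in the patch): BAR0 offset of scratch register idx. */
static inline uint64_t
scratch_offset(unsigned int idx)
{
	return ML_SCRATCH(idx);
}

/* Hypothetical helper (not in the patch): BAR0 offset of the
 * BACKP_DISABLE register for ANB instance idx.
 * Assumption: ML_ANBX_NR is the instance count.
 */
static inline uint64_t
anbx_backp_disable_offset(unsigned int idx)
{
	assert(idx < ML_ANBX_NR);
	return ML_ANBX_BACKP_DISABLE(idx);
}
```

With these definitions ML_SCRATCH_FW_CTRL, which is ML_SCRATCH(1), resolves
to 0x14008, and ML_SCRATCH_EXCEPTION_SP_C1, which is ML_SCRATCH(7), to
0x14038.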
diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 849735921c..b4aa0a050c 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -26,6 +26,7 @@ sources = files(
         'roc_irq.c',
         'roc_ie_ot.c',
         'roc_mbox.c',
+        'roc_ml.c',
         'roc_model.c',
         'roc_nix.c',
         'roc_nix_bpf.c',
diff --git a/drivers/common/cnxk/roc_api.h b/drivers/common/cnxk/roc_api.h
index 072f16d77d..fdddf8c6c7 100644
--- a/drivers/common/cnxk/roc_api.h
+++ b/drivers/common/cnxk/roc_api.h
@@ -34,6 +34,7 @@
 /* HW structure definition */
 #include "hw/cpt.h"
 #include "hw/dpi.h"
+#include "hw/ml.h"
 #include "hw/nix.h"
 #include "hw/npa.h"
 #include "hw/npc.h"
@@ -106,4 +107,7 @@
 /* NIX Inline dev */
 #include "roc_nix_inl.h"

+/* ML */
+#include "roc_ml.h"
+
 #endif /* _ROC_API_H_ */
diff --git a/drivers/common/cnxk/roc_constants.h b/drivers/common/cnxk/roc_constants.h
index 0495965daa..ddaef133b8 100644
--- a/drivers/common/cnxk/roc_constants.h
+++ b/drivers/common/cnxk/roc_constants.h
@@ -50,6 +50,8 @@
 #define PCI_DEVID_CN10K_RVU_CPT_PF 0xA0F2
 #define PCI_DEVID_CN10K_RVU_CPT_VF 0xA0F3

+#define PCI_DEVID_CN10K_ML_PF 0xA092
+
 #define PCI_SUBSYSTEM_DEVID_CN10KA  0xB900
 #define PCI_SUBSYSTEM_DEVID_CN10KAS 0xB900
 #define PCI_SUBSYSTEM_DEVID_CNF10KA 0xBA00
diff --git a/drivers/common/cnxk/roc_dev_priv.h b/drivers/common/cnxk/roc_dev_priv.h
index 302dc0feb0..55700dc851 100644
--- a/drivers/common/cnxk/roc_dev_priv.h
+++ b/drivers/common/cnxk/roc_dev_priv.h
@@ -89,6 +89,7 @@ struct dev {
 	struct dev_ops *ops;
 	void *roc_nix;
 	void *roc_cpt;
+	void *roc_ml;
 	bool disable_shared_lmt; /* false(default): shared lmt mode enabled */
 	const struct plt_memzone *lmt_mz;
 } __plt_cache_aligned;
diff --git a/drivers/common/cnxk/roc_ml.c b/drivers/common/cnxk/roc_ml.c
new file mode 100644
index 0000000000..7390697b1d
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml.c
@@ -0,0 +1,626 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "roc_api.h"
+#include "roc_priv.h"
+
+#define TIME_SEC_IN_MS 1000
+
+static int
+roc_ml_reg_wait_to_clear(struct roc_ml *roc_ml, uint64_t offset, uint64_t mask)
+{
+	uint64_t start_cycle;
+	uint64_t wait_cycles;
+	uint64_t reg_val;
+
+	wait_cycles = (ROC_ML_TIMEOUT_MS * plt_tsc_hz()) / TIME_SEC_IN_MS;
+	start_cycle = plt_tsc_cycles();
+	do {
+		reg_val = roc_ml_reg_read64(roc_ml, offset);
+
+		if (!(reg_val & mask))
+			return 0;
+	} while (plt_tsc_cycles() - start_cycle < wait_cycles);
+
+	return -ETIME;
+}
+
+uint64_t
+roc_ml_reg_read64(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	return plt_read64(PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_write64(struct roc_ml *roc_ml, uint64_t val, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	plt_write64(val, PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+uint32_t
+roc_ml_reg_read32(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	return plt_read32(PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_write32(struct roc_ml *roc_ml, uint32_t val, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	plt_write32(val, PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_save(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (offset == ML_MLR_BASE) {
+		ml->ml_mlr_base =
+			FIELD_GET(ROC_ML_MLR_BASE_BASE, roc_ml_reg_read64(roc_ml, offset));
+		ml->ml_mlr_base_saved = true;
+	}
+}
+
+void *
+roc_ml_addr_ap2mlip(struct roc_ml *roc_ml, void *addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+	uint64_t ml_mlr_base;
+
+	ml_mlr_base = (ml->ml_mlr_base_saved) ? ml->ml_mlr_base :
+						FIELD_GET(ROC_ML_MLR_BASE_BASE,
+							  roc_ml_reg_read64(roc_ml, ML_MLR_BASE));
+	return PLT_PTR_ADD(addr, ML_AXI_START_ADDR - ml_mlr_base);
+}
+
+void *
+roc_ml_addr_mlip2ap(struct roc_ml *roc_ml, void *addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+	uint64_t ml_mlr_base;
+
+	ml_mlr_base = (ml->ml_mlr_base_saved) ? ml->ml_mlr_base :
+						FIELD_GET(ROC_ML_MLR_BASE_BASE,
+							  roc_ml_reg_read64(roc_ml, ML_MLR_BASE));
+	return PLT_PTR_ADD(addr, ml_mlr_base - ML_AXI_START_ADDR);
+}
+
+uint64_t
+roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (roc_model_is_cn10ka())
+		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr;
+	else
+		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr - ML_MLAB_BLK_OFFSET;
+}
+
+uint64_t
+roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (roc_model_is_cn10ka())
+		return ml->pci_dev->mem_resource[0].phys_addr + offset;
+	else
+		return ml->pci_dev->mem_resource[0].phys_addr + ML_MLAB_BLK_OFFSET + offset;
+}
+
+void
+roc_ml_scratch_write_job(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_work_ptr.u64 = 0;
+	reg_work_ptr.s.work_ptr = PLT_U64_CAST(roc_ml_addr_ap2mlip(roc_ml, work_ptr));
+
+	reg_fw_ctrl.u64 = 0;
+	reg_fw_ctrl.s.valid = 1;
+
+	roc_ml_reg_write64(roc_ml, reg_work_ptr.u64, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(roc_ml, reg_fw_ctrl.u64, ML_SCRATCH_FW_CTRL);
+}
+
+bool
+roc_ml_scratch_is_valid_bit_set(struct roc_ml *roc_ml)
+{
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_fw_ctrl.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_FW_CTRL);
+
+	if (reg_fw_ctrl.s.valid == 1)
+		return true;
+
+	return false;
+}
+
+bool
+roc_ml_scratch_is_done_bit_set(struct roc_ml *roc_ml)
+{
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_fw_ctrl.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_FW_CTRL);
+
+	if (reg_fw_ctrl.s.done == 1)
+		return true;
+
+	return false;
+}
+
+bool
+roc_ml_scratch_enqueue(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+	bool ret = false;
+
+	reg_work_ptr.u64 = 0;
+	reg_work_ptr.s.work_ptr = PLT_U64_CAST(roc_ml_addr_ap2mlip(roc_ml, work_ptr));
+
+	reg_fw_ctrl.u64 = 0;
+	reg_fw_ctrl.s.valid = 1;
+
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		bool valid = roc_ml_scratch_is_valid_bit_set(roc_ml);
+		bool done = roc_ml_scratch_is_done_bit_set(roc_ml);
+
+		if (valid == done) {
+			roc_ml_clk_force_on(roc_ml);
+			roc_ml_dma_stall_off(roc_ml);
+
+			roc_ml_reg_write64(roc_ml, reg_work_ptr.u64, ML_SCRATCH_WORK_PTR);
+			roc_ml_reg_write64(roc_ml, reg_fw_ctrl.u64, ML_SCRATCH_FW_CTRL);
+
+			ret = true;
+		}
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+
+	return ret;
+}
+
+bool
+roc_ml_scratch_dequeue(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	bool ret = false;
+
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		bool valid = roc_ml_scratch_is_valid_bit_set(roc_ml);
+		bool done = roc_ml_scratch_is_done_bit_set(roc_ml);
+
+		if (valid && done) {
+			reg_work_ptr.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_WORK_PTR);
+			if (work_ptr ==
+			    roc_ml_addr_mlip2ap(roc_ml, PLT_PTR_CAST(reg_work_ptr.u64))) {
+				roc_ml_dma_stall_on(roc_ml);
+				roc_ml_clk_force_off(roc_ml);
+
+				roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+				roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_FW_CTRL);
+				ret = true;
+			}
+		}
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+
+	return ret;
+}
+
+void
+roc_ml_scratch_queue_reset(struct roc_ml *roc_ml)
+{
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		roc_ml_dma_stall_on(roc_ml);
+		roc_ml_clk_force_off(roc_ml);
+		roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_FW_CTRL);
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+}
+
+bool
+roc_ml_jcmdq_enqueue_lf(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd)
+{
+	bool ret = false;
+
+	if (FIELD_GET(ROC_ML_JCMDQ_STATUS_AVAIL_COUNT,
+		      roc_ml_reg_read64(roc_ml, ML_JCMDQ_STATUS)) != 0) {
+		roc_ml_reg_write64(roc_ml, job_cmd->w0.u64, ML_JCMDQ_IN(0));
+		roc_ml_reg_write64(roc_ml, job_cmd->w1.u64, ML_JCMDQ_IN(1));
+		ret = true;
+	}
+
+	return ret;
+}
+
+bool
+roc_ml_jcmdq_enqueue_sl(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd)
+{
+	bool ret = false;
+
+	if (plt_spinlock_trylock(&roc_ml->fp_spinlock) != 0) {
+		if (FIELD_GET(ROC_ML_JCMDQ_STATUS_AVAIL_COUNT,
+			      roc_ml_reg_read64(roc_ml, ML_JCMDQ_STATUS)) != 0) {
+			roc_ml_reg_write64(roc_ml, job_cmd->w0.u64, ML_JCMDQ_IN(0));
+			roc_ml_reg_write64(roc_ml, job_cmd->w1.u64, ML_JCMDQ_IN(1));
+			ret = true;
+		}
+		plt_spinlock_unlock(&roc_ml->fp_spinlock);
+	}
+
+	return ret;
+}
+
+void
+roc_ml_clk_force_on(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+	reg_val |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+}
+
+void
+roc_ml_clk_force_off(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+}
+
+void
+roc_ml_dma_stall_on(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_JOB_MGR_CTRL);
+	reg_val |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_JOB_MGR_CTRL);
+}
+
+void
+roc_ml_dma_stall_off(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_JOB_MGR_CTRL);
+	reg_val &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_JOB_MGR_CTRL);
+}
+
+bool
+roc_ml_mlip_is_enabled(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+
+	if ((reg_val & ROC_ML_CFG_MLIP_ENA) != 0)
+		return true;
+
+	return false;
+}
+
+int
+roc_ml_mlip_reset(struct roc_ml *roc_ml, bool force)
+{
+	uint64_t reg_val;
+
+	/* Force reset */
+	if (force) {
+		/* Set ML(0)_CFG[ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* Set ML(0)_CFG[MLIP_ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* Clear ML_MLR_BASE */
+		roc_ml_reg_write64(roc_ml, 0, ML_MLR_BASE);
+	}
+
+	if (roc_model_is_cn10ka()) {
+		/* Wait for all active jobs to finish.
+		 * ML_CFG[ENA] : When set, MLW will accept job commands. This
+		 * bit can be cleared at any time. If [BUSY] is set, software
+		 * must wait until [BUSY] == 0 before setting this bit.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_CFG, ROC_ML_CFG_BUSY);
+
+		/* (1) Set ML(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] = 1 to instruct
+		 * the AXI bridge not to accept any new transactions from MLIP.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		/* (2) Wait until ML(0)_AXI_BRIDGE_CTRL(0..1)[BUSY] = 0 which
+		 * indicates that there are no outstanding transactions on
+		 * AXI-NCB paths.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(0),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(1),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		/* (3) Wait until ML(0)_JOB_MGR_CTRL[BUSY] = 0 which indicates
+		 * that there are no pending jobs in the MLW's job manager.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_JOB_MGR_CTRL, ROC_ML_JOB_MGR_CTRL_BUSY);
+
+		/* (4) Set ML(0)_CFG[ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (5) Set ML(0)_CFG[MLIP_ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (6) Set ML(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+	}
+
+	if (roc_model_is_cnf10kb()) {
+		/* (1) Clear MLAB(0)_CFG[ENA]. Any new jobs will bypass the job
+		 * execution stages and their completions will be returned to
+		 * PSM.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (2) Quiesce the ACC and DMA AXI interfaces: For each of the
+		 * two MLAB(0)_AXI_BRIDGE_CTRL(0..1) registers:
+		 *
+		 * (a) Set MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] to block new AXI
+		 * commands from MLIP.
+		 *
+		 * (b) Poll MLAB(0)_AXI_BRIDGE_CTRL(0..1)[BUSY] == 0.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(0),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(1),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		/* (3) Clear MLAB(0)_CFG[MLIP_ENA] to reset MLIP.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+cnf10kb_mlip_reset_stage_4a:
+		/* (4) Flush any outstanding jobs in MLAB's job execution
+		 * stages:
+		 *
+		 * (4a) Wait for completion stage to clear:
+		 *   - Poll MLAB(0)_STG(0..2)_STATUS[VALID] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(0), ROC_ML_STG_STATUS_VALID);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(1), ROC_ML_STG_STATUS_VALID);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(2), ROC_ML_STG_STATUS_VALID);
+
+cnf10kb_mlip_reset_stage_4b:
+		/* (4b) Clear job run stage: Poll
+		 * MLAB(0)_STG_CONTROL[RUN_TO_COMP] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL, ROC_ML_STG_CONTROL_RUN_TO_COMP);
+
+		/* (4b) Clear job run stage: If MLAB(0)_STG(1)_STATUS[VALID] ==
+		 * 1:
+		 *     - Set MLAB(0)_STG_CONTROL[RUN_TO_COMP].
+		 *     - Poll MLAB(0)_STG_CONTROL[RUN_TO_COMP] == 0.
+		 *     - Repeat step (4a) to clear job completion stage.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(1));
+		if (reg_val & ROC_ML_STG_STATUS_VALID) {
+			reg_val = roc_ml_reg_read64(roc_ml, ML_STG_CONTROL);
+			reg_val |= ROC_ML_STG_CONTROL_RUN_TO_COMP;
+			roc_ml_reg_write64(roc_ml, reg_val, ML_STG_CONTROL);
+
+			roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL,
+						 ROC_ML_STG_CONTROL_RUN_TO_COMP);
+
+			goto cnf10kb_mlip_reset_stage_4a;
+		}
+
+		/* (4c) Clear job fetch stage: Poll
+		 * MLAB(0)_STG_CONTROL[FETCH_TO_RUN] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL, ROC_ML_STG_CONTROL_FETCH_TO_RUN);
+
+		/* (4c) Clear job fetch stage: If
+		 * MLAB(0)_STG(0..2)_STATUS[VALID] == 1:
+		 *     - Set MLAB(0)_STG_CONTROL[FETCH_TO_RUN].
+		 *     - Poll MLAB(0)_STG_CONTROL[FETCH_TO_RUN] == 0.
+		 *     - Repeat step (4b) to clear job run and completion stages.
+		 */
+		reg_val = (roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(0)) |
+			   roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(1)) |
+			   roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(2)));
+
+		if (reg_val & ROC_ML_STG_STATUS_VALID) {
+			reg_val = roc_ml_reg_read64(roc_ml, ML_STG_CONTROL);
+			reg_val |= ROC_ML_STG_CONTROL_FETCH_TO_RUN;
+			roc_ml_reg_write64(roc_ml, reg_val, ML_STG_CONTROL);
+
+			roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL,
+						 ROC_ML_STG_CONTROL_FETCH_TO_RUN);
+
+			goto cnf10kb_mlip_reset_stage_4b;
+		}
+
+		/* (5) Reset the ACC and DMA AXI interfaces: For each of the two
+		 * MLAB(0)_AXI_BRIDGE_CTRL(0..1) registers:
+		 *
+		 * (5a) Set and then clear
+		 * MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FLUSH_WRITE_DATA].
+		 *
+		 * (5b) Clear MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FENCE].
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+	}
+
+	return 0;
+}
+
+int
+roc_ml_dev_init(struct roc_ml *roc_ml)
+{
+	struct plt_pci_device *pci_dev;
+	struct dev *dev;
+	struct ml *ml;
+
+	if (roc_ml == NULL || roc_ml->pci_dev == NULL)
+		return -EINVAL;
+
+	PLT_STATIC_ASSERT(sizeof(struct ml) <= ROC_ML_MEM_SZ);
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+	memset(ml, 0, sizeof(*ml));
+	pci_dev = roc_ml->pci_dev;
+	dev = &ml->dev;
+
+	ml->pci_dev = pci_dev;
+	dev->roc_ml = roc_ml;
+
+	ml->ml_reg_addr = ml->pci_dev->mem_resource[0].addr;
+	ml->ml_mlr_base = 0;
+	ml->ml_mlr_base_saved = false;
+
+	plt_ml_dbg("ML: PCI Physical Address : 0x%016lx", ml->pci_dev->mem_resource[0].phys_addr);
+	plt_ml_dbg("ML: PCI Virtual Address : 0x%016lx",
+		   PLT_U64_CAST(ml->pci_dev->mem_resource[0].addr));
+
+	plt_spinlock_init(&roc_ml->sp_spinlock);
+	plt_spinlock_init(&roc_ml->fp_spinlock);
+
+	return 0;
+}
+
+int
+roc_ml_dev_fini(struct roc_ml *roc_ml)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (ml == NULL)
+		return -EINVAL;
+
+	return 0;
+}
+
+int
+roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
+{
+	struct dev *dev;
+	struct ml *ml;
+
+	if ((roc_ml == NULL) || (roc_bphy == NULL))
+		return -EINVAL;
+
+	PLT_STATIC_ASSERT(sizeof(struct ml) <= ROC_ML_MEM_SZ);
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+	memset(ml, 0, sizeof(*ml));
+
+	dev = &ml->dev;
+
+	ml->pci_dev = roc_bphy->pci_dev;
+	dev->roc_ml = roc_ml;
+
+	plt_ml_dbg(
+		"MLAB: Physical Address : 0x%016lx",
+		PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].phys_addr, ML_MLAB_BLK_OFFSET));
+	plt_ml_dbg("MLAB: Virtual Address : 0x%016lx",
+		   PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET));
+
+	ml->ml_reg_addr = PLT_PTR_ADD(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET);
+	ml->ml_mlr_base = 0;
+	ml->ml_mlr_base_saved = false;
+
+	plt_spinlock_init(&roc_ml->sp_spinlock);
+	plt_spinlock_init(&roc_ml->fp_spinlock);
+
+	return 0;
+}
+
+int
+roc_ml_blk_fini(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
+{
+	struct ml *ml;
+
+	if ((roc_ml == NULL) || (roc_bphy == NULL))
+		return -EINVAL;
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (ml == NULL)
+		return -EINVAL;
+
+	return 0;
+}
+
+uint16_t
+roc_ml_sso_pf_func_get(void)
+{
+	return idev_sso_pffunc_get();
+}
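
Note on the scratch-register handshake used by roc_ml_scratch_enqueue()
and roc_ml_scratch_dequeue() above: a job may be written only when
valid == done, and it may be dequeued only when both bits are set and the
work pointer matches. The sketch below models that state machine against
plain memory so it can be exercised without hardware. It is not part of
the patch. All names are hypothetical, the bit positions (16 and 17)
assume the usual LSB-first bit-field layout of union ml_scratch_fw_ctrl_s,
and the firmware behaviour (ORing in the done bit) is an assumption. The
spinlock, clock-force, DMA-stall and address-translation steps are left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed positions of ml_scratch_fw_ctrl_s.valid and .done. */
#define FW_CTRL_VALID (UINT64_C(1) << 16)
#define FW_CTRL_DONE  (UINT64_C(1) << 17)

/* Hypothetical in-memory stand-in for the two scratch registers. */
struct fake_scratch {
	uint64_t work_ptr; /* models ML_SCRATCH_WORK_PTR */
	uint64_t fw_ctrl;  /* models ML_SCRATCH_FW_CTRL */
};

/* Accept a job only when valid == done, as the patch does. */
static bool
fake_enqueue(struct fake_scratch *s, uint64_t job)
{
	bool valid = (s->fw_ctrl & FW_CTRL_VALID) != 0;
	bool done = (s->fw_ctrl & FW_CTRL_DONE) != 0;

	if (valid != done)
		return false; /* e.g. valid=1, done=0: job still in flight */

	s->work_ptr = job;
	s->fw_ctrl = FW_CTRL_VALID;
	return true;
}

/* Assumption: firmware signals completion by setting the done bit. */
static void
fake_firmware_complete(struct fake_scratch *s)
{
	assert((s->fw_ctrl & FW_CTRL_VALID) != 0);
	s->fw_ctrl |= FW_CTRL_DONE;
}

/* Release the slot only for the job that was actually completed. */
static bool
fake_dequeue(struct fake_scratch *s, uint64_t job)
{
	bool valid = (s->fw_ctrl & FW_CTRL_VALID) != 0;
	bool done = (s->fw_ctrl & FW_CTRL_DONE) != 0;

	if (!(valid && done) || s->work_ptr != job)
		return false;

	s->work_ptr = 0;
	s->fw_ctrl = 0;
	return true;
}
```

One property worth noting: because the enqueue test is valid == done, a
completed job that has not been dequeued yet (valid=1, done=1) also counts
as a free slot, in the sketch as in the patch.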
diff --git a/drivers/common/cnxk/roc_ml.h b/drivers/common/cnxk/roc_ml.h
new file mode 100644
index 0000000000..3cd82be6a6
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _ROC_ML_H_
+#define _ROC_ML_H_
+
+#include "roc_api.h"
+
+#define ROC_ML_MEM_SZ	  (6 * 1024)
+#define ROC_ML_TIMEOUT_MS 10000
+
+/* ML_CFG */
+#define ROC_ML_CFG_JD_SIZE	  GENMASK_ULL(1, 0)
+#define ROC_ML_CFG_MLIP_ENA	  BIT_ULL(2)
+#define ROC_ML_CFG_BUSY		  BIT_ULL(3)
+#define ROC_ML_CFG_WRAP_CLK_FORCE BIT_ULL(4)
+#define ROC_ML_CFG_MLIP_CLK_FORCE BIT_ULL(5)
+#define ROC_ML_CFG_ENA		  BIT_ULL(6)
+
+/* ML_MLR_BASE */
+#define ROC_ML_MLR_BASE_BASE GENMASK_ULL(51, 0)
+
+/* ML_STG_STATUS */
+#define ROC_ML_STG_STATUS_VALID		BIT_ULL(0)
+#define ROC_ML_STG_STATUS_ADDR_ERR	BIT_ULL(1)
+#define ROC_ML_STG_STATUS_DMA_ERR	BIT_ULL(2)
+#define ROC_ML_STG_STATUS_TIMEOUT	BIT_ULL(3)
+#define ROC_ML_STG_STATUS_NFAT_ERR	BIT_ULL(4)
+#define ROC_ML_STG_STATUS_JOB_ERR	BIT_ULL(5)
+#define ROC_ML_STG_STATUS_ELAPSED_TICKS GENMASK_ULL(47, 6)
+
+/* ML_STG_CONTROL */
+#define ROC_ML_STG_CONTROL_FETCH_TO_RUN BIT_ULL(0)
+#define ROC_ML_STG_CONTROL_RUN_TO_COMP	BIT_ULL(1)
+
+/* ML_AXI_BRIDGE */
+#define ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL	      BIT_ULL(0)
+#define ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE	      BIT_ULL(1)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_AXI_ID	      GENMASK_ULL(11, 2)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_WR_BLK	      BIT_ULL(13)
+#define ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK	      BIT_ULL(14)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_RD_BLK	      BIT_ULL(15)
+#define ROC_ML_AXI_BRIDGE_CTRL_NCB_RD_BLK	      BIT_ULL(16)
+#define ROC_ML_AXI_BRIDGE_CTRL_FENCE		      BIT_ULL(17)
+#define ROC_ML_AXI_BRIDGE_CTRL_BUSY		      BIT_ULL(18)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK	      BIT_ULL(19)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK	      BIT_ULL(20)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_FORCE_CMPLT	      BIT_ULL(21)
+#define ROC_ML_AXI_BRIDGE_CTRL_WR_CNT_GEAR	      GENMASK_ULL(25, 22)
+#define ROC_ML_AXI_BRIDGE_CTRL_RD_GEAR		      GENMASK_ULL(28, 26)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_CUTTHROUGH_MODE    BIT_ULL(29)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_WRITE_CREDITS      GENMASK_ULL(33, 30)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_READ_CREDITS	      GENMASK_ULL(37, 34)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_LOAD_WRITE_CREDITS BIT_ULL(38)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_LOAD_READ_CREDITS  BIT_ULL(39)
+#define ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA	      BIT_ULL(40)
+
+/* ML_JOB_MGR_CTRL */
+#define ROC_ML_JOB_MGR_CTRL_STALL_ON_ERR     BIT_ULL(0)
+#define ROC_ML_JOB_MGR_CTRL_PF_OVERRIDE	     BIT_ULL(1)
+#define ROC_ML_JOB_MGR_CTRL_PF_FUNC_OVERRIDE GENMASK_ULL(19, 4)
+#define ROC_ML_JOB_MGR_CTRL_BUSY	     BIT_ULL(20)
+#define ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE    BIT_ULL(21)
+
+/* ML_JCMDQ_STATUS */
+#define ROC_ML_JCMDQ_STATUS_AVAIL_COUNT GENMASK_ULL(4, 0)
+
+/* ML_ANBX_BACKP_DISABLE */
+#define ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE BIT_ULL(0)
+#define ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE BIT_ULL(1)
+
+/* ML_ANBX_NCBI_P_OVR */
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MSH_DST_OVR_VLD	 BIT_ULL(0)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MSH_DST_OVR	 GENMASK_ULL(11, 1)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD	 BIT_ULL(12)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR		 BIT_ULL(13)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_PADDR_OVR_VLD	 BIT_ULL(14)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_PADDR_OVR		 BIT_ULL(15)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD	 BIT_ULL(16)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR		 BIT_ULL(17)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPADID_VAL_OVR_VLD BIT_ULL(18)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPADID_VAL_OVR	 BIT_ULL(19)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPAMDID_OVR_VLD	 BIT_ULL(20)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPAMDID_OVR	 BIT_ULL(21)
+
+/* ML_ANBX_NCBI_NP_OVR */
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MSH_DST_OVR_VLD	   BIT_ULL(0)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MSH_DST_OVR	   GENMASK_ULL(11, 1)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD	   BIT_ULL(12)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR		   BIT_ULL(13)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_PADDR_OVR_VLD	   BIT_ULL(14)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_PADDR_OVR	   BIT_ULL(15)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_RO_OVR_VLD	   BIT_ULL(16)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_RO_OVR		   BIT_ULL(17)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPADID_VAL_OVR_VLD BIT_ULL(18)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPADID_VAL_OVR	   BIT_ULL(19)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPAMDID_OVR_VLD	   BIT_ULL(20)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPAMDID_OVR	   BIT_ULL(21)
+
+/* ML_SW_RST_CTRL */
+#define ROC_ML_SW_RST_CTRL_ACC_RST  BIT_ULL(0)
+#define ROC_ML_SW_RST_CTRL_CMPC_RST BIT_ULL(1)
+
+struct roc_ml {
+	struct plt_pci_device *pci_dev;
+	plt_spinlock_t sp_spinlock;
+	plt_spinlock_t fp_spinlock;
+	uint8_t reserved[ROC_ML_MEM_SZ] __plt_cache_aligned;
+} __plt_cache_aligned;
+
+/* Register read and write functions */
+uint64_t __roc_api roc_ml_reg_read64(struct roc_ml *roc_ml, uint64_t offset);
+void __roc_api roc_ml_reg_write64(struct roc_ml *roc_ml, uint64_t val, uint64_t offset);
+uint32_t __roc_api roc_ml_reg_read32(struct roc_ml *roc_ml, uint64_t offset);
+void __roc_api roc_ml_reg_write32(struct roc_ml *roc_ml, uint32_t val, uint64_t offset);
+void __roc_api roc_ml_reg_save(struct roc_ml *roc_ml, uint64_t offset);
+
+/* Address translation functions */
+uint64_t __roc_api roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr);
+uint64_t __roc_api roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset);
+void *__roc_api roc_ml_addr_ap2mlip(struct roc_ml *roc_ml, void *addr);
+void *__roc_api roc_ml_addr_mlip2ap(struct roc_ml *roc_ml, void *addr);
+
+/* Scratch and JCMDQ functions */
+void __roc_api roc_ml_scratch_write_job(struct roc_ml *roc_ml, void *work_ptr);
+bool __roc_api roc_ml_scratch_is_valid_bit_set(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_scratch_is_done_bit_set(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_scratch_enqueue(struct roc_ml *roc_ml, void *work_ptr);
+bool __roc_api roc_ml_scratch_dequeue(struct roc_ml *roc_ml, void *work_ptr);
+void __roc_api roc_ml_scratch_queue_reset(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_jcmdq_enqueue_lf(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+bool __roc_api roc_ml_jcmdq_enqueue_sl(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+/* Device management functions */
+void __roc_api roc_ml_clk_force_on(struct roc_ml *roc_ml);
+void __roc_api roc_ml_clk_force_off(struct roc_ml *roc_ml);
+void __roc_api roc_ml_dma_stall_on(struct roc_ml *roc_ml);
+void __roc_api roc_ml_dma_stall_off(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_mlip_is_enabled(struct roc_ml *roc_ml);
+int __roc_api roc_ml_mlip_reset(struct roc_ml *roc_ml, bool force);
+
+/* Device / block functions */
+int __roc_api roc_ml_dev_init(struct roc_ml *roc_ml);
+int __roc_api roc_ml_dev_fini(struct roc_ml *roc_ml);
+int __roc_api roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml);
+int __roc_api roc_ml_blk_fini(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml);
+
+/* Utility functions */
+uint16_t __roc_api roc_ml_sso_pf_func_get(void);
+
+#endif /* _ROC_ML_H_ */
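
Note on the mask definitions in roc_ml.h: single-bit fields are tested or
cleared directly, while multi-bit fields such as
ROC_ML_JCMDQ_STATUS_AVAIL_COUNT are extracted with FIELD_GET(). The sketch
below reproduces a few of the masks with local stand-in macros so it
builds without the ROC platform headers. It is not part of the patch. The
EX_ names and both helper functions are hypothetical, and ex_field_get()
is one possible way to implement a field extraction, not necessarily how
the platform's FIELD_GET() is written.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for BIT_ULL() and GENMASK_ULL(). */
#define EX_BIT_ULL(n)	     (UINT64_C(1) << (n))
#define EX_GENMASK_ULL(h, l) ((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/* Same bit positions as the corresponding ROC_ML_* masks in roc_ml.h. */
#define EX_CFG_MLIP_ENA		    EX_BIT_ULL(2)
#define EX_CFG_BUSY		    EX_BIT_ULL(3)
#define EX_CFG_ENA		    EX_BIT_ULL(6)
#define EX_JCMDQ_STATUS_AVAIL_COUNT EX_GENMASK_ULL(4, 0)
#define EX_STG_STATUS_ELAPSED_TICKS EX_GENMASK_ULL(47, 6)

/* Extract a field: mask the register, then divide by the lowest set bit
 * of the mask, which is equivalent to shifting the field down to bit 0.
 */
static inline uint64_t
ex_field_get(uint64_t mask, uint64_t reg)
{
	assert(mask != 0);
	return (reg & mask) / (mask & (~mask + 1));
}

/* Clear ENA and MLIP_ENA while preserving every other ML_CFG bit. */
static inline uint64_t
ex_cfg_disable(uint64_t cfg)
{
	return cfg & ~(EX_CFG_ENA | EX_CFG_MLIP_ENA);
}
```

roc_ml_jcmdq_enqueue_lf() relies on the same kind of extraction when it
checks that the AVAIL_COUNT field of ML_JCMDQ_STATUS is non-zero before
writing the two command words.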
diff --git a/drivers/common/cnxk/roc_ml_priv.h b/drivers/common/cnxk/roc_ml_priv.h
new file mode 100644
index 0000000000..ad5fe90bab
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml_priv.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _ROC_ML_PRIV_H_
+#define _ROC_ML_PRIV_H_
+
+#include "roc_api.h"
+
+struct ml {
+	struct plt_pci_device *pci_dev;
+	struct dev dev;
+	uint8_t *ml_reg_addr;
+	uint64_t ml_mlr_base;
+	bool ml_mlr_base_saved;
+} __plt_cache_aligned;
+
+static inline struct ml *
+roc_ml_to_ml_priv(struct roc_ml *roc_ml)
+{
+	return (struct ml *)&roc_ml->reserved[0];
+}
+
+#endif /* _ROC_ML_PRIV_H_ */
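
Note on roc_ml_to_ml_priv(): the public struct roc_ml carries an opaque
reserved[] byte area, and the private struct ml is overlaid on it, with a
static assert guarding the size. The sketch below shows the same pattern
in standalone C11. It is not part of the patch. All ex_ names are
hypothetical, the 64-byte alignment is an arbitrary stand-in for
__plt_cache_aligned, and the private fields only loosely mirror struct ml.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_MEM_SZ (6 * 1024) /* same size as ROC_ML_MEM_SZ */

/* Public handle: callers see only an opaque, aligned byte area. */
struct ex_pub {
	int id;
	_Alignas(64) uint8_t reserved[EX_MEM_SZ];
};

/* Private state that lives inside ex_pub.reserved. */
struct ex_priv {
	uint8_t *reg_addr;
	uint64_t mlr_base;
	int mlr_base_saved;
};

/* Compile-time guard: the private state must fit in the reserved area. */
_Static_assert(sizeof(struct ex_priv) <= EX_MEM_SZ, "ex_priv too large");

/* Map a public handle to its private state. */
static inline struct ex_priv *
ex_to_priv(struct ex_pub *pub)
{
	assert(pub != NULL);
	return (struct ex_priv *)&pub->reserved[0];
}

/* Validate the handle first, then zero the private state. */
static int
ex_init(struct ex_pub *pub)
{
	struct ex_priv *priv;

	if (pub == NULL)
		return -1;

	priv = ex_to_priv(pub);
	memset(priv, 0, sizeof(*priv));
	return 0;
}
```

Since the private pointer is just the address of a member, it is never
NULL for a valid handle, so any NULL check has to be made on the public
pointer before the conversion, as ex_init() does here.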
diff --git a/drivers/common/cnxk/roc_platform.c b/drivers/common/cnxk/roc_platform.c
index ce0f9b870c..f91b95ceab 100644
--- a/drivers/common/cnxk/roc_platform.c
+++ b/drivers/common/cnxk/roc_platform.c
@@ -63,6 +63,7 @@ roc_plt_init(void)
 RTE_LOG_REGISTER(cnxk_logtype_base, pmd.cnxk.base, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_mbox, pmd.cnxk.mbox, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_cpt, pmd.crypto.cnxk, NOTICE);
+RTE_LOG_REGISTER(cnxk_logtype_ml, pmd.ml.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_npa, pmd.mempool.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_nix, pmd.net.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_npc, pmd.net.cnxk.flow, NOTICE);
diff --git a/drivers/common/cnxk/roc_platform.h b/drivers/common/cnxk/roc_platform.h
index 1a48ff3db4..a291ed1c66 100644
--- a/drivers/common/cnxk/roc_platform.h
+++ b/drivers/common/cnxk/roc_platform.h
@@ -233,6 +233,7 @@
 extern int cnxk_logtype_base;
 extern int cnxk_logtype_mbox;
 extern int cnxk_logtype_cpt;
+extern int cnxk_logtype_ml;
 extern int cnxk_logtype_npa;
 extern int cnxk_logtype_nix;
 extern int cnxk_logtype_npc;
@@ -260,6 +261,7 @@ extern int cnxk_logtype_ree;
 #define plt_base_dbg(fmt, ...)	plt_dbg(base, fmt, ##__VA_ARGS__)
 #define plt_cpt_dbg(fmt, ...)	plt_dbg(cpt, fmt, ##__VA_ARGS__)
 #define plt_mbox_dbg(fmt, ...)	plt_dbg(mbox, fmt, ##__VA_ARGS__)
+#define plt_ml_dbg(fmt, ...)	plt_dbg(ml, fmt, ##__VA_ARGS__)
 #define plt_npa_dbg(fmt, ...)	plt_dbg(npa, fmt, ##__VA_ARGS__)
 #define plt_nix_dbg(fmt, ...)	plt_dbg(nix, fmt, ##__VA_ARGS__)
 #define plt_npc_dbg(fmt, ...)	plt_dbg(npc, fmt, ##__VA_ARGS__)
diff --git a/drivers/common/cnxk/roc_priv.h b/drivers/common/cnxk/roc_priv.h
index 122d411fe7..14fe2e452a 100644
--- a/drivers/common/cnxk/roc_priv.h
+++ b/drivers/common/cnxk/roc_priv.h
@@ -47,4 +47,7 @@
 /* REE */
 #include "roc_ree_priv.h"

+/* ML */
+#include "roc_ml_priv.h"
+
 #endif /* _ROC_PRIV_H_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 17f0ec6b48..f7fe49e0ed 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -8,6 +8,7 @@ INTERNAL {
 	cnxk_logtype_base;
 	cnxk_logtype_cpt;
 	cnxk_logtype_mbox;
+	cnxk_logtype_ml;
 	cnxk_logtype_nix;
 	cnxk_logtype_npa;
 	cnxk_logtype_npc;
@@ -96,6 +97,34 @@ INTERNAL {
 	roc_idev_npa_nix_get;
 	roc_idev_num_lmtlines_get;
 	roc_idev_nix_inl_meta_aura_get;
+	roc_ml_reg_read64;
+	roc_ml_reg_write64;
+	roc_ml_reg_read32;
+	roc_ml_reg_write32;
+	roc_ml_reg_save;
+	roc_ml_addr_ap2mlip;
+	roc_ml_addr_mlip2ap;
+	roc_ml_addr_pa_to_offset;
+	roc_ml_addr_offset_to_pa;
+	roc_ml_scratch_write_job;
+	roc_ml_scratch_is_valid_bit_set;
+	roc_ml_scratch_is_done_bit_set;
+	roc_ml_scratch_enqueue;
+	roc_ml_scratch_dequeue;
+	roc_ml_scratch_queue_reset;
+	roc_ml_jcmdq_enqueue_lf;
+	roc_ml_jcmdq_enqueue_sl;
+	roc_ml_clk_force_on;
+	roc_ml_clk_force_off;
+	roc_ml_dma_stall_on;
+	roc_ml_dma_stall_off;
+	roc_ml_mlip_is_enabled;
+	roc_ml_mlip_reset;
+	roc_ml_dev_init;
+	roc_ml_dev_fini;
+	roc_ml_blk_init;
+	roc_ml_blk_fini;
+	roc_ml_sso_pf_func_get;
 	roc_model;
 	roc_se_auth_key_set;
 	roc_se_ciph_key_set;
--
2.17.1



* [PATCH v3 02/38] ml/cnxk: add skeleton for ML cnxk driver
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 01/38] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 03/38] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
                       ` (36 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added initial source files and build files for ML cnxk driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 MAINTAINERS                    |  2 ++
 drivers/meson.build            |  1 +
 drivers/ml/cnxk/cn10k_ml_dev.c |  8 ++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  8 ++++++++
 drivers/ml/cnxk/meson.build    | 26 ++++++++++++++++++++++++++
 drivers/ml/meson.build         |  8 ++++++++
 6 files changed, 53 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 8cdb3e215d..ba4c97e802 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1442,6 +1442,8 @@ Marvell ML CNXK
 M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
+F: drivers/ml/cnxk/
+
 
 Packet processing
 -----------------
diff --git a/drivers/meson.build b/drivers/meson.build
index c6d619200f..546a5f409d 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -14,6 +14,7 @@ subdirs = [
         'mempool',        # depends on common and bus.
         'dma',            # depends on common and bus.
         'net',            # depends on common, bus, mempool
+        'ml',             # depends on common, bus, mempool
         'raw',            # depends on common, bus, dma and net.
         'crypto',         # depends on common, bus and mempool (net in future).
         'compress',       # depends on common, bus, mempool.
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
new file mode 100644
index 0000000000..cc96a7bdb3
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
new file mode 100644
index 0000000000..049ac13fcd
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_DEV_H_
+#define _CN10K_ML_DEV_H_
+
+#endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
new file mode 100644
index 0000000000..f33fef39e3
--- /dev/null
+++ b/drivers/ml/cnxk/meson.build
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
+    build = false
+    reason = 'only supported on 64-bit Linux'
+    subdir_done()
+endif
+
+driver_sdk_headers = files(
+        'cn10k_ml_dev.h',
+)
+
+sources = files(
+        'cn10k_ml_dev.c',
+)
+
+deps += ['mldev', 'common_ml', 'common_cnxk']
+
+if get_option('buildtype').contains('debug')
+        cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
+else
+        cflags += [ '-UCNXK_ML_DEV_DEBUG' ]
+endif
+
+pmd_supports_disable_iova_as_pa = true
diff --git a/drivers/ml/meson.build b/drivers/ml/meson.build
new file mode 100644
index 0000000000..54bc394c47
--- /dev/null
+++ b/drivers/ml/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+drivers = [
+        'cnxk',
+]
+
+std_deps = ['mldev']
-- 
2.17.1



* [PATCH v3 03/38] ml/cnxk: enable probe and remove of ML device
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 01/38] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 02/38] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 04/38] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
                       ` (35 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Anatoly Burakov; +Cc: dev, sshankarnara, jerinj, aprabhu

The ML inference engine on the cn10k platform is a PCI-based device.
Added driver support to probe and remove the device in the cn10k poll
mode driver. The PMD is registered under the name "ml_cn10k".

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
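Illustration only, not part of this patch: the probe path unwinds with
goto labels, destroying the PMD device when the ROC init step fails. A
minimal sketch of that pattern follows. Every mock_* name is
hypothetical and stands in for the real rte_ml_dev_pmd_create(),
roc_ml_dev_init() and rte_ml_dev_pmd_destroy() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device created by the PMD. */
struct mock_dev {
	bool created;   /* set by the create step */
	bool roc_ready; /* set by the ROC init step */
};

/* Stand-in for device creation; fails with -ENODEV when ok is false. */
int
mock_pmd_create(struct mock_dev *d, bool ok)
{
	if (!ok)
		return -ENODEV;
	d->created = true;
	return 0;
}

/* Stand-in for ROC initialization; fails with -EIO when ok is false. */
int
mock_roc_init(struct mock_dev *d, bool ok)
{
	if (!ok)
		return -EIO;
	d->roc_ready = true;
	return 0;
}

/* Stand-in for device destruction. */
void
mock_pmd_destroy(struct mock_dev *d)
{
	d->created = false;
}

/* Probe with the same unwind order as the patch: destroy, then report. */
int
mock_probe(struct mock_dev *d, bool create_ok, bool init_ok)
{
	int ret;

	assert(d != NULL);

	ret = mock_pmd_create(d, create_ok);
	if (ret)
		goto error_exit;

	ret = mock_roc_init(d, init_ok);
	if (ret)
		goto pmd_destroy;

	return 0;

pmd_destroy:
	mock_pmd_destroy(d);

error_exit:
	return ret;
}
```

A failed init leaves no device behind, while a failed create skips the
destroy step because there is nothing to release yet.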
 drivers/ml/cnxk/cn10k_ml_dev.c | 114 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  11 ++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  10 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  11 ++++
 drivers/ml/cnxk/meson.build    |   2 +
 5 files changed, 148 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index cc96a7bdb3..c2e93c9a1a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -2,7 +2,121 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_common.h>
+#include <rte_dev.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
+#include <rte_pci.h>
+
+#include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ops.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+static int
+cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	PLT_SET_USED(pci_drv);
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+
+	ret = roc_plt_init();
+	if (ret < 0) {
+		plt_err("Failed to initialize platform model");
+		return ret;
+	}
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+	dev = rte_ml_dev_pmd_create(name, &pci_dev->device, &init_params);
+	if (dev == NULL) {
+		ret = -ENODEV;
+		goto error_exit;
+	}
+
+	/* Get private data space allocated */
+	mldev = dev->data->dev_private;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev->roc.pci_dev = pci_dev;
+
+		ret = roc_ml_dev_init(&mldev->roc);
+		if (ret) {
+			plt_err("Failed to initialize ML ROC, ret = %d", ret);
+			goto pmd_destroy;
+		}
+
+		dev->dev_ops = &cn10k_ml_ops;
+	} else {
+		plt_err("CN10K ML Ops are not supported on secondary process");
+		dev->dev_ops = &ml_dev_dummy_ops;
+	}
+
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	return 0;
+
+pmd_destroy:
+	rte_ml_dev_pmd_destroy(dev);
+
+error_exit:
+	plt_err("Could not create device (vendor_id: 0x%x device_id: 0x%x)", pci_dev->id.vendor_id,
+		pci_dev->id.device_id);
+
+	return ret;
+}
+
+static int
+cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&mldev->roc);
+		if (ret)
+			return ret;
+	}
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_pci_id pci_id_ml_table[] = {
+	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
+	/* sentinel */
+	{},
+};
+
+static struct rte_pci_driver cn10k_mldev_pmd = {
+	.id_table = pci_id_ml_table,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA,
+	.probe = cn10k_ml_pci_probe,
+	.remove = cn10k_ml_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
+RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 049ac13fcd..833a09791a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -5,4 +5,15 @@
 #ifndef _CN10K_ML_DEV_H_
 #define _CN10K_ML_DEV_H_
 
+#include <roc_api.h>
+
+/* Marvell OCTEON CN10K ML PMD device name */
+#define MLDEV_NAME_CN10K_PMD ml_cn10k
+
+/* Device private data */
+struct cn10k_ml_dev {
+	/* Device ROC */
+	struct roc_ml roc;
+};
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
new file mode 100644
index 0000000000..39843e3ee5
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_ops.h"
+
+struct rte_ml_dev_ops cn10k_ml_ops = {0};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
new file mode 100644
index 0000000000..b14221d02c
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OPS_H_
+#define _CN10K_ML_OPS_H_
+
+/* Device ops */
+extern struct rte_ml_dev_ops cn10k_ml_ops;
+
+#endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index f33fef39e3..aff65a082f 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,10 +9,12 @@ endif
 
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
+        'cn10k_ml_ops.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
+        'cn10k_ml_ops.c',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk']
-- 
2.17.1



* [PATCH v3 04/38] ml/cnxk: add driver support to get device info
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (2 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 03/38] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 05/38] ml/cnxk: add support for configure and close Srikanth Yalavarthi
                       ` (34 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to get the cn10k ML device information. This is the
driver implementation of the RTE function rte_ml_dev_info_get. The
ML device on cn10k supports one queue-pair in lock-free mode and
does not support segmented input/output data.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
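Illustration only, not part of this patch: a caller might compare a
requested configuration against the reported limits before configuring
the device. The sketch below assumes a hypothetical mirror of the info
structure (mock_dev_info) filled with the ML_CN10K_* values defined in
this patch; it does not use the real rte_ml_dev_info type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the limits the driver reports. */
struct mock_dev_info {
	uint16_t max_models;
	uint16_t max_queue_pairs;
	uint16_t max_desc;
	uint16_t max_segments;
	uint16_t min_align_size;
};

/* Fill the limits with the constants introduced by this patch. */
struct mock_dev_info
mock_info_get(void)
{
	struct mock_dev_info info = {
		.max_models = 16,      /* ML_CN10K_MAX_MODELS */
		.max_queue_pairs = 1,  /* ML_CN10K_MAX_QP_PER_DEVICE */
		.max_desc = 1024,      /* ML_CN10K_MAX_DESC_PER_QP */
		.max_segments = 1,     /* ML_CN10K_MAX_SEGMENTS */
		.min_align_size = 128, /* ML_CN10K_ALIGN_SIZE */
	};

	return info;
}

/* True when the requested counts stay within the reported limits. */
bool
mock_request_fits(const struct mock_dev_info *info, uint16_t nb_models,
		  uint16_t nb_queue_pairs)
{
	assert(info != NULL);

	return nb_models <= info->max_models &&
	       nb_queue_pairs <= info->max_queue_pairs;
}
```

With these limits a request for two queue-pairs is rejected, since the
device exposes a single queue-pair.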
 drivers/ml/cnxk/cn10k_ml_dev.h | 15 +++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 23 ++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 833a09791a..13d26373e4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,21 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Device alignment size */
+#define ML_CN10K_ALIGN_SIZE 128
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Maximum number of queue-pairs per device */
+#define ML_CN10K_MAX_QP_PER_DEVICE 1
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_CN10K_MAX_DESC_PER_QP 1024
+
+/* Maximum number of segments for IO data */
+#define ML_CN10K_MAX_SEGMENTS 1
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 39843e3ee5..bad5ad4713 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,27 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-struct rte_ml_dev_ops cn10k_ml_ops = {0};
+static int
+cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	if (dev_info == NULL)
+		return -EINVAL;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
+	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
+
+	return 0;
+}
+
+struct rte_ml_dev_ops cn10k_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+};
-- 
2.17.1



* [PATCH v3 05/38] ml/cnxk: add support for configure and close
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (3 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 04/38] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 06/38] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
                       ` (33 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented driver functions to configure and close ML devices.
Added skeleton code to support reconfiguring the ML device. Device
close also removes the PCI device.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
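Illustration only, not part of this patch: the configure and close ops
form a small state machine. Configure is accepted in the probed and
configured states and rejected with -ENOTSUP once the device is started
or closed. A minimal sketch, using a hypothetical mock_state enum in
place of enum cn10k_ml_dev_state:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mirror of the device states added by this patch. */
enum mock_state {
	MOCK_STATE_PROBED = 0,
	MOCK_STATE_CONFIGURED,
	MOCK_STATE_STARTED,
	MOCK_STATE_CLOSED
};

/* Configure or reconfigure; only allowed before start and before close. */
int
mock_configure(enum mock_state *state)
{
	assert(state != NULL);

	switch (*state) {
	case MOCK_STATE_PROBED:
	case MOCK_STATE_CONFIGURED:
		*state = MOCK_STATE_CONFIGURED;
		return 0;
	case MOCK_STATE_STARTED:
	case MOCK_STATE_CLOSED:
		return -ENOTSUP;
	}

	return -EINVAL;
}

/* Close is terminal in this sketch: no later configure succeeds. */
int
mock_close(enum mock_state *state)
{
	assert(state != NULL);

	*state = MOCK_STATE_CLOSED;
	return 0;
}
```

A rejected configure leaves the state untouched, which matches the
early returns in cn10k_ml_dev_configure().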
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 ++
 drivers/ml/cnxk/cn10k_ml_dev.h | 21 ++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 60 ++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index c2e93c9a1a..fd45226add 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -65,6 +65,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+
 	return 0;
 
 pmd_destroy:
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 13d26373e4..e7fb5fc2e2 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -25,10 +25,31 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
+/* ML command timeout in seconds */
+#define ML_CN10K_CMD_TIMEOUT 5
+
+/* Device configuration state enum */
+enum cn10k_ml_dev_state {
+	/* Probed and not configured */
+	ML_CN10K_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CN10K_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CN10K_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CN10K_DEV_STATE_CLOSED
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
+
+	/* Configuration state */
+	enum cn10k_ml_dev_state state;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index bad5ad4713..32d38569a3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -25,7 +25,67 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL || conf == NULL)
+		return -EINVAL;
+
+	/* Get CN10K device handle */
+	mldev = dev->data->dev_private;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state");
+		return -ENOTSUP;
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close");
+		return -ENOTSUP;
+	}
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	mldev = dev->data->dev_private;
+
+	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
 };
-- 
2.17.1



* [PATCH v3 06/38] ml/cnxk: parse ML firmware path from device args
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (4 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 05/38] ml/cnxk: add support for configure and close Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 07/38] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
                       ` (32 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled parsing of the ML firmware path from device arguments for
cn10k. The path defaults to "/lib/firmware/mlip-fw.bin" when the
fw_path argument is not provided. Added internal structures for ML
firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
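Illustration only, not part of this patch: the fallback behaviour can
be sketched without the kvargs library. The parser below is a
simplified, hypothetical stand-in for rte_kvargs_parse() and
rte_kvargs_process(). It scans a comma-separated "key=value" string,
returns the first fw_path value, and otherwise returns the default.
Unlike the patch, it always returns heap memory so the caller can free
the result uniformly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_FW_PATH_KEY     "fw_path"
#define MOCK_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"

/* Copy len bytes of src into a new NUL-terminated heap string. */
static char *
mock_dup_n(const char *src, size_t len)
{
	char *out;

	assert(src != NULL);

	out = malloc(len + 1);
	if (out == NULL)
		return NULL;

	memcpy(out, src, len);
	out[len] = '\0';
	return out;
}

/* Return the fw_path value from args, or the default when absent.
 * The caller frees the result; NULL signals an allocation failure.
 */
char *
mock_fw_path_from_args(const char *args)
{
	const size_t key_len = strlen(MOCK_FW_PATH_KEY);
	const char *p = args;

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

		/* Match "fw_path=" at the start of this pair. */
		if (len > key_len && p[key_len] == '=' &&
		    strncmp(p, MOCK_FW_PATH_KEY, key_len) == 0)
			return mock_dup_n(p + key_len + 1, len - key_len - 1);

		p = (end != NULL) ? end + 1 : NULL;
	}

	return mock_dup_n(MOCK_FW_PATH_DEFAULT, strlen(MOCK_FW_PATH_DEFAULT));
}
```

Passing NULL models the case where no devargs are given at all, which
the patch handles by jumping straight to the default.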
 drivers/ml/cnxk/cn10k_ml_dev.c | 71 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 12 ++++++
 drivers/ml/cnxk/meson.build    |  2 +-
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fd45226add..117cac43aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -4,6 +4,8 @@
 
 #include <rte_common.h>
 #include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
@@ -13,9 +15,70 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+#define CN10K_ML_FW_PATH "fw_path"
+
+#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
+static int
+parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*(char **)extra_args = strdup(value);
+
+	if (!*(char **)extra_args)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+{
+	struct rte_kvargs *kvlist = NULL;
+	bool fw_path_set = false;
+	char *fw_path = NULL;
+	int ret = 0;
+
+	if (devargs == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(devargs->args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing devargs");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_PATH) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_PATH, &parse_string_arg, &fw_path);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s", CN10K_ML_FW_PATH);
+			ret = -EINVAL;
+			goto exit;
+		}
+		fw_path_set = true;
+	}
+
+check_args:
+	if (!fw_path_set)
+		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+	else
+		mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
 static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
@@ -49,6 +112,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
 		mldev->roc.pci_dev = pci_dev;
 
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		if (ret) {
+			plt_err("Failed to parse devargs ret = %d", ret);
+			goto pmd_destroy;
+		}
+
 		ret = roc_ml_dev_init(&mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
@@ -122,3 +191,5 @@ static struct rte_pci_driver cn10k_mldev_pmd = {
 RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index e7fb5fc2e2..5333566cff 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,15 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML firmware structure */
+struct cn10k_ml_fw {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Firmware file path */
+	const char *path;
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -50,6 +59,9 @@ struct cn10k_ml_dev {
 
 	/* Configuration state */
 	enum cn10k_ml_dev_state state;
+
+	/* Firmware */
+	struct cn10k_ml_fw fw;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index aff65a082f..87b7fc3f2a 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,7 +17,7 @@ sources = files(
         'cn10k_ml_ops.c',
 )
 
-deps += ['mldev', 'common_ml', 'common_cnxk']
+deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1



* [PATCH v3 07/38] ml/cnxk: enable firmware load and device reset
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (5 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 06/38] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 08/38] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
                       ` (31 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to load ML firmware on the cn10ka ROC model. The MLIP
device is reset during the dev_close driver operation. The device
can't be reconfigured after a call to close. Job execution is
disabled after firmware load and is enabled when the device is
started. Added internal request structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
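Illustration only, not part of this patch: two of the firmware load
steps are plain arithmetic on register values. Step (8) sets the
MLIP enable bit while clearing the job manager enable bit, and step
(10) writes the AXI outbound address divided by 4 as two 32-bit words.
The sketch below assumes hypothetical bit positions and a hypothetical
AXI start address; the real values come from the ROC headers and are
not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen only to make the sketch concrete. */
#define MOCK_AXI_START_ADDR UINT64_C(0x800000000)
#define MOCK_CFG_ENA        (UINT64_C(1) << 0)
#define MOCK_CFG_MLIP_ENA   (UINT64_C(1) << 1)

/* The two 32-bit words written to the reset vector base registers. */
struct mock_rst_vector {
	uint32_t w0; /* low word */
	uint32_t w1; /* high word */
};

/* Step (8): bring MLIP out of reset, keep the job manager disabled. */
uint64_t
mock_cfg_release_mlip(uint64_t cfg)
{
	cfg |= MOCK_CFG_MLIP_ENA;
	cfg &= ~MOCK_CFG_ENA;
	return cfg;
}

/* Step (10): register value is the AXI outbound address divided by 4. */
struct mock_rst_vector
mock_rst_vector_base(uint64_t fw_offset)
{
	struct mock_rst_vector v;
	uint64_t addr;

	/* The division by 4 assumes a 4-byte aligned firmware offset. */
	assert((fw_offset & 0x3) == 0);

	addr = (fw_offset + MOCK_AXI_START_ADDR) / 4;
	v.w0 = (uint32_t)(addr & UINT64_C(0xffffffff));
	v.w1 = (uint32_t)(addr >> 32);
	return v;
}
```

The patch uses a union with word and address views for the same split;
the shifts here are a stand-in that avoids depending on its layout.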
 drivers/ml/cnxk/cn10k_ml_dev.c | 327 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 156 ++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  21 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  14 ++
 4 files changed, 518 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 117cac43aa..90fca45ddd 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -12,6 +12,8 @@
 
 #include <roc_api.h>
 
+#include <eal_firmware.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
@@ -19,6 +21,15 @@
 
 #define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
 
+/* ML firmware macros */
+#define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
+#define FW_STACK_BUFFER_SIZE	 0x40000
+#define FW_DEBUG_BUFFER_SIZE	 (2 * 0x20000)
+#define FW_EXCEPTION_BUFFER_SIZE 0x400
+#define FW_LINKER_OFFSET	 0x80000
+#define FW_WAIT_CYCLES		 100
+#define FW_LOAD_FLAGS		 0x1
+
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
 
 /* Dummy operations for ML device */
@@ -175,6 +186,322 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 	return rte_ml_dev_pmd_destroy(dev);
 }
 
+static void
+cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
+{
+	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+	plt_ml_dbg("exception_state_size = %u bytes",
+		   fw->req->jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+}
+
+uint64_t
+cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
+{
+	PLT_SET_USED(fw);
+
+	return FW_LOAD_FLAGS;
+}
+
+static int
+cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
+{
+	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
+	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	uint32_t reg_val32;
+	uint64_t offset;
+	bool timeout;
+	int ret = 0;
+	uint8_t i;
+
+	mldev = fw->mldev;
+
+	/* Reset debug buffer HEAD and TAIL pointer registers and exception SP registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
+	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
+
+	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
+	 * bridge.
+	 */
+	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
+		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
+		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
+		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+
+	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
+	 * bridges.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
+			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+	}
+
+	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
+	 * signal all ML transactions as non-secure.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
+			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+
+		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
+			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+	}
+
+	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
+	 * when there is no job in the command queue.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
+	 * keeping the job manager disabled.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (9) Wait at least 70 coprocessor clock cycles. */
+	plt_delay_us(FW_WAIT_CYCLES);
+
+	/* (10) Write ML outbound addresses pointing to the firmware images written in step 1 to the
+	 * following registers: ML(0)_A35_0_RST_VECTOR_BASE_W(0..1) for core 0,
+	 * ML(0)_A35_1_RST_VECTOR_BASE_W(0..1) for core 1. The value written to each register is the
+	 * AXI outbound address divided by 4. Read after write.
+	 */
+	offset = PLT_PTR_ADD_U64_CAST(
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
+	 * MLIP components out of reset. The cores will execute firmware from the ML region as
+	 * written in step 1.
+	 */
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
+	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
+
+	/* (12) Wait for notification from firmware that ML is ready for job execution. */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
+	 * clock when there are no more jobs to process.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
+	 * activities.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
+			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+	}
+
+	return ret;
+}
+
+int
+cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_fw *fw;
+	void *fw_buffer = NULL;
+	uint64_t mz_size = 0;
+	uint64_t fw_size = 0;
+	int ret = 0;
+
+	fw = &mldev->fw;
+	fw->mldev = mldev;
+
+	/* Read firmware image to a buffer */
+	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+	if (ret < 0) {
+		plt_err("Can't read firmware data: %s\n", fw->path);
+		return ret;
+	}
+
+	/* Reserve memzone for firmware load completion and data */
+	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+		return -ENOMEM;
+	}
+	fw->req = mz->addr;
+
+	/* Reset firmware load completion structure */
+	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+
+	/* Reset device, if in active state */
+	if (roc_ml_mlip_is_enabled(&mldev->roc))
+		roc_ml_mlip_reset(&mldev->roc, true);
+
+	/* Load firmware */
+	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+	if (fw_buffer != NULL)
+		free(fw_buffer);
+	if (ret < 0)
+		cn10k_ml_fw_unload(mldev);
+
+	return ret;
+}
+
+void
+cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	uint64_t reg_val;
+
+	/* Disable and reset device */
+	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&mldev->roc, true);
+
+	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
+	if (mz != NULL)
+		plt_memzone_free(mz);
+}
+
 static struct rte_pci_id pci_id_ml_table[] = {
 	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
 	/* sentinel */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 5333566cff..00d23eb3ca 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,9 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
+
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -28,6 +31,19 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* Poll mode job state */
+#define ML_CN10K_POLL_JOB_START	 0
+#define ML_CN10K_POLL_JOB_FINISH 1
+
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
+
 /* Device configuration state enum */
 enum cn10k_ml_dev_state {
 	/* Probed and not configured */
@@ -43,6 +59,136 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* Firmware stats */
+struct cn10k_ml_fw_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
+
+	/* Firmware end cycle */
+	uint64_t fw_end;
+
+	/* Hardware start cycle */
+	uint64_t hw_start;
+
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Firmware stats */
+	struct cn10k_ml_fw_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
+
+		/* Batch execution */
+		uint64_t batch_run : 1;
+
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
+
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
+
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
+
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
+
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
+
+	/* Exception state dump size */
+	uint32_t exception_state_size;
+};
+
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
+
+			/* Flags to control error handling */
+			uint64_t flags;
+
+			uint8_t rsvd[8];
+		} fw_load;
+	};
+};
+
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -50,6 +196,12 @@ struct cn10k_ml_fw {
 
 	/* Firmware file path */
 	const char *path;
+
+	/* Data buffer */
+	uint8_t *data;
+
+	/* Firmware load / handshake request structure */
+	struct cn10k_ml_req *req;
 };
 
 /* Device private data */
@@ -64,4 +216,8 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_fw fw;
 };
 
+uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
+int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
+void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 32d38569a3..11e1cdb7cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -30,6 +30,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	int ret;
 
 	if (dev == NULL || conf == NULL)
 		return -EINVAL;
@@ -51,6 +52,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(mldev);
+		if (ret != 0)
+			return ret;
 	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -77,6 +83,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload firmware */
+	cn10k_ml_fw_unload(mldev);
+
+	/* Clear scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+
+	/* Reset ML_MLR_BASE */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+
 	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index b14221d02c..fe18730aca 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,6 +5,20 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include "cn10k_ml_dev.h"
+
+/* ML request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job result */
+	struct cn10k_ml_result result;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+} __rte_aligned(ROC_ALIGN);
+
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1



* [PATCH v3 08/38] ml/cnxk: enable support for simulator environment
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (6 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 07/38] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 09/38] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
                       ` (30 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled device initialization and firmware load on the simulator
platform. On the simulator, the firmware load stage only launches
a firmware handshake request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
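Note: the handshake wait in cn10k_ml_fw_load_asim() is a poll-until-deadline
loop. Below is a minimal standalone sketch of that pattern, not the driver
code. The names struct fake_dev, fake_tsc() and handshake_wait() are
hypothetical; fake_tsc() stands in for plt_tsc_cycles(), and the sketch models
only the status word, not the scratch done bit. It returns -1 on timeout where
the driver returns -ETIME.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLL_JOB_START  0u
#define POLL_JOB_FINISH 1u

/* Hypothetical stand-in for the device and its cycle counter. */
struct fake_dev {
	uint64_t now;             /* fake cycle counter */
	uint64_t done_at;         /* cycle at which the fake firmware finishes */
	volatile uint64_t status; /* status word written by the fake firmware */
};

/* Advance the fake clock by one cycle and let the fake firmware complete. */
static uint64_t
fake_tsc(struct fake_dev *d)
{
	d->now++;
	if (d->now >= d->done_at)
		d->status = POLL_JOB_FINISH;
	return d->now;
}

/* Poll the status word until it reads FINISH or the deadline passes.
 * Returns 0 on completion and -1 on timeout.
 */
static int
handshake_wait(struct fake_dev *d, uint64_t timeout_cycles)
{
	uint64_t deadline;

	assert(d != NULL);

	d->status = POLL_JOB_START;
	deadline = fake_tsc(d) + timeout_cycles;

	do {
		if (d->status == POLL_JOB_FINISH)
			return 0;
	} while (fake_tsc(d) < deadline);

	return -1;
}
```

As in the patch, the deadline is computed once before polling starts, so the
loop needs only one counter read per iteration.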
 drivers/ml/cnxk/cn10k_ml_dev.c | 119 +++++++++++++++++++++++++++++----
 1 file changed, 107 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 90fca45ddd..837f006bf0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -213,6 +213,89 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	return FW_LOAD_FLAGS;
 }
 
+static int
+cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	bool timeout;
+	int ret = 0;
+
+	mldev = fw->mldev;
+
+	/* Reset HEAD and TAIL debug pointer registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = rte_eal_get_baseaddr();
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* Update FW load completion structure */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	return ret;
+}
+
 static int
 cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
@@ -447,16 +530,22 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	fw = &mldev->fw;
 	fw->mldev = mldev;
 
-	/* Read firmware image to a buffer */
-	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
-	if (ret < 0) {
-		plt_err("Can't read firmware data: %s\n", fw->path);
-		return ret;
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		/* Read firmware image to a buffer */
+		ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+		if (ret < 0) {
+			plt_err("Can't read firmware data: %s\n", fw->path);
+			return ret;
+		}
+
+		/* Reserve memzone for firmware load completion and data */
+		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	} else if (roc_env_is_asim()) {
+		/* Reserve memzone for firmware load completion */
+		mz_size = sizeof(struct cn10k_ml_req);
 	}
 
-	/* Reserve memzone for firmware load completion and data */
-	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
-		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
@@ -475,10 +564,16 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 		roc_ml_mlip_reset(&mldev->roc, true);
 
 	/* Load firmware */
-	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
-	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
-	if (fw_buffer != NULL)
-		free(fw_buffer);
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+	} else if (roc_env_is_asim()) {
+		fw->data = NULL;
+		ret = cn10k_ml_fw_load_asim(fw);
+	}
+
 	if (ret < 0)
 		cn10k_ml_fw_unload(mldev);
 
-- 
2.17.1



* [PATCH v3 09/38] ml/cnxk: enable support for device start and stop
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (7 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 08/38] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 10/38] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
                       ` (29 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented ML driver functions to start and stop the ML device.
Start enables the ML device to accept inference requests and
stop disables it.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
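Note: start and stop are read-modify-write updates of one enable bit in
ML_CFG. Below is a minimal standalone sketch of that pattern against a fake
register, not the driver code. The bit positions and the names struct
fake_regs, reg_read64(), reg_write64(), dev_start() and dev_stop() are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions; the real layout is defined by the ROC layer. */
#define CFG_ENA      (UINT64_C(1) << 0)
#define CFG_MLIP_ENA (UINT64_C(1) << 1)

/* Fake register file holding a single configuration register. */
struct fake_regs {
	uint64_t cfg;
};

static uint64_t
reg_read64(const struct fake_regs *r)
{
	return r->cfg;
}

static void
reg_write64(struct fake_regs *r, uint64_t val)
{
	r->cfg = val;
}

/* Set the enable bit and leave every other bit untouched. */
static void
dev_start(struct fake_regs *r)
{
	uint64_t val;

	assert(r != NULL);
	val = reg_read64(r);
	val |= CFG_ENA;
	reg_write64(r, val);
}

/* Clear the enable bit and leave every other bit untouched. */
static void
dev_stop(struct fake_regs *r)
{
	uint64_t val;

	assert(r != NULL);
	val = reg_read64(r);
	val &= ~CFG_ENA;
	reg_write64(r, val);
}
```

Reading the register first preserves unrelated bits, such as the MLIP enable
bit set during firmware load.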
 drivers/ml/cnxk/cn10k_ml_ops.c | 42 +++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11e1cdb7cd..3fea763caf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -104,9 +104,45 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
+static int
+cn10k_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
+	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
 };
-- 
2.17.1



* [PATCH v3 10/38] ml/cnxk: add support to create device queue-pairs
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (8 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 09/38] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 11/38] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
                       ` (28 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to create and destroy device queue-pairs. Updated
the configure stage to create an array to store queue-pair handles.
Added internal structures for queue-pairs, queues and ML inference
requests.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
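Note: queue_pair_setup allocates one descriptor more than requested because
one slot of the ring is never usable. Below is a minimal standalone sketch of
that sizing rule, not the driver code. The names ring_size_for(), struct ring,
ring_push() and ring_pop() are hypothetical, and the full/empty convention
(full when advancing head would reach tail) is an assumption; the patch only
states that the usable count is one less than the queue size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Queue size to allocate for a requested descriptor count. */
static uint32_t
ring_size_for(uint32_t requested, uint32_t max_desc)
{
	assert(requested != 0 && requested <= max_desc);
	return (requested == max_desc) ? max_desc : requested + 1;
}

/* Index-only ring; the request array itself is omitted. */
struct ring {
	uint64_t head;    /* next slot to enqueue into */
	uint64_t tail;    /* next slot to dequeue from */
	uint64_t nb_desc; /* allocated queue size */
};

/* Claim one slot; fails when the ring is full. */
static bool
ring_push(struct ring *q)
{
	uint64_t next = (q->head + 1) % q->nb_desc;

	if (next == q->tail)
		return false; /* full: one slot always stays empty */
	q->head = next;
	return true;
}

/* Release one slot; fails when the ring is empty. */
static bool
ring_pop(struct ring *q)
{
	if (q->head == q->tail)
		return false; /* empty */
	q->tail = (q->tail + 1) % q->nb_desc;
	return true;
}
```

With this convention a request for 4 descriptors yields a ring of 5 slots that
holds 4 requests at a time. A request for the maximum size yields a ring with
one usable slot fewer than requested.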
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |  33 +++++-
 2 files changed, 237 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3fea763caf..7c9c49ffda 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -8,6 +8,97 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cn10k_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cn10k_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cn10k_ml_qp *
+cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cn10k_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -30,6 +121,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint32_t mz_size;
+	uint16_t qp_id;
 	int ret;
 
 	if (dev == NULL || conf == NULL)
@@ -68,21 +162,83 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -ENOTSUP;
 	}
 
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
+
+error:
+	if (dev->data->queue_pairs != NULL)
+		rte_free(dev->data->queue_pairs);
+
+	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint16_t qp_id;
 
 	if (dev == NULL)
 		return -EINVAL;
 
 	mldev = dev->data->dev_private;
 
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	if (dev->data->queue_pairs)
+		rte_free(dev->data->queue_pairs);
+
 	/* Unload firmware */
 	cn10k_ml_fw_unload(mldev);
 
@@ -140,9 +296,56 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* The number of usable descriptors is one less than the queue size, so create the queue
+	 * with one more descriptor than requested, except when the requested size equals the
+	 * maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fe18730aca..289c7c5587 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,9 +5,13 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 
-/* ML request */
+/* Request structure */
 struct cn10k_ml_req {
 	/* Job descriptor */
 	struct cn10k_ml_jd jd;
@@ -19,6 +23,33 @@ struct cn10k_ml_req {
 	volatile uint64_t status;
 } __rte_aligned(ROC_ALIGN);
 
+/* Request queue */
+struct cn10k_ml_queue {
+	/* Array of requests */
+	struct cn10k_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cn10k_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cn10k_ml_queue queue;
+};
+
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread
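
The descriptor sizing rule in the queue-pair setup above can be sketched in
isolation. This is a minimal illustration, assuming a head/tail ring that
always keeps one slot empty, as the comment in the patch states; the helper
names ring_size_for and ring_usable are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ring size to allocate so that `requested` descriptors
 * are usable, capped at the device maximum.
 */
static uint64_t
ring_size_for(uint64_t requested, uint64_t max_desc)
{
	assert(requested > 0 && requested <= max_desc);
	return (requested == max_desc) ? max_desc : requested + 1;
}

/* Usable descriptors in a ring that keeps one slot empty. */
static uint64_t
ring_usable(uint64_t ring_size)
{
	return ring_size - 1;
}
```

With a maximum of 1024, a request for 8 descriptors yields a ring of 9 with 8
usable entries, while a request for 1024 yields a ring of 1024 with only 1023
usable entries, which is the exception the comment describes.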

* [PATCH v3 11/38] ml/cnxk: add functions to load and unload models
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (9 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 10/38] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 12/38] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
                       ` (27 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added cnxk driver implementations to load and unload ML models.
Enabled support in the configure stage to allocate the model
handles array. A model ID is assigned and resources are allocated
for each model during load, and released during unload.
Added internal structures to handle ML models.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.c |   5 +
 drivers/ml/cnxk/cn10k_ml_model.h |  43 +++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 154 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   5 +
 drivers/ml/cnxk/meson.build      |   2 +
 6 files changed, 212 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 00d23eb3ca..7cf6268115 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -214,6 +214,9 @@ struct cn10k_ml_dev {
 
 	/* Firmware */
 	struct cn10k_ml_fw fw;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
new file mode 100644
index 0000000000..39ed707396
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_model.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
new file mode 100644
index 0000000000..0a6a498342
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_MODEL_H_
+#define _CN10K_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Model state */
+enum cn10k_ml_model_state {
+	ML_CN10K_MODEL_STATE_LOADED,
+	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
+	ML_CN10K_MODEL_STATE_STARTED,
+	ML_CN10K_MODEL_STATE_UNKNOWN,
+};
+
+/* Model Object */
+struct cn10k_ml_model {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* ID */
+	int16_t model_id;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+
+	/* State */
+	enum cn10k_ml_model_state state;
+};
+
+#endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7c9c49ffda..d177d0e3e4 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -6,8 +6,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+/* ML model macros */
+#define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -120,9 +124,11 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	int16_t model_id;
 	uint16_t qp_id;
 	int ret;
 
@@ -203,6 +209,48 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
 
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %d", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
@@ -211,14 +259,19 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (dev->data->queue_pairs != NULL)
 		rte_free(dev->data->queue_pairs);
 
+	if (dev->data->models != NULL)
+		rte_free(dev->data->models);
+
 	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	int16_t model_id;
 	uint16_t qp_id;
 
 	if (dev == NULL)
@@ -226,6 +279,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %d", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	if (dev->data->models)
+		rte_free(dev->data->models);
+
 	/* Destroy all queue pairs */
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
@@ -337,6 +405,88 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+int
+cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t mz_size;
+	uint16_t idx;
+	bool found;
+
+	PLT_SET_USED(params);
+
+	mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (idx = 0; idx < dev->data->nb_models; idx++) {
+		if (dev->data->models[idx] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+
+	/* Allocate memzone for model object and model data */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->mldev = mldev;
+	model->model_id = idx;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	dev->data->models[idx] = model;
+	mldev->nb_models_loaded++;
+
+	*model_id = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	dev->data->models[model_id] = NULL;
+	mldev->nb_models_loaded--;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%d", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -348,4 +498,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 289c7c5587..8a939cabc7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -53,4 +53,9 @@ struct cn10k_ml_qp {
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
+/* Slow-path ops */
+int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
+			int16_t *model_id);
+int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 87b7fc3f2a..5bc98386b8 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -10,11 +10,13 @@ endif
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
+        'cn10k_ml_model.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
+        'cn10k_ml_model.c',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread
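
The model ID handling in the patch above amounts to a first-free-slot table:
load scans for the first NULL entry and unload clears it. The following is a
minimal sketch of that pattern only. The names slot_table, slot_claim and
slot_release are hypothetical, the fixed SLOT_COUNT stands in for the
conf->nb_models sizing, and the range check on the ID is an addition of this
sketch (the unload path in the patch indexes the array directly).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 4 /* illustrative; the driver sizes the table at configure time */

struct slot_table {
	void *slots[SLOT_COUNT];
	uint16_t nb_loaded;
};

/* Claim the first free slot. Returns 0 and writes the slot index to *id,
 * or -ENOMEM when every slot is taken.
 */
static int
slot_claim(struct slot_table *t, void *obj, int16_t *id)
{
	uint16_t idx;

	assert(t != NULL && obj != NULL && id != NULL);

	for (idx = 0; idx < SLOT_COUNT; idx++) {
		if (t->slots[idx] == NULL) {
			t->slots[idx] = obj;
			t->nb_loaded++;
			*id = (int16_t)idx;
			return 0;
		}
	}

	return -ENOMEM;
}

/* Release a slot. Returns -EINVAL for an out-of-range or empty slot. */
static int
slot_release(struct slot_table *t, int16_t id)
{
	assert(t != NULL);

	if (id < 0 || id >= SLOT_COUNT || t->slots[id] == NULL)
		return -EINVAL;

	t->slots[id] = NULL;
	t->nb_loaded--;
	return 0;
}
```

Because the scan always starts at index 0, an ID freed by unload is the first
one handed out again by the next load.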

* [PATCH v3 12/38] ml/cnxk: enable validity checks for model metadata
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (10 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 11/38] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 13/38] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
                       ` (26 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added model metadata structure and enabled metadata check
during model load. Remapped cnxk IO types to RTE IO types.
Model metadata is stored and updated in the model structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
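
The header version check added to cn10k_ml_model_metadata_check in this patch
compares only the first two version bytes against MRVL_ML_MODEL_VERSION. A
minimal sketch of that arithmetic follows; the helper name
metadata_version_ok and the macro MIN_METADATA_VERSION are hypothetical, and
the encoding (major * 1000 + minor * 100) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors MRVL_ML_MODEL_VERSION from the patch: 2.1 encoded as 2100. */
#define MIN_METADATA_VERSION 2100

/* Hypothetical helper: true when major.minor meets the minimum version. */
static bool
metadata_version_ok(const uint8_t version[4])
{
	assert(version != NULL);
	return version[0] * 1000 + version[1] * 100 >= MIN_METADATA_VERSION;
}
```

Two properties follow from the encoding: the last two version bytes never
affect the result, and a minor value of 10 or more carries into the major
digit, so 1.11.x.x evaluates to 2100 and passes the check.
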
 drivers/ml/cnxk/cn10k_ml_model.c | 196 +++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 312 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  14 +-
 drivers/ml/cnxk/meson.build      |   2 +-
 4 files changed, 522 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 39ed707396..efb7a10233 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -2,4 +2,200 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_hash_crc.h>
+
+#include <ml_utils.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+
+static enum rte_ml_io_type
+cn10k_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case 1:
+		return RTE_ML_IO_TYPE_INT8;
+	case 2:
+		return RTE_ML_IO_TYPE_UINT8;
+	case 3:
+		return RTE_ML_IO_TYPE_INT16;
+	case 4:
+		return RTE_ML_IO_TYPE_UINT16;
+	case 5:
+		return RTE_ML_IO_TYPE_INT32;
+	case 6:
+		return RTE_ML_IO_TYPE_UINT32;
+	case 7:
+		return RTE_ML_IO_TYPE_FP16;
+	case 8:
+		return RTE_ML_IO_TYPE_FP32;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+int
+cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+	uint8_t version[4];
+	uint8_t i;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+
+	/* Header CRC check */
+	if (metadata->metadata_header.header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			buffer, sizeof(metadata->metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata->metadata_header.header_crc32c) {
+			plt_err("Invalid model, Header CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata->metadata_header.payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->metadata_header),
+					      size - sizeof(metadata->metadata_header), 0);
+
+		if (payload_crc32c != metadata->metadata_header.payload_crc32c) {
+			plt_err("Invalid model, Payload CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Model magic string */
+	if (strncmp((char *)metadata->metadata_header.magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid model, magic = %s", metadata->metadata_header.magic);
+		return -EINVAL;
+	}
+
+	/* Target architecture */
+	if (metadata->metadata_header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
+		plt_err("Model target architecture (%u) not supported",
+			metadata->metadata_header.target_architecture);
+		return -ENOTSUP;
+	}
+
+	/* Header version */
+	rte_memcpy(version, metadata->metadata_header.version, 4 * sizeof(uint8_t));
+	if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
+		plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0],
+			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION / 1000) % 10,
+			(MRVL_ML_MODEL_VERSION / 100) % 10, (MRVL_ML_MODEL_VERSION / 10) % 10,
+			MRVL_ML_MODEL_VERSION % 10);
+		return -ENOTSUP;
+	}
+
+	/* Init section */
+	if (metadata->init_model.file_size == 0) {
+		plt_err("Invalid metadata, init_model.file_size = %u",
+			metadata->init_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Main section */
+	if (metadata->main_model.file_size == 0) {
+		plt_err("Invalid metadata, main_model.file_size = %u",
+			metadata->main_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Finish section */
+	if (metadata->finish_model.file_size == 0) {
+		plt_err("Invalid metadata, finish_model.file_size = %u",
+			metadata->finish_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Weights and Bias */
+	if (metadata->weights_bias.file_size == 0) {
+		plt_err("Invalid metadata, weights_bias.file_size = %u",
+			metadata->weights_bias.file_size);
+		return -EINVAL;
+	}
+
+	if (metadata->weights_bias.relocatable != 1) {
+		plt_err("Model not supported, non-relocatable weights and bias");
+		return -ENOTSUP;
+	}
+
+	/* Inputs */
+	for (i = 0; i < metadata->model.num_input; i++) {
+		if (ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input[i].input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : input_type = %u", i,
+				metadata->input[i].input_type);
+			return -EINVAL;
+		}
+
+		if (ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->input[i].model_input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : model_input_type = %u", i,
+				metadata->input[i].model_input_type);
+			return -EINVAL;
+		}
+
+		if (metadata->input[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable input: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	/* Outputs */
+	for (i = 0; i < metadata->model.num_output; i++) {
+		if (ml_io_type_size_get(cn10k_ml_io_type_map(metadata->output[i].output_type)) <=
+		    0) {
+			plt_err("Invalid metadata, output[%u] : output_type = %u", i,
+				metadata->output[i].output_type);
+			return -EINVAL;
+		}
+
+		if (ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].model_output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : model_output_type = %u", i,
+				metadata->output[i].model_output_type);
+			return -EINVAL;
+		}
+
+		if (metadata->output[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable output: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	return 0;
+}
+
+void
+cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
+{
+	uint8_t i;
+
+	for (i = 0; i < metadata->model.num_input; i++) {
+		metadata->input[i].input_type = cn10k_ml_io_type_map(metadata->input[i].input_type);
+		metadata->input[i].model_input_type =
+			cn10k_ml_io_type_map(metadata->input[i].model_input_type);
+
+		if (metadata->input[i].shape.w == 0)
+			metadata->input[i].shape.w = 1;
+
+		if (metadata->input[i].shape.x == 0)
+			metadata->input[i].shape.x = 1;
+
+		if (metadata->input[i].shape.y == 0)
+			metadata->input[i].shape.y = 1;
+
+		if (metadata->input[i].shape.z == 0)
+			metadata->input[i].shape.z = 1;
+	}
+
+	for (i = 0; i < metadata->model.num_output; i++) {
+		metadata->output[i].output_type =
+			cn10k_ml_io_type_map(metadata->output[i].output_type);
+		metadata->output[i].model_output_type =
+			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 0a6a498342..3b9c04a5a3 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -22,6 +22,309 @@ enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_UNKNOWN,
 };
 
+/* Model Metadata : v 2.1.0.2 */
+#define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
+#define MRVL_ML_MODEL_TARGET_ARCH  128
+#define MRVL_ML_MODEL_VERSION	   2100
+#define MRVL_ML_MODEL_NAME_LEN	   64
+#define MRVL_ML_INPUT_NAME_LEN	   16
+#define MRVL_ML_OUTPUT_NAME_LEN	   16
+#define MRVL_ML_INPUT_OUTPUT_SIZE  8
+
+/* Model file metadata structure */
+struct cn10k_ml_model_metadata {
+	/* Header (256-byte) */
+	struct {
+		/* Magic string ('M', 'R', 'V', 'L') */
+		uint8_t magic[4];
+
+		/* Metadata version */
+		uint8_t version[4];
+
+		/* Metadata size */
+		uint32_t metadata_size;
+
+		/* Unique ID */
+		uint8_t uuid[128];
+
+		/* Model target architecture
+		 * 0 = Undefined
+		 * 1 = M1K
+		 * 128 = MLIP
+		 * 256 = Experimental
+		 */
+		uint32_t target_architecture;
+		uint8_t reserved[104];
+
+		/* CRC of data after metadata_header (i.e. after first 256 bytes) */
+		uint32_t payload_crc32c;
+
+		/* CRC of first 252 bytes of metadata_header, after payload_crc calculation */
+		uint32_t header_crc32c;
+	} metadata_header;
+
+	/* Model information (256-byte) */
+	struct {
+		/* Model name string */
+		uint8_t name[MRVL_ML_MODEL_NAME_LEN];
+
+		/* Model version info (xx.xx.xx.xx) */
+		uint8_t version[4];
+
+		/* Model code size (Init + Main + Finish) */
+		uint32_t code_size;
+
+		/* Model data size (Weights and Bias) */
+		uint32_t data_size;
+
+		/* OCM start offset, set to ocm_wb_range_start */
+		uint32_t ocm_start;
+
+		/* OCM end offset, set to max OCM size */
+		uint32_t ocm_end;
+
+		/* Relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t ocm_relocatable;
+
+		/* Tile relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t tile_relocatable;
+
+		/* Start tile (Always 0) */
+		uint8_t tile_start;
+
+		/* End tile (num_tiles - 1) */
+		uint8_t tile_end;
+
+		/* Inference batch size */
+		uint8_t batch_size;
+
+		/* Number of input tensors (Max 8) */
+		uint8_t num_input;
+
+		/* Number of output tensors (Max 8) */
+		uint8_t num_output;
+		uint8_t reserved1;
+
+		/* Total input size in bytes */
+		uint32_t input_size;
+
+		/* Total output size in bytes */
+		uint32_t output_size;
+
+		/* Table size in bytes */
+		uint32_t table_size;
+
+		/* Number of layers in the network */
+		uint32_t num_layers;
+		uint32_t reserved2;
+
+		/* Floor of absolute OCM region */
+		uint64_t ocm_tmp_range_floor;
+
+		/* Relative OCM start address of WB data block */
+		uint64_t ocm_wb_range_start;
+
+		/* Relative OCM end address of WB data block */
+		uint64_t ocm_wb_range_end;
+
+		/* Relative DDR start address of WB data block */
+		uint64_t ddr_wb_range_start;
+
+		/* Relative DDR end address of WB data block */
+		uint64_t ddr_wb_range_end;
+
+		/* Relative DDR start address of all inputs */
+		uint64_t ddr_input_range_start;
+
+		/* Relative DDR end address of all inputs */
+		uint64_t ddr_input_range_end;
+
+		/* Relative DDR start address of all outputs */
+		uint64_t ddr_output_range_start;
+
+		/* Relative DDR end address of all outputs */
+		uint64_t ddr_output_range_end;
+
+		/* Compiler version */
+		uint8_t compiler_version[8];
+
+		/* CDK version */
+		uint8_t cdk_version[4];
+
+		/* Lower batch optimization support
+		 * 0 - No,
+		 * 1 - Yes
+		 */
+		uint8_t supports_lower_batch_size_optimization;
+		uint8_t reserved[59];
+	} model;
+
+	/* Init section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} init_model;
+
+	/* Main section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} main_model;
+
+	/* Finish section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} finish_model;
+
+	uint8_t reserved1[512]; /* End of 2k bytes */
+
+	/* Weights and Bias (64-byte) */
+	struct {
+		/* Memory offset, set to ddr_wb_range_start */
+		uint64_t mem_offset;
+		uint32_t file_offset;
+		uint32_t file_size;
+
+		/* Relocatable flag for WB
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+		uint8_t reserved[47];
+	} weights_bias;
+
+	/* Input (512-byte, 64-byte per input) provisioned for 8 inputs */
+	struct {
+		/* DDR offset (in OCM absolute addresses for input) */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Input quantization
+		 * 1 = Requires quantization
+		 * 2 = Pre-quantized
+		 */
+		uint8_t quantize;
+
+		/* Type of incoming input
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t input_type;
+
+		/* Type of input required by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_input_type;
+
+		/* float_32 qscale value
+		 * quantized = non-quantized * qscale
+		 */
+		float qscale;
+
+		/* Input shape */
+		struct {
+			/* Input format
+			 * 1 = NCHW
+			 * 2 = NHWC
+			 */
+			uint8_t format;
+			uint8_t reserved[3];
+			uint32_t w;
+			uint32_t x;
+			uint32_t y;
+			uint32_t z;
+		} shape;
+		uint8_t reserved[4];
+
+		/* Name of input */
+		uint8_t input_name[MRVL_ML_INPUT_NAME_LEN];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output (512-byte, 64-byte per output) provisioned for 8 outputs */
+	struct {
+		/* DDR offset in OCM absolute addresses for output */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Output dequantization
+		 * 1 = De-quantization required
+		 * 2 = De-quantization not required
+		 */
+		uint8_t dequantize;
+
+		/* Type of outgoing output
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t output_type;
+
+		/* Type of output produced by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_output_type;
+
+		/* float_32 dscale value
+		 * dequantized = quantized * dscale
+		 */
+		float dscale;
+
+		/* Number of items in the output */
+		uint32_t size;
+		uint8_t reserved[20];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+		uint8_t output_name[MRVL_ML_OUTPUT_NAME_LEN];
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	uint8_t reserved2[1792];
+
+	/* Model data */
+	struct {
+		uint8_t reserved1[4068];
+
+		/* Beta: xx.xx.xx.xx,
+		 * Later: YYYYMM.xx.xx
+		 */
+		uint8_t compiler_version[8];
+
+		/* M1K CDK version (xx.xx.xx.xx) */
+		uint8_t m1k_cdk_version[4];
+	} data;
+
+	/* Hidden 16 bytes of magic code */
+	uint8_t reserved3[16];
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -33,6 +336,12 @@ struct cn10k_ml_model {
 	/* ID */
 	int16_t model_id;
 
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Metadata */
+	struct cn10k_ml_model_metadata metadata;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -40,4 +349,7 @@ struct cn10k_ml_model {
 	enum cn10k_ml_model_state state;
 };
 
+int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
+void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index d177d0e3e4..f7c1d43aee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -416,8 +416,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int ret;
 
-	PLT_SET_USED(params);
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
 	mldev = dev->data->dev_private;
 
@@ -450,6 +453,15 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->mldev = mldev;
 	model->model_id = idx;
 
+	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->metadata);
+
+	/* Metadata batch_size is 8-bit, a value of 0 denotes a batch size of 256 */
+	if (model->metadata.model.batch_size == 0)
+		model->batch_size = 256;
+	else
+		model->batch_size = model->metadata.model.batch_size;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5bc98386b8..84ff622784 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -19,7 +19,7 @@ sources = files(
         'cn10k_ml_model.c',
 )
 
-deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs']
+deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs', 'hash']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread
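
The section sizes annotated in struct cn10k_ml_model_metadata above can be
added up to see where the model payload begins. This is a sketch under the
assumption that the compiler inserts no padding, which appears to hold for the
field order shown under natural alignment but is not asserted by the patch;
the MD_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Section sizes in bytes, summed from the field declarations in the patch. */
enum {
	MD_HEADER = 256,
	MD_MODEL = 256,
	MD_SECTION = 64,  /* init, main and finish, each */
	MD_RESERVED1 = 512,
	MD_WB = 64,
	MD_IO_ENTRY = 64, /* one input or output descriptor */
	MD_IO_COUNT = 8,  /* MRVL_ML_INPUT_OUTPUT_SIZE */
	MD_RESERVED2 = 1792,
	MD_DATA = 4068 + 8 + 4,
	MD_RESERVED3 = 16,

	/* Offset of weights_bias within the metadata block */
	MD_WB_OFFSET = MD_HEADER + MD_MODEL + 3 * MD_SECTION + MD_RESERVED1,

	MD_TOTAL = MD_WB_OFFSET + MD_WB + 2 * MD_IO_COUNT * MD_IO_ENTRY + MD_RESERVED2 +
		   MD_DATA + MD_RESERVED3,
};

/* Compile-time check of the sum: the metadata block spans 8 KiB. */
static_assert(MD_TOTAL == 8192, "metadata sections should sum to 8192 bytes");

/* File offset at which the first model section would follow the metadata. */
static size_t
metadata_payload_offset(void)
{
	return MD_TOTAL;
}
```

Under the same assumption, the input array ends at byte 1792 and the output
array at byte 2304, so reserved2 pads the descriptor area out to 4096 bytes.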

* [PATCH v3 13/38] ml/cnxk: add internal structures for derived info
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (11 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 12/38] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 14/38] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
                       ` (25 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added internal structures to handle derived address fields
and enabled support to compute DMA addresses for model start.
Enabled updating internal model fields.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
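
cn10k_ml_model_addr_update in this patch packs the init, main, finish and
weights/bias sections back to back at the load address and places the run
region immediately after them. The sketch below reproduces only that offset
arithmetic. It uses byte offsets from the base address instead of the pointers
the patch stores, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Section sizes, as read from the metadata file_size fields. */
struct section_sizes {
	uint32_t init;
	uint32_t main_sz;
	uint32_t finish;
	uint32_t wb;
};

/* Byte offsets from the base DMA address (hypothetical representation). */
struct section_layout {
	uint64_t init_load;
	uint64_t main_load;
	uint64_t finish_load;
	uint64_t wb_load;
	uint64_t run_base;
	uint64_t init_run;
	uint64_t main_run;
	uint64_t finish_run;
};

static void
section_layout_compute(const struct section_sizes *sz, struct section_layout *out)
{
	assert(sz != NULL && out != NULL);

	/* Load copies are packed back to back, starting at the base. */
	out->init_load = 0;
	out->main_load = out->init_load + sz->init;
	out->finish_load = out->main_load + sz->main_sz;
	out->wb_load = out->finish_load + sz->finish;

	/* The run region starts right after all four load sections. */
	out->run_base = out->wb_load + sz->wb;
	out->init_run = out->run_base;
	out->main_run = out->init_run + sz->init;
	out->finish_run = out->main_run + sz->main_sz;
}
```

For section sizes of 16, 32, 8 and 100 bytes, the load offsets come out as
0, 16, 48 and 56, and the run region starts at offset 156.
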
 drivers/ml/cnxk/cn10k_ml_model.c | 88 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 80 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 18 ++++++-
 3 files changed, 185 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index efb7a10233..5591e5f572 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -199,3 +199,91 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
 	}
 }
+
+void
+cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+	size_t model_data_size;
+	uint8_t *dma_addr_load;
+	uint8_t *dma_addr_run;
+	uint8_t i;
+	int fpos;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+
+	/* Base address */
+	addr->base_dma_addr_load = base_dma_addr;
+	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
+
+	/* Init section */
+	dma_addr_load = addr->base_dma_addr_load;
+	dma_addr_run = addr->base_dma_addr_run;
+	fpos = sizeof(struct cn10k_ml_model_metadata);
+	addr->init_load_addr = dma_addr_load;
+	addr->init_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
+
+	/* Main section */
+	dma_addr_load += metadata->init_model.file_size;
+	dma_addr_run += metadata->init_model.file_size;
+	fpos += metadata->init_model.file_size;
+	addr->main_load_addr = dma_addr_load;
+	addr->main_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
+
+	/* Finish section */
+	dma_addr_load += metadata->main_model.file_size;
+	dma_addr_run += metadata->main_model.file_size;
+	fpos += metadata->main_model.file_size;
+	addr->finish_load_addr = dma_addr_load;
+	addr->finish_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
+
+	/* Weights and Bias section */
+	dma_addr_load += metadata->finish_model.file_size;
+	fpos += metadata->finish_model.file_size;
+	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
+	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
+	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+
+	/* Inputs */
+	addr->total_input_sz_d = 0;
+	addr->total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		addr->input[i].nb_elements =
+			model->metadata.input[i].shape.w * model->metadata.input[i].shape.x *
+			model->metadata.input[i].shape.y * model->metadata.input[i].shape.z;
+		addr->input[i].sz_d = addr->input[i].nb_elements *
+				      ml_io_type_size_get(metadata->input[i].input_type);
+		addr->input[i].sz_q = addr->input[i].nb_elements *
+				      ml_io_type_size_get(metadata->input[i].model_input_type);
+		addr->total_input_sz_d += addr->input[i].sz_d;
+		addr->total_input_sz_q += addr->input[i].sz_q;
+
+		plt_ml_dbg("model_id = %d, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+			   model->model_id, i, metadata->input[i].shape.w,
+			   metadata->input[i].shape.x, metadata->input[i].shape.y,
+			   metadata->input[i].shape.z, addr->input[i].sz_d, addr->input[i].sz_q);
+	}
+
+	/* Outputs */
+	addr->total_output_sz_q = 0;
+	addr->total_output_sz_d = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		addr->output[i].nb_elements = metadata->output[i].size;
+		addr->output[i].sz_d = addr->output[i].nb_elements *
+				       ml_io_type_size_get(metadata->output[i].output_type);
+		addr->output[i].sz_q = addr->output[i].nb_elements *
+				       ml_io_type_size_get(metadata->output[i].model_output_type);
+		addr->total_output_sz_q += addr->output[i].sz_q;
+		addr->total_output_sz_d += addr->output[i].sz_d;
+
+		plt_ml_dbg("model_id = %d, output[%u] - sz_d = %u, sz_q = %u", model->model_id, i,
+			   addr->output[i].sz_d, addr->output[i].sz_q);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3b9c04a5a3..1f329323a6 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -325,6 +325,81 @@ struct cn10k_ml_model_metadata {
 	uint8_t reserved3[16];
 };
 
+/* Model address structure */
+struct cn10k_ml_model_addr {
+	/* Base DMA address for load */
+	void *base_dma_addr_load;
+
+	/* Base DMA address for run */
+	void *base_dma_addr_run;
+
+	/* Init section load address */
+	void *init_load_addr;
+
+	/* Init section run address */
+	void *init_run_addr;
+
+	/* Main section load address */
+	void *main_load_addr;
+
+	/* Main section run address */
+	void *main_run_addr;
+
+	/* Finish section load address */
+	void *finish_load_addr;
+
+	/* Finish section run address */
+	void *finish_run_addr;
+
+	/* Weights and Bias base address */
+	void *wb_base_addr;
+
+	/* Weights and bias load address */
+	void *wb_load_addr;
+
+	/* Start tile */
+	uint8_t tile_start;
+
+	/* End tile */
+	uint8_t tile_end;
+
+	/* Input address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized input size */
+		uint32_t sz_d;
+
+		/* Quantized input size */
+		uint32_t sz_q;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized output size */
+		uint32_t sz_d;
+
+		/* Quantized output size */
+		uint32_t sz_q;
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -342,6 +417,9 @@ struct cn10k_ml_model {
 	/* Metadata */
 	struct cn10k_ml_model_metadata metadata;
 
+	/* Address structure */
+	struct cn10k_ml_model_addr addr;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -351,5 +429,7 @@ struct cn10k_ml_model {
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+				uint8_t *base_dma_addr);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f7c1d43aee..20f15ec35d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -408,11 +408,14 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
+	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_data_size;
+	uint8_t *base_dma_addr;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -439,7 +442,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Compute memzone size */
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+	metadata = (struct cn10k_ml_model_metadata *)params->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+		  2 * model_data_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -462,6 +470,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	else
 		model->batch_size = model->metadata.model.batch_size;
 
+	/* Set DMA base address */
+	base_dma_addr = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-- 
2.17.1



* [PATCH v3 14/38] ml/cnxk: add internal structures for tiles and OCM
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (12 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 13/38] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 15/38] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
                       ` (24 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added internal structures to hold tile and OCM information and the
OCM-to-model memory mapping. Initialize the fields to platform-specific
defaults and compute the OCM / tile requirements for a model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
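Note on the page accounting: the OCM requirement check in this patch
rounds the weights/bias size and the scratch size up to whole pages and
rejects the model if the sum exceeds the pages available on one tile.
The sketch below restates that arithmetic in isolation. The constants
are copied from cn10k_ml_ocm.h in this patch; the helper names
ocm_pages_for() and ocm_pages_fit() are hypothetical and exist only for
illustration, they are not part of the driver.

```c
#include <stdint.h>

/* Values copied from cn10k_ml_ocm.h in this patch. */
#define OCM_PAGESIZE 0x4000u
#define OCM_TILESIZE 0x100000u
#define OCM_NUMPAGES (OCM_TILESIZE / OCM_PAGESIZE)

/* Hypothetical helper: pages needed to hold 'size' bytes (ceiling division). */
static inline uint16_t
ocm_pages_for(uint64_t size)
{
	return (uint16_t)((size + OCM_PAGESIZE - 1) / OCM_PAGESIZE);
}

/* Hypothetical helper: compute both page counts and report whether they fit
 * on a single tile. Returns 0 when the model fits, -1 otherwise.
 */
static inline int
ocm_pages_fit(uint64_t wb_size, uint64_t scratch_size, uint16_t *wb_pages,
	      uint16_t *scratch_pages)
{
	*wb_pages = ocm_pages_for(wb_size);
	*scratch_pages = ocm_pages_for(scratch_size);

	return (*wb_pages + *scratch_pages > OCM_NUMPAGES) ? -1 : 0;
}
```

With a 16 KB page and a 1 MB tile there are 64 pages per tile, so a
model needing 32 WB pages and 32 scratch pages fits, while one extra
byte of weights pushes it to 33 + 32 pages and it is rejected.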
 drivers/ml/cnxk/cn10k_ml_dev.h   |  5 ++
 drivers/ml/cnxk/cn10k_ml_model.c | 53 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  6 +++
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  5 ++
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 79 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 29 ++++++++++++
 drivers/ml/cnxk/meson.build      |  2 +
 7 files changed, 179 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 7cf6268115..02a4496c97 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -7,6 +7,8 @@
 
 #include <roc_api.h>
 
+#include "cn10k_ml_ocm.h"
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -215,6 +217,9 @@ struct cn10k_ml_dev {
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
+	/* OCM info */
+	struct cn10k_ml_ocm ocm;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 5591e5f572..2281f59591 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -8,6 +8,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+#include "cn10k_ml_ocm.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -287,3 +288,55 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 			   addr->output[i].sz_d, addr->output[i].sz_q);
 	}
 }
+
+int
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
+			       uint16_t *wb_pages, uint16_t *scratch_pages)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_ocm *ocm;
+	uint64_t scratch_size;
+	uint64_t wb_size;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	ocm = &mldev->ocm;
+
+	/* Assume wb_size is zero for non-relocatable models */
+	if (metadata->model.ocm_relocatable)
+		wb_size = metadata->model.ocm_wb_range_end - metadata->model.ocm_wb_range_start + 1;
+	else
+		wb_size = 0;
+
+	if (wb_size % ocm->page_size)
+		*wb_pages = wb_size / ocm->page_size + 1;
+	else
+		*wb_pages = wb_size / ocm->page_size;
+	plt_ml_dbg("model_id = %d, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+		   *wb_pages);
+
+	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
+	if (metadata->model.ocm_tmp_range_floor % ocm->page_size)
+		*scratch_pages = scratch_size / ocm->page_size + 1;
+	else
+		*scratch_pages = scratch_size / ocm->page_size;
+	plt_ml_dbg("model_id = %d, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+		   scratch_size, *scratch_pages);
+
+	/* Check if the model can be loaded on OCM */
+	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+		plt_err("Cannot create the model, OCM relocatable = %u",
+			metadata->model.ocm_relocatable);
+		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
+			ML_CN10K_OCM_NUMPAGES);
+		return -ENOMEM;
+	}
+
+	/* Update scratch_pages to block the full tile for OCM non-relocatable model. This would
+	 * prevent the library from allocating the remaining space on the tile to other models.
+	 */
+	if (!metadata->model.ocm_relocatable)
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 1f329323a6..80653f18a9 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -10,6 +10,7 @@
 #include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ocm.h"
 
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
@@ -420,6 +421,9 @@ struct cn10k_ml_model {
 	/* Address structure */
 	struct cn10k_ml_model_addr addr;
 
+	/* Tile and memory information object */
+	struct cn10k_ml_ocm_model_map model_mem_map;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -431,5 +435,7 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
+				   uint16_t *wb_pages, uint16_t *scratch_pages);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
new file mode 100644
index 0000000000..b1c62f2963
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_ocm.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
new file mode 100644
index 0000000000..44390396f9
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OCM_H_
+#define _CN10K_ML_OCM_H_
+
+#include <rte_mldev.h>
+
+/* Page size in bytes. */
+#define ML_CN10K_OCM_PAGESIZE 0x4000
+
+/* Number of OCM tiles. */
+#define ML_CN10K_OCM_NUMTILES 0x8
+
+/* OCM in bytes, per tile. */
+#define ML_CN10K_OCM_TILESIZE 0x100000
+
+/* OCM pages, per tile. */
+#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
+
+/* Maximum OCM mask words, per tile, 8 bit words. */
+#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
+
+/* OCM and Tile information structure */
+struct cn10k_ml_ocm_tile_info {
+	/* Mask of used / allotted pages on tile's OCM */
+	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+
+	/* Last page in the tile's OCM used for weights and bias, default = -1 */
+	int last_wb_page;
+
+	/* Number of pages used for scratch memory on the tile's OCM */
+	uint16_t scratch_pages;
+};
+
+/* Model OCM map structure */
+struct cn10k_ml_ocm_model_map {
+	/* Status of OCM reservation */
+	bool ocm_reserved;
+
+	/* Mask of OCM tiles for the model */
+	uint64_t tilemask;
+
+	/* Start page for the model load, default = -1 */
+	int wb_page_start;
+
+	/* Number of pages required for weights and bias */
+	uint16_t wb_pages;
+
+	/* Number of pages required for scratch memory */
+	uint16_t scratch_pages;
+};
+
+/* OCM state structure */
+struct cn10k_ml_ocm {
+	/* OCM spinlock, used to update OCM state */
+	rte_spinlock_t lock;
+
+	/* Number of OCM tiles */
+	uint8_t num_tiles;
+
+	/* OCM size per tile */
+	uint64_t size_per_tile;
+
+	/* Size of OCM page */
+	uint64_t page_size;
+
+	/* Number of OCM pages */
+	uint16_t num_pages;
+
+	/* Words per OCM mask */
+	uint16_t mask_words;
+
+	/* OCM memory info and status */
+	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+};
+
+#endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 20f15ec35d..9ccf52332f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -126,8 +126,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	uint16_t tile_id;
 	int16_t model_id;
 	uint16_t qp_id;
 	int ret;
@@ -250,6 +252,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
+	ocm = &mldev->ocm;
+	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
+	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
+	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
+	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+
+	rte_spinlock_init(&ocm->lock);
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -416,6 +430,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	const struct plt_memzone *mz;
 	size_t model_data_size;
 	uint8_t *base_dma_addr;
+	uint16_t scratch_pages;
+	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -441,6 +457,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 		return -ENOMEM;
 	}
 
+	/* Get WB and scratch pages, check if model can be loaded. */
+	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	if (ret < 0)
+		return ret;
+
 	/* Compute memzone size */
 	metadata = (struct cn10k_ml_model_metadata *)params->addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
@@ -478,6 +499,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Copy data from load to run. run address to be used by MLIP */
 	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
 
+	/* Initialize model_mem_map */
+	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
+	model->model_mem_map.ocm_reserved = false;
+	model->model_mem_map.tilemask = 0;
+	model->model_mem_map.wb_page_start = -1;
+	model->model_mem_map.wb_pages = wb_pages;
+	model->model_mem_map.scratch_pages = scratch_pages;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 84ff622784..2f220a26b5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -11,12 +11,14 @@ driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
+        'cn10k_ml_ocm.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
+        'cn10k_ml_ocm.c',
 )
 
 deps += ['mldev', 'common_ml', 'common_cnxk', 'kvargs', 'hash']
-- 
2.17.1



* [PATCH v3 15/38] ml/cnxk: add structures for slow and fast path JDs
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (13 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 14/38] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 16/38] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
                       ` (23 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added JD structures for load, unload and run jobs. Initialize
job command and allocate memory for request structures for slow
path jobs.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
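Note on the memzone layout: after this patch the per-model memzone
holds the model object, two copies of the model data (load and run) and
the slow-path request structure, each starting on an aligned offset.
The sketch below computes those offsets. It is an illustration only:
ALIGN_SIZE is an assumed value standing in for ML_CN10K_ALIGN_SIZE,
struct model_mz_layout and model_mz_layout_compute() are hypothetical
names, and placing the run copy directly after the load copy is an
assumption inferred from the 2 * model_data_size reservation.

```c
#include <stddef.h>

/* Assumed alignment for illustration; the driver uses ML_CN10K_ALIGN_SIZE. */
#define ALIGN_SIZE 128u

/* Round 'v' up to the next multiple of 'a'. */
static inline size_t
align_ceil(size_t v, size_t a)
{
	return (v + a - 1) / a * a;
}

/* Hypothetical description of the per-model memzone. */
struct model_mz_layout {
	size_t dma_load_off; /* start of the load copy of the model data */
	size_t dma_run_off;  /* start of the run copy (assumed to follow load) */
	size_t req_off;	     /* start of the slow-path request structure */
	size_t total;	     /* total memzone size */
};

static inline struct model_mz_layout
model_mz_layout_compute(size_t model_obj_sz, size_t model_data_sz, size_t req_sz)
{
	struct model_mz_layout l;
	size_t data_sz = align_ceil(model_data_sz, ALIGN_SIZE);

	l.dma_load_off = align_ceil(model_obj_sz, ALIGN_SIZE);
	l.dma_run_off = l.dma_load_off + data_sz;
	l.req_off = l.dma_load_off + 2 * data_sz;
	l.total = l.req_off + align_ceil(req_sz, ALIGN_SIZE);

	return l;
}
```

The req_off expression matches the PLT_PTR_ADD() used to set model->req
in cn10k_ml_model_load(), and total matches the updated mz_size.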
 drivers/ml/cnxk/cn10k_ml_dev.h   | 99 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  4 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 19 +++++-
 drivers/ml/cnxk/cn10k_ml_ops.h   |  4 ++
 4 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 02a4496c97..68fcc957fa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -188,6 +188,105 @@ struct cn10k_ml_jd {
 
 			uint8_t rsvd[8];
 		} fw_load;
+
+		struct cn10k_ml_jd_section_model_start {
+			/* Source model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_src_ddr_addr;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range end */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
 	};
 };
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 80653f18a9..4b5a804c8b 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+#include "cn10k_ml_ops.h"
 
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
@@ -429,6 +430,9 @@ struct cn10k_ml_model {
 
 	/* State */
 	enum cn10k_ml_model_state state;
+
+	/* Slow-path operations request pointer */
+	struct cn10k_ml_req *req;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9ccf52332f..8603cba20e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -12,6 +12,10 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML Job descriptor flags */
+#define ML_FLAGS_POLL_COMPL BIT(0)
+#define ML_FLAGS_SSO_COMPL  BIT(1)
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -65,6 +69,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	struct cn10k_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
+	uint64_t i;
 
 	/* Allocate queue pair */
 	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
@@ -95,6 +100,12 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 
+	/* Initialize job command */
+	for (i = 0; i < qp->nb_desc; i++) {
+		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+	}
+
 	return qp;
 
 qp_free:
@@ -468,7 +479,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size;
+		  2 * model_data_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -507,6 +519,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set slow-path request address and state */
+	model->req = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+				  2 * model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 8a939cabc7..981aa52655 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OPS_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include <roc_api.h>
 
@@ -21,6 +22,9 @@ struct cn10k_ml_req {
 
 	/* Status field for poll mode requests */
 	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
-- 
2.17.1



* [PATCH v3 16/38] ml/cnxk: find OCM mask and page slots for a model
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (14 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 15/38] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 17/38] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
                       ` (22 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to compute the OCM tilemask and page start for a
model. The computed tilemask and page start are used during
model start to copy model weights and bias to OCM. The OCM slot
for a model is allocated from the tiles with the largest amount
of free memory.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
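Note on the slot search: slot_index_lowest() looks for the lowest run
of slot_sz free pages in a multi-word byte mask, where bit 0 of word 0
is page 0. The patch does this by sliding a search mask across the page
mask. The sketch below is a simplified bit-by-bit variant, not the
code in this patch, that is expected to return the same indices; it is
meant only to make the intended semantics easy to check. The names
page_used() and free_slot_lowest() are hypothetical.

```c
#include <stdint.h>

/* Return 1 if page 'pos' is marked used (bit 0 of word 0 is page 0). */
static inline int
page_used(const uint8_t *mask, int pos)
{
	return (mask[pos / 8] >> (pos % 8)) & 1;
}

/* Hypothetical simplified search: lowest index >= start_pos at which slot_sz
 * contiguous free pages begin, or -1 if no such slot exists. A slot_sz of zero
 * returns the total number of pages, i.e. one past the last valid index.
 */
static int
free_slot_lowest(const uint8_t *mask, int nwords, int slot_sz, int start_pos)
{
	int nbits = nwords * 8;
	int run = 0;
	int pos;

	if (slot_sz == 0)
		return nbits;

	for (pos = start_pos; pos < nbits; pos++) {
		/* Extend the current run of free pages, or reset it on a used page. */
		run = page_used(mask, pos) ? 0 : run + 1;
		if (run == slot_sz)
			return pos - slot_sz + 1;
	}

	return -1;
}
```

For the example mask used in the code comments, word 0 = 0x49
(01001001) and word 1 = 0xB8 (10111000), a search from position 0 for 4
pages yields index 7 and a search for 2 pages yields index 1.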
 drivers/ml/cnxk/cn10k_ml_ocm.c | 330 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   5 +
 2 files changed, 335 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index b1c62f2963..df2fa4c514 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -2,4 +2,334 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+
+#include "roc_api.h"
+
+/* OCM macros */
+#define BYTE_LEN	  8
+#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
+#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+
+/* Left shift multi-word mask by 1 bit.
+ *
+ * For example, given a mask of two uint8_t words
+ * Input:  [00110101] [00110111]
+ * Output: [01101010] [01101110]
+ */
+static void
+lshift_mask(uint8_t *mask, int nwords)
+{
+	int i;
+	int word_sz;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	for (i = nwords - 1; i >= 0; i--) {
+		mask[i] = mask[i] << 1;
+		if (i != 0)
+			mask[i] = mask[i] | (mask[i - 1] >> (word_sz - 1));
+	}
+}
+
+/* Get the index of the first unused slot in a multi-word mask (base_mask). Only slots starting at
+ * or after start_pos are considered. An unused slot is a sequence of slot_sz contiguous unset bits
+ * in the multi-word mask.
+ *
+ * The function creates a search_mask with slot_sz bits set and slides it left from start_pos to
+ * the end of the mask to find the first available slot. For example, given the multi-word mask:
+ *
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When start = 0,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 3 is 7.
+ * Index of the first unused slot of size 2 is 1.
+ * Index of the first unused slot of size 1 is 1.
+ *
+ * When start = 2,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 2 is 4.
+ * Index of the first unused slot of size 1 is 2.
+ *
+ * When unable to find a valid slot, return -1.
+ * When slot_sz is zero, return word_sz * nwords (one past the last valid index).
+ */
+static int
+slot_index_lowest(uint8_t *base_mask, int nwords, int slot_sz, int start_pos)
+{
+	uint8_t *search_mask;
+	int word_sz;
+	int end_pos;
+	int min_idx;
+	int max_idx;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	min_idx = 0;
+	max_idx = word_sz * nwords;
+	idx = min_idx - 1;
+
+	if (slot_sz == 0)
+		return max_idx;
+
+	/* Create a mask with slot_sz bits set */
+	search_mask = plt_zmalloc(nwords * sizeof(uint8_t), 0);
+	if (search_mask == NULL)
+		goto error;
+
+	for (i = 0; i < nwords; i++) {
+		if (i < slot_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > slot_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (slot_sz % word_sz)) - 1;
+	}
+
+	/* Shift search mask by start_pos bits */
+	for (i = 0; i < start_pos; i++)
+		lshift_mask(search_mask, nwords);
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - slot_sz + 1;
+	for (j = start_pos; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+
+		lshift_mask(search_mask, nwords);
+	}
+
+found:
+	plt_free(search_mask);
+
+error:
+	return idx;
+}
+
+/* Find the largest possible unused slot, with a minimum size of search_sz in a multi-word mask. The
+ * function returns the start index of the slot and the size of the identified slot (slot_sz).
+ *
+ * For example, in multi-word mask
+ *
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When search_sz > 4, return value = -1, slot_sz = 0
+ * When search_sz <=4, return value = 7, slot_sz = 4
+ */
+static int
+slot_index_largest(uint8_t *base_mask, int nwords, int search_sz, int *slot_sz)
+{
+	uint8_t *search_mask;
+	int mask_sz;
+	int word_sz;
+	int end_pos;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	mask_sz = nwords * word_sz;
+	idx = -1;
+
+	/* Create a mask with mask_sz bits set */
+	search_mask = plt_zmalloc(mask_sz, 0);
+	if (search_mask == NULL)
+		goto error;
+
+start:
+	for (i = 0; i < nwords; i++) {
+		if (i < mask_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > mask_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (mask_sz % word_sz)) - 1;
+	}
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - mask_sz + 1;
+	for (j = 0; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+		lshift_mask(search_mask, nwords);
+	}
+
+	mask_sz--;
+	if (mask_sz >= search_sz)
+		goto start;
+	else
+		mask_sz = 0;
+
+found:
+	plt_free(search_mask);
+	if (search_sz == 0)
+		idx = word_sz * nwords;
+
+error:
+	if (slot_sz)
+		*slot_sz = mask_sz;
+
+	return idx;
+}
+
+/* Count number of bits in a tilemask. Assumes that all set bits are contiguous. */
+int
+cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
+{
+	uint8_t count;
+
+	PLT_ASSERT(tilemask != 0);
+
+	*start = __builtin_ctzl(tilemask);
+	*end = 64 - __builtin_clzl(tilemask) - 1;
+	count = *end - *start + 1;
+
+	PLT_ASSERT(count == __builtin_popcountl(tilemask));
+	return count;
+}
+
+/* Find the tiles and wb_page_start to load the model on given 'num_tiles' tiles with the specified
+ * scratch & WB pages and OCM allocation mode.
+ */
+int
+cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			   uint16_t scratch_pages, uint64_t *tilemask)
+{
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
+	uint16_t used_scratch_pages_max;
+	uint16_t scratch_page_start;
+	int used_last_wb_page_max;
+	uint16_t scratch_page_end;
+	uint8_t search_start_tile;
+	uint8_t search_end_tile;
+	int wb_page_start_curr;
+	int max_slot_sz_curr;
+	uint8_t tile_start;
+	int ocm_alloc_mode;
+	int wb_page_start;
+	uint16_t tile_id;
+	uint16_t word_id;
+	uint8_t tile_idx;
+	int max_slot_sz;
+	int start_tile;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
+		plt_err("Invalid num_tiles = %u (> ML_CN10K_OCM_NUMTILES)", num_tiles);
+		return -1;
+	}
+
+	memset(tilemask, 0, sizeof(uint64_t));
+	wb_page_start = -1;
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	start_tile = -1;
+	max_slot_sz_curr = 0;
+	max_slot_sz = 0;
+	tile_idx = 0;
+	ocm_alloc_mode = 2;
+
+	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
+		plt_err("Invalid start_tile, %d", start_tile);
+		return -1;
+	}
+
+	if (start_tile < 0) {
+		search_start_tile = 0;
+		search_end_tile = ocm->num_tiles - num_tiles;
+	} else {
+		search_start_tile = start_tile;
+		search_end_tile = start_tile;
+	}
+
+	tile_start = search_start_tile;
+start_search:
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		used_scratch_pages_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, used_scratch_pages_max);
+		used_last_wb_page_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
+	}
+
+	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
+	}
+
+	if (used_scratch_pages_max < scratch_pages) { /* Check for extra scratch pages */
+		if (ocm->num_pages - used_last_wb_page_max - 1 >=
+		    scratch_pages) { /* Pages available */
+			scratch_page_start = ocm->num_pages - scratch_pages;
+			scratch_page_end = ocm->num_pages - 1;
+			for (page_id = scratch_page_start; page_id <= scratch_page_end;
+			     page_id++) { /* Mark the extra scratch pages as used */
+				local_ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					SET_BIT(local_ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						page_id % OCM_MAP_WORD_SIZE);
+			}
+		} else { /* Pages not available, check for next set of tiles */
+			goto next_search;
+		}
+	}
+
+	if (ocm_alloc_mode == 1) {
+		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
+		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
+			tile_idx = tile_start;
+			goto found;
+		}
+	} else if (ocm_alloc_mode == 2) {
+		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
+							&max_slot_sz_curr);
+		if (max_slot_sz_curr > max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			max_slot_sz = max_slot_sz_curr;
+			tile_idx = tile_start;
+		} else if (max_slot_sz_curr == max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			if (wb_page_start == ocm->num_pages) {
+				tile_idx = tile_start;
+				goto found;
+			}
+		}
+	}
+
+next_search:
+	tile_start = tile_start + num_tiles;
+	if (tile_start <= search_end_tile)
+		goto start_search;
+
+found:
+	if (wb_page_start != -1)
+		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
+
+	return wb_page_start;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 44390396f9..2e26271a7a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OCM_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 /* Page size in bytes. */
 #define ML_CN10K_OCM_PAGESIZE 0x4000
@@ -76,4 +77,8 @@ struct cn10k_ml_ocm {
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
 };
 
+int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
+int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			       uint16_t scratch_pages, uint64_t *tilemask);
+
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1
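
Note on the tile-count helper and the GENMASK_ULL() call in the patch above: both rely on the tilemask being one contiguous run of set bits. The following is a minimal sketch of that relationship, not the driver code. The names tile_genmask() and tile_count() are hypothetical, the use of __builtin_ctzll() for the start index is an assumption (only the end-index computation is visible in this hunk), and the GCC/Clang builtins are assumed to be available.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with bits lo..hi set (inclusive).
 * Hypothetical stand-in for GENMASK_ULL(hi, lo); requires lo <= hi <= 63.
 */
static uint64_t
tile_genmask(unsigned int hi, unsigned int lo)
{
	return (~0ULL >> (63 - hi)) & (~0ULL << lo);
}

/* Report the first and last set bit of a contiguous mask and return the
 * number of tiles it covers. Returns 0 for an empty mask, because the
 * clz/ctz builtins are undefined for a zero argument.
 */
static int
tile_count(uint64_t tilemask, int *start, int *end)
{
	if (tilemask == 0)
		return 0;

	*start = __builtin_ctzll(tilemask);    /* index of lowest set bit */
	*end = 63 - __builtin_clzll(tilemask); /* index of highest set bit */

	return *end - *start + 1;
}
```

For a contiguous mask the returned count equals the population count, which is what the PLT_ASSERT in the patch checks.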


* [PATCH v3 17/38] ml/cnxk: add support to reserve and free OCM pages
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (15 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 16/38] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 18/38] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
                       ` (21 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to reserve and free OCM pages for a model. OCM
pages are reserved when a model is started and are released
when it is stopped.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
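
The page bookkeeping in this patch is a bitmap with one bit per OCM page, stored in uint8_t words. The following is a minimal sketch of the reserve/free pattern under that assumption. The helper names (pages_reserve, pages_free, page_in_use) and PAGES_PER_WORD are hypothetical and do not appear in the driver.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES_PER_WORD 8 /* one uint8_t word tracks eight pages */

/* Mark pages [first, last] as used. */
static void
pages_reserve(uint8_t *mask, int first, int last)
{
	int page;

	for (page = first; page <= last; page++)
		mask[page / PAGES_PER_WORD] |= (uint8_t)(1u << (page % PAGES_PER_WORD));
}

/* Mark pages [first, last] as free again. */
static void
pages_free(uint8_t *mask, int first, int last)
{
	int page;

	for (page = first; page <= last; page++)
		mask[page / PAGES_PER_WORD] &= (uint8_t)~(1u << (page % PAGES_PER_WORD));
}

/* Return 1 if the page is marked used, 0 otherwise. */
static int
page_in_use(const uint8_t *mask, int page)
{
	return (mask[page / PAGES_PER_WORD] >> (page % PAGES_PER_WORD)) & 1;
}
```

In the driver the same update is repeated for every tile in the model's tilemask, once for the WB range and once for the scratch range at the top of the page space.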
 drivers/ml/cnxk/cn10k_ml_ocm.c | 131 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ocm.h |   3 +
 2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index df2fa4c514..034d9546eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -5,14 +5,17 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "roc_api.h"
 
 /* OCM macros */
-#define BYTE_LEN	  8
-#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
-#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+#define BYTE_LEN	   8
+#define OCM_MAP_WORD_SIZE  (sizeof(uint8_t) * BYTE_LEN)
+#define IS_BIT_SET(num, n) ((num) & (1 << (n)))
+#define SET_BIT(num, n)	   ((num) | (1 << (n)))
+#define CLEAR_BIT(num, n)  ((num) & ~(1 << (n)))
 
 /* Left shift multi-word mask by 1 bit.
  *
@@ -333,3 +336,125 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 
 	return wb_page_start;
 }
+
+void
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
+			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_page_start;
+	int scratch_page_end;
+	int wb_page_end;
+	int tile_start;
+	int tile_end;
+	int tile_id;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Get first set bit, tile_start */
+	tile_start = 0;
+	tile_end = 0;
+	cn10k_ml_ocm_tilecount(tilemask, &tile_start, &tile_end);
+	wb_page_end = wb_page_start + wb_pages - 1;
+	scratch_page_start = ocm->num_pages - scratch_pages;
+	scratch_page_end = ocm->num_pages - 1;
+
+	/* Update tile_ocm_info */
+	for (tile_id = tile_start; tile_id <= tile_end; tile_id++) {
+		/* Scratch pages */
+		for (page_id = scratch_page_start; page_id <= scratch_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		ocm->tile_ocm_info[tile_id].scratch_pages =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, scratch_pages);
+
+		/* WB pages */
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		if (wb_pages != 0)
+			ocm->tile_ocm_info[tile_id].last_wb_page =
+				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
+	}
+
+	model->addr.tile_start = tile_start;
+	model->addr.tile_end = tile_end;
+
+	plt_ml_dbg("model_id = %d, tilemask = 0x%016lx", model_id, tilemask);
+	plt_ml_dbg("model_id = %d, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
+		   wb_page_end);
+	plt_ml_dbg("model_id = %d, scratch_page_start = %d, scratch_page_end = %d", model_id,
+		   scratch_page_start, scratch_page_end);
+}
+
+void
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_resize_pages;
+	int wb_page_start;
+	int wb_page_end;
+	int prev_start;
+	int curr_start;
+	int tile_id;
+	int page_id;
+	int16_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Update OCM info for WB memory */
+	wb_page_start = model->model_mem_map.wb_page_start;
+	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
+	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+				CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+						  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+					  page_id % OCM_MAP_WORD_SIZE);
+		}
+
+		/* Update last_wb_page size */
+		if (wb_page_end == ocm->tile_ocm_info[tile_id].last_wb_page)
+			ocm->tile_ocm_info[tile_id].last_wb_page = wb_page_start - 1;
+
+		/* Update scratch page size and clear extra bits */
+		scratch_resize_pages = 0;
+		/* Get max scratch pages required, excluding the current model */
+		for (i = 0; i < dev->data->nb_models; i++) {
+			struct cn10k_ml_model *other = dev->data->models[i];
+
+			if ((i != model_id) && (other != NULL)) {
+				if (IS_BIT_SET(other->model_mem_map.tilemask, tile_id))
+					scratch_resize_pages =
+						PLT_MAX((int)other->model_mem_map.scratch_pages,
+							scratch_resize_pages);
+			}
+		}
+
+		/* Clear extra scratch pages */
+		if (scratch_resize_pages < ocm->tile_ocm_info[tile_id].scratch_pages) {
+			prev_start = ocm->num_pages - ocm->tile_ocm_info[tile_id].scratch_pages;
+			curr_start = ocm->num_pages - scratch_resize_pages;
+			for (page_id = prev_start; page_id < curr_start; page_id++) {
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+							  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						  page_id % OCM_MAP_WORD_SIZE);
+			}
+			ocm->tile_ocm_info[tile_id].scratch_pages = scratch_resize_pages;
+		}
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 2e26271a7a..cd65d1d8fa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -80,5 +80,8 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
+				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1


* [PATCH v3 18/38] ml/cnxk: enable support to start an ML model
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (16 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 17/38] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 19/38] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
                       ` (20 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented the model start driver function. The tilemask and
OCM slot are calculated and the OCM pages are reserved before
the start job is enqueued through the scratch registers. The
job is then polled for completion in synchronous mode, and the
reserved pages are freed if the start fails.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
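
The model start job descriptor in this patch lays out the init, main, finish and weights-bias sections back to back, so each offset is the running sum of the preceding section sizes. The following is a minimal sketch of that arithmetic. struct section_layout and section_layout_compute() are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets of the model sections, relative to the start of the model data. */
struct section_layout {
	uint64_t init_offset;
	uint64_t main_offset;
	uint64_t finish_offset;
	uint64_t wb_offset;
};

/* Each section starts where the previous one ends. */
static struct section_layout
section_layout_compute(uint64_t init_size, uint64_t main_size, uint64_t finish_size)
{
	struct section_layout layout;

	layout.init_offset = 0;
	layout.main_offset = init_size;
	layout.finish_offset = init_size + main_size;
	layout.wb_offset = init_size + main_size + finish_size;

	return layout;
}
```

With section sizes of 0x100, 0x400 and 0x80 bytes, the main section starts at 0x100, the finish section at 0x500 and the weights-bias data at 0x580.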
 drivers/ml/cnxk/cn10k_ml_dev.h |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |   4 +
 3 files changed, 214 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 68fcc957fa..8f6bc24370 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -33,6 +33,9 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* ML slow-path job flags */
+#define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
+
 /* Poll mode job state */
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8603cba20e..78624e75c2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -114,6 +114,64 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = model->model_id;
+	req->jd.hdr.job_type = job_type;
+	req->jd.hdr.fp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+
+	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
+		if (!model->metadata.model.ocm_relocatable)
+			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+		else
+			req->jd.hdr.sp_flags = 0x0;
+		req->jd.model_start.model_src_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_load_addr));
+		req->jd.model_start.model_dst_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+		req->jd.model_start.model_init_offset = 0x0;
+		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->jd.model_start.model_finish_offset =
+			metadata->init_model.file_size + metadata->main_model.file_size;
+		req->jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
+						      metadata->main_model.file_size +
+						      metadata->finish_model.file_size;
+		req->jd.model_start.num_layers = metadata->model.num_layers;
+		req->jd.model_start.num_gather_entries = 0;
+		req->jd.model_start.num_scatter_entries = 0;
+		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->jd.model_start.batch_size = model->batch_size;
+		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
+		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
+		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
+			&mldev->roc,
+			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
+		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
+		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
+		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
+		req->jd.model_start.output.s.ddr_range_start =
+			metadata->model.ddr_output_range_start;
+		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -561,6 +619,154 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+int
+cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	uint8_t num_tiles;
+	uint64_t tilemask;
+	int wb_page_start;
+	int tile_start;
+	int tile_end;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				plt_ml_dbg("Model already started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (!model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			wb_page_start = cn10k_ml_ocm_tilemask_find(
+				dev, num_tiles, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages, &tilemask);
+
+			if (wb_page_start == -1) {
+				plt_err("Free pages not available on OCM tiles");
+				plt_err("Failed to start model = 0x%016lx, name = %s",
+					PLT_U64_CAST(model), model->metadata.model.name);
+
+				plt_spinlock_unlock(&ocm->lock);
+				return -ENOMEM;
+			}
+
+			model->model_mem_map.tilemask = tilemask;
+			model->model_mem_map.wb_page_start = wb_page_start;
+
+			cn10k_ml_ocm_reserve_pages(
+				dev, model->model_id, model->model_mem_map.tilemask,
+				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages);
+			model->model_mem_map.ocm_reserved = true;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	/* Update JD */
+	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->jd.model_start.ocm_wb_base_address =
+		model->model_mem_map.wb_page_start * ocm->page_size;
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else { /* Reset scratch registers */
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (ret == 0)
+				model->state = ML_CN10K_MODEL_STATE_STARTED;
+			else
+				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
+		while (model->model_mem_map.ocm_reserved) {
+			if (plt_spinlock_trylock(&ocm->lock) != 0) {
+				cn10k_ml_ocm_free_pages(dev, model->model_id);
+				model->model_mem_map.ocm_reserved = false;
+				model->model_mem_map.tilemask = 0x0;
+				plt_spinlock_unlock(&ocm->lock);
+			}
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -576,4 +782,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 981aa52655..af2ea19dce 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -25,6 +25,9 @@ struct cn10k_ml_req {
 
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
+
+	/* Timeout cycle */
+	uint64_t timeout;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
@@ -61,5 +64,6 @@ extern struct rte_ml_dev_ops cn10k_ml_ops;
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			int16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
+int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1


* [PATCH v3 19/38] ml/cnxk: enable support to stop an ML models
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (17 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 18/38] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 20/38] ml/cnxk: enable support to get model information Srikanth Yalavarthi
                       ` (19 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented the model stop driver function. OCM pages are
released first, then a model stop job is enqueued through the
scratch registers and polled for completion in synchronous
mode.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
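
Both start and stop guard the slow-path job with a state check taken under the model lock: a no-op request returns 1, a request made while another job is in flight returns -EBUSY, and otherwise the model moves to the job-active state. The following is a minimal sketch of those entry checks with the locking left out. The enum values and the start_begin()/stop_begin() helpers are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <errno.h>

enum model_state {
	MODEL_LOADED,
	MODEL_JOB_ACTIVE,
	MODEL_STARTED,
	MODEL_UNKNOWN,
};

/* Entry check for start; the caller is assumed to hold the model lock. */
static int
start_begin(enum model_state *state)
{
	if (*state == MODEL_STARTED)
		return 1; /* already started, nothing to do */
	if (*state == MODEL_JOB_ACTIVE)
		return -EBUSY; /* another slow-path job is in flight */

	*state = MODEL_JOB_ACTIVE;
	return 0;
}

/* Entry check for stop; the caller is assumed to hold the model lock. */
static int
stop_begin(enum model_state *state)
{
	if (*state == MODEL_LOADED)
		return 1; /* not started, nothing to do */
	if (*state == MODEL_JOB_ACTIVE)
		return -EBUSY; /* another slow-path job is in flight */

	*state = MODEL_JOB_ACTIVE;
	return 0;
}
```

After the job finishes, start sets the state to started on success or unknown on failure, while stop in this patch always returns the model to the loaded state.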
 drivers/ml/cnxk/cn10k_ml_ops.c | 115 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |   1 +
 2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 78624e75c2..e902ef6420 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -295,10 +295,14 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		/* Re-configure */
 		void **models;
 
-		/* Unload all models */
+		/* Stop and unload all models */
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %d", model_id);
+				}
 				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %d", model_id);
@@ -362,10 +366,14 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
-	/* Unload all models */
+	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %d", model_id);
+			}
 			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %d", model_id);
@@ -767,6 +775,108 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				plt_ml_dbg("Model not started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			cn10k_ml_ocm_free_pages(dev, model->model_id);
+			model->model_mem_map.ocm_reserved = false;
+			model->model_mem_map.tilemask = 0x0;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0x0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else {
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -783,4 +893,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index af2ea19dce..3143c9054c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -65,5 +65,6 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 			int16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
+int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1


* [PATCH v3 20/38] ml/cnxk: enable support to get model information
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (18 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 19/38] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 21/38] ml/cnxk: enable support to update model params Srikanth Yalavarthi
                       ` (18 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added a driver function to get model information and an
internal function to populate it when a model is loaded.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
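
The model info buffer added in this patch is a single allocation holding the info structure followed by the input descriptors and then the output descriptors. The following is a minimal sketch of how the size and the two array pointers follow from that layout. struct model_info and struct io_info are simplified hypothetical stand-ins for rte_ml_model_info and rte_ml_io_info, and the buffer is assumed to be suitably aligned for struct model_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the mldev structures. */
struct io_info {
	char name[32];
	uint32_t dtype;
};

struct model_info {
	char name[64];
	uint32_t nb_inputs;
	uint32_t nb_outputs;
	struct io_info *input_info;
	struct io_info *output_info;
};

/* Bytes needed for the header plus both descriptor arrays. */
static size_t
info_buffer_size(uint32_t nb_inputs, uint32_t nb_outputs)
{
	return sizeof(struct model_info) + (size_t)nb_inputs * sizeof(struct io_info) +
	       (size_t)nb_outputs * sizeof(struct io_info);
}

/* Place the header at the start of the buffer and point the descriptor
 * arrays at the regions that follow it, inputs first.
 */
static struct model_info *
info_buffer_init(uint8_t *buf, uint32_t nb_inputs, uint32_t nb_outputs)
{
	struct model_info *info = (struct model_info *)buf;

	info->nb_inputs = nb_inputs;
	info->nb_outputs = nb_outputs;
	info->input_info = (struct io_info *)(buf + sizeof(struct model_info));
	info->output_info = info->input_info + nb_inputs;

	return info;
}
```

Because the pointers stored in the header refer into the same buffer, copying only the header out to the caller, as the get function does, hands back pointers to driver-owned descriptor arrays.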
 drivers/ml/cnxk/cn10k_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  9 ++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 37 ++++++++++++++++++---
 3 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 2281f59591..8a988162fc 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -340,3 +340,58 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uin
 
 	return 0;
 }
+
+void
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+{
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output =
+		PLT_PTR_ADD(input, model->metadata.model.num_input * sizeof(struct rte_ml_io_info));
+
+	/* Set model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+	rte_memcpy(info->name, model->metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", model->metadata.model.version[0],
+		 model->metadata.model.version[1], model->metadata.model.version[2],
+		 model->metadata.model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = dev->data->dev_id;
+	info->batch_size = model->batch_size;
+	info->nb_inputs = model->metadata.model.num_input;
+	info->input_info = input;
+	info->nb_outputs = model->metadata.model.num_output;
+	info->output_info = output;
+	info->wb_size = model->metadata.weights_bias.file_size;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, model->metadata.input[i].input_name,
+			   MRVL_ML_INPUT_NAME_LEN);
+		input[i].dtype = model->metadata.input[i].input_type;
+		input[i].qtype = model->metadata.input[i].model_input_type;
+		input[i].shape.format = model->metadata.input[i].shape.format;
+		input[i].shape.w = model->metadata.input[i].shape.w;
+		input[i].shape.x = model->metadata.input[i].shape.x;
+		input[i].shape.y = model->metadata.input[i].shape.y;
+		input[i].shape.z = model->metadata.input[i].shape.z;
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, model->metadata.output[i].output_name,
+			   MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].dtype = model->metadata.output[i].output_type;
+		output[i].qtype = model->metadata.output[i].model_output_type;
+		output[i].shape.format = RTE_ML_IO_FORMAT_1D;
+		output[i].shape.w = model->metadata.output[i].size;
+		output[i].shape.x = 1;
+		output[i].shape.y = 1;
+		output[i].shape.z = 1;
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4b5a804c8b..2fd12846d4 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -425,6 +425,14 @@ struct cn10k_ml_model {
 	/* Tile and memory information object */
 	struct cn10k_ml_ocm_model_map model_mem_map;
 
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures are arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -441,5 +449,6 @@ void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
 				   uint16_t *wb_pages, uint16_t *scratch_pages);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e902ef6420..dd1c7ae385 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -506,6 +506,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_data_size;
+	size_t model_info_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
 	uint16_t wb_pages;
@@ -544,8 +545,13 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
+			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size +
+		  2 * model_data_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
@@ -585,10 +591,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set model info */
+	model->info = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+	cn10k_ml_model_info_set(dev, model);
+
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-				  2 * model_data_size);
+	model->req = PLT_PTR_ADD(model->info, model_info_size);
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
@@ -877,6 +885,26 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 	return ret;
 }
 
+static int
+cn10k_ml_model_info_get(struct rte_ml_dev *dev, int16_t model_id,
+			struct rte_ml_model_info *model_info)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
+	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -894,4 +922,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
 };
-- 
2.17.1


* [PATCH v3 21/38] ml/cnxk: enable support to update model params
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (19 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 20/38] ml/cnxk: enable support to get model information Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 22/38] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
                       ` (17 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added cnxk driver functions to update model params (weights
and bias) after a model is loaded. Updating the model params
does not require reloading the model.
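
As a rough illustration of the update flow described above, the sketch
below overwrites the weights and bias in a staging ("load") copy of the
model image and then refreshes the copy the hardware reads ("run"). The
struct, its field names, and the assumption that weights and bias follow
the code sections are hypothetical simplifications, not the driver's
actual layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the driver's model object. */
struct sketch_model {
	uint8_t *load_base; /* staging copy of the full model image */
	uint8_t *run_base;  /* copy that the hardware would read */
	size_t code_size;   /* init + main + finish sections */
	size_t wb_size;     /* weights and bias section */
};

/* Overwrite weights/bias in the staging copy, then refresh the run copy. */
static int
sketch_params_update(struct sketch_model *m, const void *new_wb)
{
	assert(m != NULL);
	if (new_wb == NULL)
		return -EINVAL;

	/* Assumption: weights and bias directly follow the code sections. */
	memcpy(m->load_base + m->code_size, new_wb, m->wb_size);

	/* Copy the whole image so the run copy matches the staging copy. */
	memcpy(m->run_base, m->load_base, m->code_size + m->wb_size);

	return 0;
}
```

Keeping two copies means the caller's buffer is only read once, and the
run copy is always rebuilt from a complete staging image.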

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index dd1c7ae385..55be5d2d29 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -905,6 +905,36 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, int16_t model_id,
 	return 0;
 }
 
+static int
+cn10k_ml_model_params_update(struct rte_ml_dev *dev, int16_t model_id, void *buffer)
+{
+	struct cn10k_ml_model *model;
+	size_t size;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+		return -1;
+	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+		return -EBUSY;
+
+	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
+	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+
+	/* Update model weights & bias */
+	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -923,4 +953,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
 };
-- 
2.17.1


* [PATCH v3 22/38] ml/cnxk: add support to get IO buffer sizes
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (20 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 21/38] ml/cnxk: enable support to update model params Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 23/38] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
                       ` (16 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added driver functions to get input and output buffer sizes
for a given number of batches. These functions compute the buffer
sizes based on the specific requirements of the device.
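
The size computation can be sketched as below: the per-run buffer size
is multiplied by the number of model runs needed to cover the requested
batches, rounded up. The helper and macro names are illustrative
assumptions; only the round-up arithmetic is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up integer division, in the role PLT_DIV_CEIL plays in the patch. */
#define SKETCH_DIV_CEIL(x, y) (((x) + (y) - 1) / (y))

/*
 * size_per_run: bytes needed for one run of the model (one batch group).
 * nb_batches:   batches requested by the application.
 * batch_size:   batches the model consumes per run (must be non-zero).
 */
static uint64_t
sketch_io_size(uint64_t size_per_run, uint32_t nb_batches, uint32_t batch_size)
{
	assert(batch_size != 0);
	return size_per_run * SKETCH_DIV_CEIL((uint64_t)nb_batches, (uint64_t)batch_size);
}
```

For example, with 1000 bytes per run and a model batch size of 4, a
request for 10 batches needs three runs, so 3000 bytes.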

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 55be5d2d29..d825c72a8e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -935,6 +935,54 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, int16_t model_id, void *buf
 	return 0;
 }
 
+static int
+cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t nb_batches,
+			   uint64_t *input_qsize, uint64_t *input_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (input_qsize != NULL)
+		*input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (input_dsize != NULL)
+		*input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t nb_batches,
+			    uint64_t *output_qsize, uint64_t *output_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (output_qsize != NULL)
+		*output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (output_dsize != NULL)
+		*output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -954,4 +1002,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_input_size_get = cn10k_ml_io_input_size_get,
+	.io_output_size_get = cn10k_ml_io_output_size_get,
 };
-- 
2.17.1


* [PATCH v3 23/38] ml/cnxk: enable quantization and dequantization
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (21 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 22/38] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 24/38] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
                       ` (15 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented driver functions to quantize / dequantize input
and output data. Support is enabled for multiple batches.
Quantization / dequantization use the type conversion functions
defined in ML common code.
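
For readers unfamiliar with scale-based conversion, here is a minimal
sketch of a float32 to int8 quantizer and its inverse. It assumes the
quantized value is the input multiplied by the scale, rounded and
saturated; the function names and exact rounding are assumptions for
illustration and may differ from the helpers in the ML common code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize: out[i] = saturate(round(in[i] * scale)). Sketch only. */
static int
sketch_float32_to_int8(float scale, size_t nb_elements, const float *in, int8_t *out)
{
	size_t i;

	if (in == NULL || out == NULL)
		return -1;

	for (i = 0; i < nb_elements; i++) {
		float v = in[i] * scale;

		if (v != v)
			v = 0.0f; /* NaN maps to zero in this sketch */

		/* Saturate before converting so the cast is always in range. */
		if (v > 127.0f)
			v = 127.0f;
		else if (v < -128.0f)
			v = -128.0f;

		/* Round half away from zero without needing libm. */
		out[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
	}

	return 0;
}

/* Dequantize: out[i] = in[i] * scale. Sketch only. */
static int
sketch_int8_to_float32(float scale, size_t nb_elements, const int8_t *in, float *out)
{
	size_t i;

	if (in == NULL || out == NULL)
		return -1;

	for (i = 0; i < nb_elements; i++)
		out[i] = (float)in[i] * scale;

	return 0;
}
```

Saturating before the cast avoids undefined behavior when the scaled
value falls outside the int8 range.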

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 150 +++++++++++++++++++++++++++++++++
 1 file changed, 150 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index d825c72a8e..e02b8335b7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <ml_utils.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
@@ -983,6 +985,152 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t n
 	return 0;
 }
 
+static int
+cn10k_ml_io_quantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_batches, void *dbuffer,
+		     void *qbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		if (model->metadata.input[i].input_type ==
+		    model->metadata.input[i].model_input_type) {
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+		} else {
+			switch (model->metadata.input[i].model_input_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = ml_float32_to_int8(model->metadata.input[i].qscale,
+							 model->addr.input[i].nb_elements,
+							 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = ml_float32_to_uint8(model->metadata.input[i].qscale,
+							  model->addr.input[i].nb_elements,
+							  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = ml_float32_to_int16(model->metadata.input[i].qscale,
+							  model->addr.input[i].nb_elements,
+							  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = ml_float32_to_uint16(model->metadata.input[i].qscale,
+							   model->addr.input[i].nb_elements,
+							   lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = ml_float32_to_float16(model->addr.input[i].nb_elements,
+							    lcl_dbuffer, lcl_qbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_input_type[%u] : %u", i,
+					model->metadata.input[i].model_input_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_dbuffer += model->addr.input[i].sz_d;
+		lcl_qbuffer += model->addr.input[i].sz_q;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_dequantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_batches, void *qbuffer,
+		       void *dbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		if (model->metadata.output[i].output_type ==
+		    model->metadata.output[i].model_output_type) {
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+		} else {
+			switch (model->metadata.output[i].model_output_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = ml_int8_to_float32(model->metadata.output[i].dscale,
+							 model->addr.output[i].nb_elements,
+							 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = ml_uint8_to_float32(model->metadata.output[i].dscale,
+							  model->addr.output[i].nb_elements,
+							  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = ml_int16_to_float32(model->metadata.output[i].dscale,
+							  model->addr.output[i].nb_elements,
+							  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = ml_uint16_to_float32(model->metadata.output[i].dscale,
+							   model->addr.output[i].nb_elements,
+							   lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = ml_float16_to_float32(model->addr.output[i].nb_elements,
+							    lcl_qbuffer, lcl_dbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_output_type[%u] : %u", i,
+					model->metadata.output[i].model_output_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_qbuffer += model->addr.output[i].sz_q;
+		lcl_dbuffer += model->addr.output[i].sz_d;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -1006,4 +1154,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* I/O ops */
 	.io_input_size_get = cn10k_ml_io_input_size_get,
 	.io_output_size_get = cn10k_ml_io_output_size_get,
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
 };
-- 
2.17.1


* [PATCH v3 24/38] ml/cnxk: enable support to dump device debug info
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (22 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 23/38] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 25/38] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
                       ` (14 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to dump device debug information. Debug info on
the cn10k device includes model state, OCM usage, and the
firmware debug and exception buffers.
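
Dumping a circular firmware log needs care at the wrap-around point.
The sketch below copies the readable part of a ring buffer into a flat,
NUL-terminated string. It assumes valid data runs from head up to, but
not including, tail; that convention and the function name are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the readable part of a circular log buffer into out and
 * NUL-terminate it. Returns the number of bytes copied.
 */
static size_t
sketch_ring_copy(const char *buf, uint32_t bufsize, uint32_t head, uint32_t tail,
		 char *out, size_t out_size)
{
	size_t len;

	assert(head < bufsize && tail < bufsize);

	if (head <= tail) {
		/* Contiguous region (empty when head == tail). */
		len = tail - head;
		assert(len < out_size);
		memcpy(out, &buf[head], len);
	} else {
		/* Wrapped: head to end of buffer, then start up to tail. */
		size_t first = bufsize - head;

		len = first + tail;
		assert(len < out_size);
		memcpy(out, &buf[head], first);
		memcpy(out + first, &buf[0], tail);
	}

	out[len] = '\0';
	return len;
}
```

In the wrapped case the first segment is bufsize - head bytes long, so
the read never goes past the end of the buffer.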

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  51 +++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 189 +++++++++++++++++++++++++++++++++
 3 files changed, 241 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 034d9546eb..2083d99f81 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -458,3 +458,54 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id)
 		}
 	}
 }
+
+static void
+cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t nwords, char *str)
+{
+	char *p = str;
+	int word;
+
+	/* add prefix 0x */
+	*p++ = '0';
+	*p++ = 'x';
+
+	/* build one word at a time */
+	for (word = nwords - 1; word >= 0; word--) {
+		sprintf(p, "%02X", tile_info->ocm_mask[word]);
+		p += 2;
+	}
+
+	/* terminate */
+	*p++ = 0;
+}
+
+void
+cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+{
+	char str[ML_CN10K_OCM_NUMPAGES / 4 + 3]; /* nibbles + prefix '0x' + NUL */
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	uint8_t tile_id;
+	uint8_t word_id;
+	int wb_pages;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	fprintf(fp, "OCM State:\n");
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
+
+		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
+		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+			wb_pages +=
+				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+
+		fprintf(fp,
+			"tile = %2u, scratch_pages = %4u,"
+			" wb_pages = %4d, last_wb_page = %4d,"
+			" pagemask = %s\n",
+			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
+			ocm->tile_ocm_info[tile_id].last_wb_page, str);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index cd65d1d8fa..4415bbfb45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,5 +83,6 @@ int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16
 void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
 				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id);
+void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e02b8335b7..369566fab1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -14,10 +14,25 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  90
+
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+static void
+print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -116,6 +131,102 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_model_print(struct rte_ml_dev *dev, int16_t model_id, FILE *fp)
+{
+
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Print debug info */
+	print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
+		model->metadata.model.version[1], model->metadata.model.version[2],
+		model->metadata.model.version[3]);
+	if (strlen(model->name) != 0)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", model->model_id);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+
+	/* Print model state */
+	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
+			1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s  %14s\n", "input", "input_name", "input_type",
+		"model_input_type", "quantize", "format");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.input[i].input_name);
+		ml_io_type_to_str(model->metadata.input[i].input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		ml_io_type_to_str(model->metadata.input[i].model_input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.input[i].quantize == 1 ? "Yes" : "No"));
+		ml_io_format_to_str(model->metadata.input[i].shape.format, str, STR_LEN);
+		fprintf(fp, "%*s", 16, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
+		"model_output_type", "dequantize");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.output[i].output_name);
+		ml_io_type_to_str(model->metadata.output[i].output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		ml_io_type_to_str(model->metadata.output[i].model_output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.output[i].dequantize == 1 ? "Yes" : "No"));
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
+
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -498,6 +609,83 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_fw *fw;
+
+	uint32_t head_loc;
+	uint32_t tail_loc;
+	uint32_t bufsize;
+	char *head_ptr;
+	int model_id;
+	int core_id;
+
+	if (roc_env_is_asim())
+		return 0;
+
+	mldev = dev->data->dev_private;
+	fw = &mldev->fw;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			cn10k_ml_model_print(dev, model_id, fp);
+			fprintf(fp, "\n");
+		}
+	}
+
+	/* Dump OCM state */
+	cn10k_ml_ocm_print(dev, fp);
+
+	/* Dump debug buffer */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		if (core_id == 0) {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		} else {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		}
+		if (head_loc < tail_loc) {
+			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
+		} else if (head_loc >= tail_loc + 1) {
+			fprintf(fp, "%.*s\n", bufsize - head_loc, &head_ptr[head_loc]);
+			fprintf(fp, "%.*s\n", tail_loc, &head_ptr[0]);
+		}
+	}
+
+	/* Dump exception info */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		if ((core_id == 0) &&
+		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		} else if ((core_id == 1) &&
+			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		}
+	}
+
+	return 0;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
@@ -1138,6 +1326,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_close = cn10k_ml_dev_close,
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1


* [PATCH v3 25/38] ml/cnxk: add driver support for device selftest
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (23 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 24/38] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 26/38] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
                       ` (13 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support for device selftest. Device selftest includes
checking the status of firmware.
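
The selftest waits for the firmware by polling a completion status
until a deadline expires. A minimal sketch of that pattern follows,
with the cycle counter and the completion check passed in as callbacks
so it can run without hardware. All names here, including the fake
device, are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks: a cycle counter and a completion check. */
typedef uint64_t (*sketch_now_fn)(void *ctx);
typedef bool (*sketch_done_fn)(void *ctx);

/* Poll until done() reports completion or timeout_cycles have elapsed. */
static int
sketch_poll_until_done(sketch_now_fn now, sketch_done_fn done, void *ctx,
		       uint64_t timeout_cycles)
{
	uint64_t deadline = now(ctx) + timeout_cycles;

	do {
		if (done(ctx))
			return 0;
	} while (now(ctx) < deadline);

	return -ETIME;
}

/*
 * Fake device for illustration: each clock read advances time by one
 * cycle, and the job completes once the clock reaches finish_at.
 */
struct sketch_fake_dev {
	uint64_t cycles;
	uint64_t finish_at;
};

static uint64_t
sketch_fake_now(void *ctx)
{
	struct sketch_fake_dev *d = ctx;

	return d->cycles++;
}

static bool
sketch_fake_done(void *ctx)
{
	struct sketch_fake_dev *d = ctx;

	return d->cycles >= d->finish_at;
}
```

Checking completion at least once before testing the deadline means a
job that finishes immediately is never reported as timed out.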

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 369566fab1..e6f2e4b8f9 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -686,6 +686,62 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	uint64_t timeout_cycle;
+	bool timeout;
+	int ret;
+
+	mldev = dev->data->dev_private;
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+					 ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("Could not allocate reserved memzone");
+		return -ENOMEM;
+	}
+	req = mz->addr;
+
+	/* Prepare load completion structure */
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	/* Enqueue firmware selftest request through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware selftest status, clean-up and exit */
+	ret = 0;
+	if (timeout) {
+		ret = -ETIME;
+	} else {
+		if (req->result.error_code != 0)
+			ret = -1;
+	}
+
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
@@ -1327,6 +1383,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1


* [PATCH v3 26/38] ml/cnxk: enqueue a burst of inference requests
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (24 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 25/38] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 27/38] ml/cnxk: dequeue " Srikanth Yalavarthi
                       ` (12 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled driver support to enqueue a burst of inference requests
to the ML device. Enqueue uses the internal ML request structure
to queue the inferences; job completion is detected through polling.
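
The burst size is limited by the free slots in the descriptor ring. The
sketch below mirrors the head/tail arithmetic used by the queue helpers
in this patch; the sketch_ names and the clamp helper are illustrative
and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Number of requests enqueued but not yet dequeued. */
static uint64_t
sketch_pending(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	return (nb_desc + head - tail) % nb_desc;
}

/* Number of requests that can still be enqueued. */
static uint64_t
sketch_free(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	/* One slot stays unused so that head == tail always means "empty". */
	return nb_desc - sketch_pending(head, tail, nb_desc) - 1;
}

/* Limit a requested burst to the room left in the ring. */
static uint16_t
sketch_clamp_burst(uint16_t nb_ops, uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	uint64_t room = sketch_free(head, tail, nb_desc);

	return (uint16_t)(nb_ops < room ? nb_ops : room);
}
```

With nb_desc = 8, head = 2 and tail = 6, four requests are pending and
three slots are free, so a burst of ten ops is clamped to three.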

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 96 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  7 +++
 2 files changed, 103 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e6f2e4b8f9..e6bfc6635e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -285,6 +285,28 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	}
 }
 
+static __rte_always_inline void
+cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+				struct rte_ml_op *op)
+{
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = op->model_id;
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->jd.hdr.sp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.model_run.input_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr));
+	req->jd.model_run.output_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr));
+	req->jd.model_run.num_batches = op->nb_batches;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -450,6 +472,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -1375,6 +1399,78 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_bat
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t count;
+	uint64_t head;
+	bool enqueued;
+
+	mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	req = &queue->reqs[head];
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	if (unlikely(!enqueued))
+		goto jcmdq_full;
+
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3143c9054c..d35f91a302 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -28,6 +28,9 @@ struct cn10k_ml_req {
 
 	/* Timeout cycle */
 	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
@@ -67,4 +70,8 @@ int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 
+/* Fast-path ops */
+__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
+
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v3 27/38] ml/cnxk: dequeue a burst of inference requests
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (25 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 26/38] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 28/38] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
                       ` (11 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled driver support to dequeue inference requests from
the internal queue. Dequeue checks for request completion by
polling the status field of the job request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
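Note on dequeue ordering: completion is checked slot by slot starting
at the tail, and the loop stops at the first request that has not
finished, so requests are returned in submission order even if a later
one completes first. The sketch below illustrates only that behaviour
under simplified assumptions; the sketch_* types and the status values
are hypothetical and do not match the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_JOB_START  0x1u /* hypothetical status values */
#define SKETCH_JOB_FINISH 0x2u

struct sketch_slot {
	volatile uint64_t status; /* written by the device on completion */
	void *op;                 /* user operation attached at enqueue */
};

struct sketch_ring {
	struct sketch_slot *slots;
	uint64_t head;    /* producer index */
	uint64_t tail;    /* consumer index */
	uint64_t nb_desc; /* number of slots */
};

/* Return up to nb_ops finished operations, in submission order. */
static uint16_t
sketch_dequeue(struct sketch_ring *r, void **ops, uint16_t nb_ops)
{
	uint64_t pending = (r->nb_desc + r->head - r->tail) % r->nb_desc;
	uint64_t tail = r->tail;
	uint16_t count = 0;

	if (pending < nb_ops)
		nb_ops = (uint16_t)pending;

	/* Stop at the first slot that is still active. */
	while (count < nb_ops && r->slots[tail].status == SKETCH_JOB_FINISH) {
		ops[count++] = r->slots[tail].op;
		tail = (tail + 1) % r->nb_desc;
	}

	r->tail = tail;
	return count;
}
```

A consequence of this design is head-of-line blocking: one slow request
at the tail delays the return of every finished request behind it.
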
 drivers/ml/cnxk/cn10k_ml_ops.c | 61 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 ++
 2 files changed, 63 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e6bfc6635e..8de2b3e49c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -473,6 +473,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -1417,6 +1418,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
 }
 
+static __rte_always_inline void
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
+		       struct rte_ml_op *op)
+{
+	PLT_SET_USED(dev);
+	PLT_SET_USED(qp_id);
+
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0))
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+	else
+		op->status = RTE_ML_OP_STATUS_ERROR;
+
+	op->user_ptr = result->user_ptr;
+}
+
 __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
@@ -1471,6 +1489,49 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot uint16_t
+cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+	req = &queue->reqs[tail];
+	status = plt_read64(&req->status);
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
+		goto empty_or_active;
+
+	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	ops[count] = req->op;
+
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d35f91a302..3178295bba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -73,5 +73,7 @@ int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v3 28/38] ml/cnxk: add internal function for sync mode run
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (26 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 27/38] ml/cnxk: dequeue " Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 29/38] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
                       ` (10 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added an internal function to execute ML inference requests
in synchronous mode. Sync mode inference execution launches
inference requests without using a queue-pair.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
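Note on the sync-mode control flow: the function in this patch computes
one deadline, retries submission until it is accepted or the deadline
passes (returning -EBUSY on failure), then polls for completion against
the same deadline (returning -ETIME on failure). The sketch below
mirrors that two-phase shape with callbacks and a fake clock so it can
run without hardware. All sketch_* and fake_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*sketch_poll_fn)(void *ctx);
typedef uint64_t (*sketch_clock_fn)(void *ctx);

/* Submit, then wait for completion, both bounded by one deadline. */
static int
sketch_run_sync(sketch_poll_fn try_submit, sketch_poll_fn is_done,
		sketch_clock_fn now, void *ctx, uint64_t budget)
{
	uint64_t deadline = now(ctx) + budget;
	bool ok = false;

	do {
		if (try_submit(ctx)) {
			ok = true;
			break;
		}
	} while (now(ctx) < deadline);
	if (!ok)
		return -EBUSY; /* command queue never accepted the job */

	ok = false;
	do {
		if (is_done(ctx)) {
			ok = true;
			break;
		}
	} while (now(ctx) < deadline);

	return ok ? 0 : -ETIME; /* job accepted but did not finish in time */
}

/* Fake device: the clock advances by one tick per read. */
struct sketch_fake {
	uint64_t ticks;
	uint64_t accept_at; /* tick from which submission succeeds */
	uint64_t done_at;   /* tick from which the job reads as finished */
};

static uint64_t
fake_now(void *ctx)
{
	struct sketch_fake *f = ctx;
	return f->ticks++;
}

static bool
fake_submit(void *ctx)
{
	struct sketch_fake *f = ctx;
	return f->ticks >= f->accept_at;
}

static bool
fake_done(void *ctx)
{
	struct sketch_fake *f = ctx;
	return f->ticks >= f->done_at;
}
```

Because both phases share one deadline, time spent retrying the
submission reduces the time left to wait for completion.
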
 drivers/ml/cnxk/cn10k_ml_ops.c | 53 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8de2b3e49c..61e19f85fb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1532,6 +1532,59 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	bool timeout;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[op->model_id];
+	req = model->req;
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+
+	timeout = true;
+	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	do {
+		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+			req->op = op;
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout) {
+		ret = -EBUSY;
+		goto error_enqueue;
+	}
+
+	timeout = true;
+	do {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout)
+		ret = -ETIME;
+	else
+		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+
+error_enqueue:
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3178295bba..a17a2851b1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,5 +75,6 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v3 29/38] ml/cnxk: enable support for firmware error codes
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (27 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 28/38] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 30/38] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
                       ` (9 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support for error handling. Added error types and subtypes
supported by ML firmware. Enabled support to get the device-specific
error code and message for a completed ML request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
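Note on decoding the error word: the patch packs a 4-bit error type and
a 60-bit sub-type into one 64-bit value and uses both as indices into
string tables. The sketch below shows one defensive way to decode such
a word, with range checks before each table lookup and bounded
formatting. It assumes the type sits in the low 4 bits, which is what
the bit-field order gives with common little-endian ABIs but is not
guaranteed by the C standard. The sketch_* names and the
"INVALID ETYPE" text are hypothetical; the table strings are copied
from this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_ETYPE_BITS 4
#define SKETCH_ETYPE_MASK ((UINT64_C(1) << SKETCH_ETYPE_BITS) - 1)
#define SKETCH_ETYPE_DRIVER 5
#define SKETCH_DIM(a) (sizeof(a) / sizeof((a)[0]))

static const char *const sketch_etype_name[] = {
	"NO_ERROR",   "FW_NON_FATAL", "HW_NON_FATAL",  "HW_FATAL",
	"HW_WARNING", "DRIVER_ERROR", "UNKNOWN_ERROR",
};

static const char *const sketch_driver_msg[] = {
	"NO ERROR", "UNKNOWN ERROR", "FW EXCEPTION", "UNKNOWN FIRMWARE ERROR",
};

/* Pack type (low 4 bits) and sub-type (upper 60 bits) into one word. */
static uint64_t
sketch_pack(uint64_t etype, uint64_t stype)
{
	return (stype << SKETCH_ETYPE_BITS) | (etype & SKETCH_ETYPE_MASK);
}

/* Format a message for the packed code; never indexes out of range. */
static void
sketch_error_message(uint64_t code, char *buf, size_t len)
{
	uint64_t etype = code & SKETCH_ETYPE_MASK;
	uint64_t stype = code >> SKETCH_ETYPE_BITS;

	if (etype >= SKETCH_DIM(sketch_etype_name)) {
		snprintf(buf, len, "INVALID ETYPE %u", (unsigned int)etype);
		return;
	}

	if (etype == SKETCH_ETYPE_DRIVER && stype < SKETCH_DIM(sketch_driver_msg))
		snprintf(buf, len, "%s : %s", sketch_etype_name[etype],
			 sketch_driver_msg[stype]);
	else
		snprintf(buf, len, "%s", sketch_etype_name[etype]);
}
```

The range checks matter because a 4-bit type can hold values up to 15
while the type table has only 7 entries, and the sub-type comes from
firmware-written memory.
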
 drivers/ml/cnxk/cn10k_ml_dev.c |   4 +-
 drivers/ml/cnxk/cn10k_ml_dev.h |  50 +++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.c | 117 ++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_ops.h |   2 +
 4 files changed, 160 insertions(+), 13 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 837f006bf0..76ed853a3c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -261,7 +261,7 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -452,7 +452,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 8f6bc24370..604a200e26 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -64,6 +64,54 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* Error types enumeration */
+enum cn10k_ml_error_etype {
+	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
+	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
+	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
+	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
+	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
+	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
+};
+
+/* Firmware non-fatal error sub-type */
+enum cn10k_ml_error_stype_fw_nf {
+	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
+	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during unload */
+	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
+	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
+	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
+	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
+	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
+	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
+	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
+	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+};
+
+/* Driver error sub-type */
+enum cn10k_ml_error_stype_driver {
+	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
+	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+};
+
+/* Error structure */
+union cn10k_ml_error_code {
+	struct {
+		/* Error type */
+		uint64_t etype : 4;
+
+		/* Error sub-type */
+		uint64_t stype : 60;
+	} s;
+
+	/* WORD 0 */
+	uint64_t u64;
+};
+
 /* Firmware stats */
 struct cn10k_ml_fw_stats {
 	/* Firmware start cycle */
@@ -82,7 +130,7 @@ struct cn10k_ml_fw_stats {
 /* Result structure */
 struct cn10k_ml_result {
 	/* Job error code */
-	uint64_t error_code;
+	union cn10k_ml_error_code error_code;
 
 	/* Firmware stats */
 	struct cn10k_ml_fw_stats stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 61e19f85fb..668456be01 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,49 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Error message length */
+#define ERRMSG_LEN 32
+
+/* Error type database */
+static const struct cn10k_ml_etype_db {
+	enum cn10k_ml_error_etype etype;
+	char name[ERRMSG_LEN];
+} ml_etype_db[] = {
+	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
+
+/* Hardware non-fatal error subtype database */
+static const struct cn10k_ml_stype_db_hw_nf {
+	enum cn10k_ml_error_stype_fw_nf stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_hw_nf[] = {
+	{ML_FW_ERR_NOERR, "NO ERROR"},
+	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+};
+
+/* Driver error subtype database */
+static const struct cn10k_ml_stype_db_driver {
+	enum cn10k_ml_error_stype_driver stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_driver[] = {
+	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+};
+
 static void
 print_line(FILE *fp, int len)
 {
@@ -474,6 +517,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
+	dev->op_error_get = cn10k_ml_op_error_get;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -758,7 +802,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code != 0)
+		if (req->result.error_code.u64 != 0)
 			ret = -1;
 	}
 
@@ -936,7 +980,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1017,7 +1061,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0)
+			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1079,7 +1123,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1134,7 +1178,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0x0)
+			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1425,12 +1469,30 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 	PLT_SET_USED(dev);
 	PLT_SET_USED(qp_id);
 
-	op->impl_opaque = result->error_code;
+	struct cn10k_ml_dev *mldev;
 
-	if (likely(result->error_code == 0))
+	if (likely(result->error_code.u64 == 0)) {
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
-	else
+	} else {
+		/* Handle driver error */
+		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+			mldev = dev->data->dev_private;
+
+			/* Check for exception */
+			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
+			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+			else
+				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+		}
+
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
 
 	op->user_ptr = result->user_ptr;
 }
@@ -1467,6 +1529,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1514,8 +1577,12 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 dequeue_req:
 	req = &queue->reqs[tail];
 	status = plt_read64(&req->status);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
-		goto empty_or_active;
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+	}
 
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
@@ -1532,6 +1599,35 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
+{
+	union cn10k_ml_error_code *error_code;
+	char msg[RTE_ML_STR_MAX];
+
+	PLT_SET_USED(dev);
+
+	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
+
+	/* Copy error message */
+	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
+
+	/* Copy sub error message */
+	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+	}
+
+	if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+	}
+
+	plt_strlcpy(error->message, msg, sizeof(error->message));
+
+	return 0;
+}
+
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
@@ -1548,6 +1644,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a17a2851b1..560310f835 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,6 +75,8 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
+				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v3 30/38] ml/cnxk: add support to get and reset device stats
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (28 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 29/38] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 31/38] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
                       ` (8 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to get and reset ML device stats. Device stats
include the number of requests enqueued and dequeued and the
enqueue and dequeue error counts.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
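Note on stats aggregation: counters are kept per queue-pair and summed
on read, so the fast path only ever touches the counters of its own
queue-pair. The sketch below shows that pattern in isolation. Unlike
the stats_get callback in this patch, it zeroes the output before
summing, which is an assumption about what a caller would want when
the output buffer is not already cleared. The sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sketch_stats {
	uint64_t enqueued;
	uint64_t dequeued;
	uint64_t enqueue_err;
	uint64_t dequeue_err;
};

/* Sum per-queue-pair counters into a device-level total. */
static void
sketch_stats_sum(const struct sketch_stats *per_qp, uint16_t nb_qp,
		 struct sketch_stats *out)
{
	uint16_t i;

	memset(out, 0, sizeof(*out));
	for (i = 0; i < nb_qp; i++) {
		out->enqueued += per_qp[i].enqueued;
		out->dequeued += per_qp[i].dequeued;
		out->enqueue_err += per_qp[i].enqueue_err;
		out->dequeue_err += per_qp[i].dequeue_err;
	}
}

/* Clear the counters of every queue-pair. */
static void
sketch_stats_reset(struct sketch_stats *per_qp, uint16_t nb_qp)
{
	memset(per_qp, 0, (size_t)nb_qp * sizeof(*per_qp));
}
```
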
 drivers/ml/cnxk/cn10k_ml_ops.c | 55 ++++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 668456be01..d5c45ce916 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -159,6 +159,10 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -678,6 +682,38 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -1466,15 +1502,23 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	PLT_SET_USED(dev);
-	PLT_SET_USED(qp_id);
-
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
 
 	if (likely(result->error_code.u64 == 0)) {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeued_count++;
+		}
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeue_err_count++;
+		}
+
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
 			mldev = dev->data->dev_private;
@@ -1548,6 +1592,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 jcmdq_full:
 	queue->head = head;
+	qp->stats.enqueued_count += count;
 
 	return count;
 }
@@ -1696,6 +1741,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 560310f835..fb82af414a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -58,6 +58,9 @@ struct cn10k_ml_qp {
 
 	/* Request queue */
 	struct cn10k_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
 };
 
 /* Device ops */
-- 
2.17.1



* [PATCH v3 31/38] ml/cnxk: add support to handle extended dev stats
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (29 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 30/38] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 32/38] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
                       ` (7 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to handle ML device extended stats (xstats).
Support is enabled to get xstats names and values and to reset
xstats. Supported xstats are the average, minimum and maximum
hardware and firmware latency.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
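Note on the latency xstats: each model keeps per-queue-pair totals,
minima and maxima, and the reported value is derived across all
queue-pairs, with the sample count taken as dequeued_count minus the
reset index. The sketch below restates that computation as plain
functions and returns 0 for every statistic when no samples have been
recorded. It merges the separate hardware and firmware fields into one
generic record for brevity; the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sketch_lat {
	uint64_t tot;      /* sum of latencies since last reset */
	uint64_t min;      /* UINT64_MAX when no sample recorded */
	uint64_t max;      /* 0 when no sample recorded */
	uint64_t dequeued; /* total jobs dequeued */
	uint64_t reset;    /* value of dequeued at last reset */
};

/* Number of samples recorded since the last reset, over all entries. */
static uint64_t
sketch_lat_count(const struct sketch_lat *s, uint32_t n)
{
	uint64_t count = 0;
	uint32_t i;

	for (i = 0; i < n; i++)
		count += s[i].dequeued - s[i].reset;
	return count;
}

static uint64_t
sketch_lat_avg(const struct sketch_lat *s, uint32_t n)
{
	uint64_t count = sketch_lat_count(s, n);
	uint64_t tot = 0;
	uint32_t i;

	for (i = 0; i < n; i++)
		tot += s[i].tot;
	return count != 0 ? tot / count : 0; /* avoid dividing by zero */
}

static uint64_t
sketch_lat_min(const struct sketch_lat *s, uint32_t n)
{
	uint64_t value = UINT64_MAX;
	uint32_t i;

	for (i = 0; i < n; i++)
		value = s[i].min < value ? s[i].min : value;
	return sketch_lat_count(s, n) != 0 ? value : 0;
}

static uint64_t
sketch_lat_max(const struct sketch_lat *s, uint32_t n)
{
	uint64_t value = 0;
	uint32_t i;

	for (i = 0; i < n; i++)
		value = s[i].max > value ? s[i].max : value;
	return sketch_lat_count(s, n) != 0 ? value : 0;
}
```
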
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.h |  57 +++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 356 ++++++++++++++++++++++++++++++-
 3 files changed, 415 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 604a200e26..b7ff369ba8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -372,6 +372,9 @@ struct cn10k_ml_dev {
 
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
+
+	/* xstats status */
+	bool xstats_enabled;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 2fd12846d4..f6a7276aa7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -402,6 +402,57 @@ struct cn10k_ml_model_addr {
 	uint32_t total_output_sz_d;
 };
 
+/* Extended stats types enum */
+enum cn10k_ml_model_xstats_type {
+	/* Average hardware latency */
+	avg_hw_latency = 0,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+};
+
+/* Model fast-path stats */
+struct cn10k_ml_model_stats {
+	/* Total hardware latency, sum of all inferences */
+	uint64_t hw_latency_tot;
+
+	/* Minimum hardware latency */
+	uint64_t hw_latency_min;
+
+	/* Maximum hardware latency */
+	uint64_t hw_latency_max;
+
+	/* Total firmware latency, sum of all inferences */
+	uint64_t fw_latency_tot;
+
+	/* Minimum firmware latency */
+	uint64_t fw_latency_min;
+
+	/* Maximum firmware latency */
+	uint64_t fw_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t hw_reset_count;
+
+	/* Firmware stats reset index */
+	uint64_t fw_reset_count;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -441,6 +492,12 @@ struct cn10k_ml_model {
 
 	/* Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
+
+	/* Stats for burst ops */
+	struct cn10k_ml_model_stats *burst_stats;
+
+	/* Stats for sync ops */
+	struct cn10k_ml_model_stats *sync_stats;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index d5c45ce916..47edde0404 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -354,6 +354,134 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
+#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value += model->burst_stats[qp_id].str##_latency_tot;                      \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		value = (count != 0) ? value / count : 0;                                          \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+static uint64_t
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
+			 enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint64_t count = 0;
+	uint64_t value;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+	if (model == NULL)
+		return 0;
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case min_hw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case max_hw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case avg_fw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case min_fw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case max_fw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
+			model->burst_stats[qp_id].str##_reset_count =                              \
+				model->burst_stats[qp_id].dequeued_count;                          \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+	} while (0)
+
+static void
+cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
+			   enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -519,6 +647,13 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	/* Check firmware stats */
+	if ((mldev->fw.req->jd.fw_load.cap.s.hw_stats) &&
+	    (mldev->fw.req->jd.fw_load.cap.s.fw_stats))
+		mldev->xstats_enabled = true;
+	else
+		mldev->xstats_enabled = false;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -714,6 +849,170 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+/* Model xstats names */
+struct rte_ml_dev_xstats_map cn10k_ml_model_xstats_table[] = {
+	{avg_hw_latency, "Avg-HW-Latency"}, {min_hw_latency, "Min-HW-Latency"},
+	{max_hw_latency, "Max-HW-Latency"}, {avg_fw_latency, "Avg-FW-Latency"},
+	{min_fw_latency, "Min-FW-Latency"}, {max_fw_latency, "Max-FW-Latency"},
+};
+
+static int
+cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_map *xstats_map,
+			      uint32_t size)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	if (xstats_map == NULL)
+		return PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+
+	/* Model xstats names */
+	count = 0;
+	cn10k_ml_dev_info_get(dev, &dev_info);
+
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		xstats_map[count].id = id;
+		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+
+		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+
+		count++;
+		if (count == size)
+			break;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				uint64_t *value)
+{
+	struct rte_ml_dev_xstats_map *xstats_map;
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+	uint32_t num_xstats;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	num_xstats = PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+	xstats_map = rte_zmalloc("cn10k_ml_xstats_map",
+				 sizeof(struct rte_ml_dev_xstats_map) * num_xstats, 0);
+	cn10k_ml_dev_xstats_names_get(dev, xstats_map, num_xstats);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		if (strncmp(name, xstats_map[id].name, strlen(name)) == 0) {
+			*stat_id = id;
+			rte_free(xstats_map);
+			break;
+		}
+	}
+
+	if (id == PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models)
+		return -EINVAL;
+
+	model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+	type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+	*value = cn10k_ml_model_xstat_get(dev, model_id, type);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint64_t *values,
+			uint16_t nb_ids)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	count = 0;
+	for (i = 0; i < nb_ids; i++) {
+		model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+		values[i] = cn10k_ml_model_xstat_get(dev, model_id, type);
+		count++;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint16_t nb_ids)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (stat_ids == NULL) {
+		for (i = 0; i < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; i++) {
+			model_id = i / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = i % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	} else {
+		for (i = 0; i < nb_ids; i++) {
+			model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	}
+
+	return 0;
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -856,6 +1155,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_stats_size;
 	size_t model_data_size;
 	size_t model_info_size;
 	uint8_t *base_dma_addr;
@@ -864,6 +1164,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int qp_id;
 	int ret;
 
 	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
@@ -900,10 +1201,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -949,6 +1252,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Set slow-path request address and state */
 	model->req = PLT_PTR_ADD(model->info, model_info_size);
 
+	/* Reset burst and sync stats */
+	model->burst_stats = PLT_PTR_ADD(
+		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
+		model->burst_stats[qp_id].hw_latency_tot = 0;
+		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].hw_latency_max = 0;
+		model->burst_stats[qp_id].fw_latency_tot = 0;
+		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].fw_latency_max = 0;
+		model->burst_stats[qp_id].hw_reset_count = 0;
+		model->burst_stats[qp_id].fw_reset_count = 0;
+		model->burst_stats[qp_id].dequeued_count = 0;
+	}
+	model->sync_stats =
+		PLT_PTR_ADD(model->burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
@@ -1502,15 +1823,44 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
+	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	uint64_t hw_latency;
+	uint64_t fw_latency;
 
 	if (likely(result->error_code.u64 == 0)) {
+		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
+			stats = &model->burst_stats[qp_id];
+		} else {
+			stats = model->sync_stats;
+		}
+
+		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
+			stats->hw_latency_min = UINT64_MAX;
+			stats->hw_latency_max = 0;
 		}
 
+		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
+			stats->fw_latency_min = UINT64_MAX;
+			stats->fw_latency_max = 0;
+		}
+
+		hw_latency = result->stats.hw_end - result->stats.hw_start;
+		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
+
+		stats->hw_latency_tot += hw_latency;
+		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
+		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
+		stats->fw_latency_tot += fw_latency;
+		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
+		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
+		stats->dequeued_count++;
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
@@ -1744,6 +2094,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
 	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
-- 
2.17.1


* [PATCH v3 32/38] ml/cnxk: enable support to get xstats in cycles
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (30 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 31/38] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 33/38] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
                       ` (6 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add support to report xstats in either cycles or nanoseconds.
The sclk frequency is available only if an RVU device is probed
during initialization. When it is available, the driver reports
xstats in nanoseconds; otherwise it falls back to cycles.
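
The conversion described above can be sketched in isolation as follows.
This is a minimal illustration, not the driver code: the helper names
xstat_scale() and xstat_unit() are hypothetical, and the only behaviour
taken from the patch is the "value * 1000 / sclk_freq" scaling with the
frequency in MHz and the "-cycles" / "-ns" name suffixes.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a latency measured in sclk cycles to
 * nanoseconds. A frequency of 0 MHz means the clock is unknown, so the
 * raw cycle count is returned unchanged.
 */
static uint64_t
xstat_scale(uint64_t cycles, uint16_t sclk_freq_mhz)
{
	if (sclk_freq_mhz == 0)
		return cycles;

	/* cycles / (MHz * 1e6) seconds == cycles * 1000 / MHz nanoseconds */
	return (cycles * 1000ULL) / sclk_freq_mhz;
}

/* Hypothetical helper: unit suffix that matches the value returned above. */
static const char *
xstat_unit(uint16_t sclk_freq_mhz)
{
	return (sclk_freq_mhz == 0) ? "cycles" : "ns";
}
```

Keeping the unit decision in one place means the reported name and the
reported value cannot disagree about the unit.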

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 47edde0404..cdd9ae9c69 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -394,6 +394,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 			 enum cn10k_ml_model_xstats_type type)
 {
 	struct cn10k_ml_model *model;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
 	uint64_t value;
 	uint32_t qp_id;
@@ -425,6 +427,10 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 		value = 0;
 	}
 
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
 	return value;
 }
 
@@ -863,6 +869,8 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
 	uint32_t model_id;
 	uint32_t count;
 	uint32_t type;
@@ -878,6 +886,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	/* Model xstats names */
 	count = 0;
 	cn10k_ml_dev_info_get(dev, &dev_info);
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 
 	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
 		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
@@ -889,8 +898,14 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 		xstats_map[count].id = id;
 		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
 
-		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
-			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+		if (sclk_freq == 0)
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
+		else
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-ns",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
 
 		count++;
 		if (count == size)
-- 
2.17.1


* [PATCH v3 33/38] ml/cnxk: add support to report DPE FW warnings
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (31 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 32/38] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 34/38] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
                       ` (5 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add support to enable and report DPE warnings from the ML
firmware. The firmware load flags are configured from the
device arguments.

Default values:
	enable_dpe_warnings = 1
	report_dpe_warnings = 0
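
How the two device arguments map to firmware load flags can be sketched
as below. This is a standalone illustration under stated assumptions:
fw_flags_build() and devarg_is_bool() are hypothetical names, and the bit
positions mirror the BIT(0) / BIT(1) masks defined in the diff.

```c
#include <stdint.h>

/* Bit positions mirror the masks in the patch; redefined here without BIT(). */
#define FW_ENABLE_DPE_WARNING_BITMASK (UINT64_C(1) << 0)
#define FW_REPORT_DPE_WARNING_BITMASK (UINT64_C(1) << 1)

/* Hypothetical check: both arguments only accept 0 or 1. */
static int
devarg_is_bool(int value)
{
	return value == 0 || value == 1;
}

/* Hypothetical helper: build the load flags from the two settings. */
static uint64_t
fw_flags_build(int enable_dpe_warnings, int report_dpe_warnings)
{
	uint64_t flags = 0;

	if (enable_dpe_warnings)
		flags |= FW_ENABLE_DPE_WARNING_BITMASK;

	if (report_dpe_warnings)
		flags |= FW_REPORT_DPE_WARNING_BITMASK;

	return flags;
}
```

With the default values (1 and 0) the result is 0x1, which is the same
value as the FW_LOAD_FLAGS constant removed by this patch.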

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 94 +++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_dev.h |  6 +++
 2 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 76ed853a3c..ac6592891b 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -17,9 +17,13 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-#define CN10K_ML_FW_PATH "fw_path"
+#define CN10K_ML_FW_PATH		"fw_path"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 
-#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -28,9 +32,13 @@
 #define FW_EXCEPTION_BUFFER_SIZE 0x400
 #define FW_LINKER_OFFSET	 0x80000
 #define FW_WAIT_CYCLES		 100
-#define FW_LOAD_FLAGS		 0x1
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+/* Firmware flags */
+#define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
+#define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -49,9 +57,25 @@ parse_string_arg(const char *key __rte_unused, const char *value, void *extra_ar
 	return 0;
 }
 
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be non-negative.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int
 cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
 {
+	bool enable_dpe_warnings_set = false;
+	bool report_dpe_warnings_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -76,6 +100,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		fw_path_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		enable_dpe_warnings_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_REPORT_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		report_dpe_warnings_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -83,6 +131,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		mldev->fw.path = fw_path;
 	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
 
+	if (!enable_dpe_warnings_set) {
+		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+				mldev->fw.enable_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+
+	if (!report_dpe_warnings_set) {
+		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+				mldev->fw.report_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -208,9 +280,15 @@ cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 uint64_t
 cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 {
-	PLT_SET_USED(fw);
+	uint64_t flags = 0x0;
+
+	if (fw->enable_dpe_warnings)
+		flags = flags | FW_ENABLE_DPE_WARNING_BITMASK;
+
+	if (fw->report_dpe_warnings)
+		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	return FW_LOAD_FLAGS;
+	return flags;
 }
 
 static int
@@ -614,4 +692,6 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index b7ff369ba8..9ba56ffba6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -349,6 +349,12 @@ struct cn10k_ml_fw {
 	/* Firmware file path */
 	const char *path;
 
+	/* Enable DPE warnings */
+	int enable_dpe_warnings;
+
+	/* Report DPE warnings */
+	int report_dpe_warnings;
+
 	/* Data buffer */
 	uint8_t *data;
 
-- 
2.17.1


* [PATCH v3 34/38] ml/cnxk: add support to enable model data caching
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (32 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 33/38] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 35/38] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
                       ` (4 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add device argument 'cache_model_data' to enable model data
caching. When enabled, an inference request is executed with
dummy data in synchronous mode during the model start stage.
This run caches the model weights and bias in memory and
results in improved inference throughput.

cache_model_data = 1, enable (default)
cache_model_data = 0, disable
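
The dummy request uses a single zeroed buffer that holds the input
followed by the output. A minimal sketch of that layout is shown below.
It assumes plain calloc() in place of the memzone reservation used by
the driver, and the dummy_io structure and helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the warm-up buffers. */
struct dummy_io {
	uint8_t *base;   /* start of the single allocation */
	uint8_t *input;  /* first isize bytes */
	uint8_t *output; /* next osize bytes, directly after the input */
};

/* Allocate one zero-filled region and split it into input and output. */
static int
dummy_io_alloc(struct dummy_io *io, size_t isize, size_t osize)
{
	if (isize + osize == 0)
		return -1;

	io->base = calloc(1, isize + osize);
	if (io->base == NULL)
		return -1;

	io->input = io->base;
	io->output = io->base + isize;

	return 0;
}

/* Release the region once the synchronous warm-up run has completed. */
static void
dummy_io_free(struct dummy_io *io)
{
	free(io->base);
	io->base = NULL;
	io->input = NULL;
	io->output = NULL;
}
```

A single allocation keeps the warm-up path to one reserve and one free
per model start, which is the same shape as the memzone handling in the
diff.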

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 33 ++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c | 50 ++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index ac6592891b..948708a420 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -20,10 +20,12 @@
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
+#define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -38,7 +40,8 @@
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -76,6 +79,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
+	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -124,6 +128,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		report_dpe_warnings_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -155,6 +171,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
 
+	if (!cache_model_data_set) {
+		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
+				mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -694,4 +722,5 @@ RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
 RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
 			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 9ba56ffba6..718edadde7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -381,6 +381,9 @@ struct cn10k_ml_dev {
 
 	/* xstats status */
 	bool xstats_enabled;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index cdd9ae9c69..1e6d366c59 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -488,6 +488,49 @@ cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
 	}
 }
 
+static int
+cn10k_ml_cache_model_data(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct rte_ml_op op;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t isize = 0;
+	uint64_t osize = 0;
+	int ret = 0;
+
+	model = dev->data->models[model_id];
+
+	/* Create input and output buffers. */
+	rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL);
+	rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL);
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%d", "ml_dummy_io", model_id);
+	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+	memset(mz->addr, 0, isize + osize);
+
+	op.model_id = model_id;
+	op.nb_batches = model->batch_size;
+	op.mempool = NULL;
+
+	op.input.addr = mz->addr;
+	op.input.length = isize;
+	op.input.next = NULL;
+
+	op.output.addr = PLT_PTR_ADD(op.input.addr, isize);
+	op.output.length = osize;
+	op.output.next = NULL;
+
+	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_inference_sync(dev, &op);
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -1467,6 +1510,13 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 		}
 	}
 
+	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
+		rte_ml_model_stop(dev->data->dev_id, model_id);
+	} else {
+		if (mldev->cache_model_data && roc_model_is_cn10ka())
+			ret = cn10k_ml_cache_model_data(dev, model_id);
+	}
+
 	return ret;
 }
 
-- 
2.17.1


* [PATCH v3 35/38] ml/cnxk: add support to select OCM allocation mode
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (33 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 34/38] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 36/38] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
                       ` (3 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add device argument "ocm_alloc_mode" to select the OCM allocation
method used during model start. The driver supports two modes.

The "lowest" mode is implemented as the default.

ocm_alloc_mode:
lowest:  Allocate from first available free slot / lowest
         tile ID in OCM (default)
largest: Allocate from a slot with maximum free memory
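
The difference between the two modes can be illustrated with a simplified
model in which each candidate slot is described only by its number of
free pages. This is a sketch under that assumption and does not reflect
the driver's actual OCM bookkeeping; the enum and both function names
are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for the two documented modes. */
enum ocm_alloc_mode {
	OCM_ALLOC_LOWEST,
	OCM_ALLOC_LARGEST,
	OCM_ALLOC_INVALID,
};

/* Map the devarg string to a mode; a missing value selects the default. */
static enum ocm_alloc_mode
ocm_alloc_mode_parse(const char *value)
{
	if (value == NULL || strcmp(value, "lowest") == 0)
		return OCM_ALLOC_LOWEST;
	if (strcmp(value, "largest") == 0)
		return OCM_ALLOC_LARGEST;

	return OCM_ALLOC_INVALID;
}

/*
 * Pick a slot with at least `needed` free pages.
 * lowest:  first slot that fits.
 * largest: fitting slot with the most free pages (first one on ties).
 * Returns the slot index, or -1 when nothing fits. Callers are expected
 * to reject OCM_ALLOC_INVALID before calling this function.
 */
static int
ocm_slot_pick(const size_t *free_pages, int nb_slots, size_t needed,
	      enum ocm_alloc_mode mode)
{
	int best = -1;
	int i;

	for (i = 0; i < nb_slots; i++) {
		if (free_pages[i] < needed)
			continue;
		if (mode == OCM_ALLOC_LOWEST)
			return i;
		if (best < 0 || free_pages[i] > free_pages[best])
			best = i;
	}

	return best;
}
```

In this model "lowest" stops at the first fit, while "largest" always
scans every slot before deciding.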

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 45 +++++++++++++++++++++++++++++-----
 drivers/ml/cnxk/cn10k_ml_ocm.c |  6 ++---
 drivers/ml/cnxk/cn10k_ml_ocm.h |  3 +++
 3 files changed, 44 insertions(+), 10 deletions(-)
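
The difference between the two modes can be sketched as below. This is a
simplified illustration, not the driver code: it assumes a flat array of
page flags for one tile set, whereas the driver's slot_index_lowest() and
slot_index_largest() work on per-tile mask words. The names
slot_first_fit() and slot_largest() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* "lowest": start index of the first free run of at least `need` pages,
 * or -1 if no such run exists (or need is 0).
 */
static int
slot_first_fit(const bool *used, size_t n, size_t need)
{
	size_t run = 0;

	for (size_t i = 0; i < n; i++) {
		run = used[i] ? 0 : run + 1;
		if (need > 0 && run >= need)
			return (int)(i + 1 - run);
	}
	return -1;
}

/* "largest": start index of the longest free run that can hold `need`
 * pages, or -1 if none fits. The length of that run is stored in
 * *slot_sz (0 when nothing fits).
 */
static int
slot_largest(const bool *used, size_t n, size_t need, size_t *slot_sz)
{
	size_t run = 0, best = 0;
	int best_start = -1;

	/* Iterate one step past the end so a trailing free run is closed. */
	for (size_t i = 0; i <= n; i++) {
		if (i < n && !used[i]) {
			run++;
			continue;
		}
		if (need > 0 && run >= need && run > best) {
			best = run;
			best_start = (int)(i - run);
		}
		run = 0;
	}
	*slot_sz = best;
	return best_start;
}
```

With pages {used, free, free, used, free, free, free, free, used, free}
and a request for 2 pages, the first-fit search picks index 1, while the
largest-slot search picks index 4 (a run of 4 free pages).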

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 948708a420..5c02d67c8e 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -21,11 +21,13 @@
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
+#define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
+#define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -39,9 +41,12 @@
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+static const char *const valid_args[] = {CN10K_ML_FW_PATH,
+					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
+					 CN10K_ML_DEV_CACHE_MODEL_DATA,
+					 CN10K_ML_OCM_ALLOC_MODE,
+					 NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -81,6 +86,8 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool report_dpe_warnings_set = false;
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
+	bool ocm_alloc_mode_set = false;
+	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
 	int ret = 0;
@@ -140,6 +147,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		cache_model_data_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_ALLOC_MODE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_ALLOC_MODE, &parse_string_arg,
+					 &ocm_alloc_mode);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_ALLOC_MODE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_alloc_mode_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -183,6 +201,20 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
 
+	if (!ocm_alloc_mode_set) {
+		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+	} else {
+		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
+		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_OCM_ALLOC_MODE,
+				ocm_alloc_mode);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->ocm.alloc_mode = ocm_alloc_mode;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -720,7 +752,8 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2083d99f81..26e356c107 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -230,7 +230,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
-	int ocm_alloc_mode;
 	int wb_page_start;
 	uint16_t tile_id;
 	uint16_t word_id;
@@ -255,7 +254,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	max_slot_sz_curr = 0;
 	max_slot_sz = 0;
 	tile_idx = 0;
-	ocm_alloc_mode = 2;
 
 	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
 		plt_err("Invalid start_tile, %d", start_tile);
@@ -303,13 +301,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		}
 	}
 
-	if (ocm_alloc_mode == 1) {
+	if (strcmp(ocm->alloc_mode, "lowest") == 0) {
 		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
 		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
 			tile_idx = tile_start;
 			goto found;
 		}
-	} else if (ocm_alloc_mode == 2) {
+	} else if (strcmp(ocm->alloc_mode, "largest") == 0) {
 		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
 							&max_slot_sz_curr);
 		if (max_slot_sz_curr > max_slot_sz) {
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 4415bbfb45..6bf71c8da6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -58,6 +58,9 @@ struct cn10k_ml_ocm {
 	/* OCM spinlock, used to update OCM state */
 	rte_spinlock_t lock;
 
+	/* OCM allocation mode */
+	const char *alloc_mode;
+
 	/* Number of OCM tiles */
 	uint8_t num_tiles;
 
-- 
2.17.1



* [PATCH v3 36/38] ml/cnxk: add support to use lock during jcmd enq
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (34 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 35/38] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 37/38] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
                       ` (2 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "hw_queue_lock" to select the JCMDQ enqueue
ROC function to be used in fast path.

hw_queue_lock:

0: Disable. Use the lock-free version of the JCMDQ enqueue ROC
	function for job queuing. To avoid a race condition when
	queuing requests to hardware, disabling hw_queue_lock restricts
	the number of queue-pairs supported by the cnxk driver to 1.

1: Enable (default). Use the spinlock version of the JCMDQ enqueue
	ROC function for job queuing. Enabling the spinlock version
	lifts the single queue-pair restriction and allows up to 16
	queue-pairs per device.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 31 ++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_dev.h | 13 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.c | 20 +++++++++++++++++---
 3 files changed, 58 insertions(+), 6 deletions(-)
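
The dispatch pattern used here (pick the enqueue routine once at
configure time, call it through a pointer in the fast path) can be
sketched as below. This is a minimal stand-alone model under stated
assumptions: struct demo_queue and all demo_* functions are hypothetical
stand-ins for the ROC queue and for roc_ml_jcmdq_enqueue_lf() /
roc_ml_jcmdq_enqueue_sl(), and the spinlock is a plain C11 atomic_flag
rather than the ROC lock. Only the queue-pair limits (1 and 16) are taken
from this patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 4

/* Hypothetical command queue; stands in for the hardware JCMDQ. */
struct demo_queue {
	atomic_flag lock;
	uint64_t slots[QUEUE_DEPTH];
	unsigned int count;
};

/* Lock-free variant: only safe with a single producer. */
static bool
demo_enqueue_lf(struct demo_queue *q, uint64_t cmd)
{
	if (q->count == QUEUE_DEPTH)
		return false; /* queue full */
	q->slots[q->count++] = cmd;
	return true;
}

/* Spinlock variant: serializes producers from multiple queue-pairs. */
static bool
demo_enqueue_sl(struct demo_queue *q, uint64_t cmd)
{
	bool ok;

	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin until the lock is released */
	ok = demo_enqueue_lf(q, cmd);
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
	return ok;
}

typedef bool (*demo_enqueue_fn)(struct demo_queue *q, uint64_t cmd);

/* Chosen once at configure time; the fast path only calls the pointer. */
static demo_enqueue_fn
demo_select_enqueue(int hw_queue_lock)
{
	return hw_queue_lock == 1 ? demo_enqueue_sl : demo_enqueue_lf;
}

/* Queue-pair limit reported to the application for each mode. */
static unsigned int
demo_max_queue_pairs(int hw_queue_lock)
{
	return hw_queue_lock ? 16 : 1;
}
```

Resolving the pointer at configure time keeps the per-request path free
of a mode check, at the cost of an indirect call.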

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 5c02d67c8e..aa503b2691 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -22,12 +22,14 @@
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -46,6 +48,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
+					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -87,6 +90,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
+	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -158,6 +162,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		ocm_alloc_mode_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
+					 &mldev->hw_queue_lock);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_HW_QUEUE_LOCK);
+			ret = -EINVAL;
+			goto exit;
+		}
+		hw_queue_lock_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -215,6 +231,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
 
+	if (!hw_queue_lock_set) {
+		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+	} else {
+		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
+				mldev->hw_queue_lock);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -756,4 +784,5 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 718edadde7..49676ac9e7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -21,8 +21,11 @@
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
 
-/* Maximum number of queue-pairs per device */
-#define ML_CN10K_MAX_QP_PER_DEVICE 1
+/* Maximum number of queue-pairs per device, spinlock version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
+
+/* Maximum number of queue-pairs per device, lock-free version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_LF 1
 
 /* Maximum number of descriptors per queue-pair */
 #define ML_CN10K_MAX_DESC_PER_QP 1024
@@ -384,6 +387,12 @@ struct cn10k_ml_dev {
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
+
+	/* Use spinlock version of ROC enqueue */
+	int hw_queue_lock;
+
+	/* JCMD enqueue function handler */
+	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 1e6d366c59..c82f3de5c8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -534,13 +534,21 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, int16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
+	struct cn10k_ml_dev *mldev;
+
 	if (dev_info == NULL)
 		return -EINVAL;
 
+	mldev = dev->data->dev_private;
+
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	if (mldev->hw_queue_lock)
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
+	else
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
+
 	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
@@ -703,6 +711,12 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->xstats_enabled = false;
 
+	/* Set JCMDQ enqueue function */
+	if (mldev->hw_queue_lock == 1)
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	else
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -1992,7 +2006,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
-	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2113,7 +2127,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
-- 
2.17.1



* [PATCH v3 37/38] ml/cnxk: add support to select poll memory region
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (35 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 36/38] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 19:26     ` [PATCH v3 38/38] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
  2022-12-20 21:23     ` [PATCH v3 00/38] Implementation of ML CNXK driver Stephen Hemminger
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "poll_mem" to select the memory
region to be used for polling in fast-path requests.

Implemented support to use scratch registers for polling.
The available pool of scratch registers is mapped one-to-one
to the internal request queue.

poll_mem:
ddr:      Use DDR memory location for polling (default)
register: Use scratch registers for polling

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  47 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  24 +++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 124 +++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |   9 +++
 4 files changed, 192 insertions(+), 12 deletions(-)
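
The register partitioning done in cn10k_ml_qp_create() and
cn10k_ml_set_poll_addr_reg() can be checked with the arithmetic below.
This is a sketch, assuming the scratch range 1024..2047 from this patch;
struct demo_qp and the demo_* helpers are hypothetical, and the helper
returns a scratch register index only (the driver additionally converts
the index to a register address with ML_SCRATCH()).

```c
#include <stdint.h>

/* Scratch register range reserved for poll mode requests. */
#define POLL_REGISTER_START 1024u
#define POLL_REGISTER_END   2047u

struct demo_qp {
	uint32_t block_start; /* first scratch register owned by this queue-pair */
	uint32_t block_size;  /* number of scratch registers in the block */
};

/* Split the range evenly across queue-pairs; nb_queue_pairs must be non-zero. */
static void
demo_qp_init(struct demo_qp *qp, uint16_t qp_id, uint16_t nb_queue_pairs)
{
	qp->block_size = (POLL_REGISTER_END - POLL_REGISTER_START + 1) / nb_queue_pairs;
	qp->block_start = POLL_REGISTER_START + qp_id * qp->block_size;
}

/* Scratch register index used to poll the request at queue position idx. */
static uint32_t
demo_poll_register(const struct demo_qp *qp, uint64_t idx)
{
	return (uint32_t)(qp->block_start + idx % qp->block_size);
}
```

With 16 queue-pairs each block holds 64 registers, which matches the
reduced max_desc (1024 / 16) reported in register mode: queue-pair 15
owns registers 1984..2047, and queue position 70 wraps to register 1990.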

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index aa503b2691..a746a66849 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,6 +23,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
+#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -30,6 +31,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
+#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -42,6 +44,7 @@
 /* Firmware flags */
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+#define FW_USE_DDR_POLL_ADDR_FP	      BIT(2)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
@@ -49,6 +52,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
+					 CN10K_ML_FW_POLL_MEM,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -92,7 +96,9 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
+	bool poll_mem_set = false;
 	bool fw_path_set = false;
+	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 
@@ -174,6 +180,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
+					 &poll_mem);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
+			ret = -EINVAL;
+			goto exit;
+		}
+		poll_mem_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -243,6 +260,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
+	if (!poll_mem_set) {
+		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
+	} else {
+		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->fw.poll_mem = poll_mem;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -376,6 +405,11 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
+	if (strcmp(fw->poll_mem, "ddr") == 0)
+		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
+	else if (strcmp(fw->poll_mem, "register") == 0)
+		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+
 	return flags;
 }
 
@@ -780,9 +814,10 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
-			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 49676ac9e7..966d92e027 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,18 @@
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
 
+/* Memory barrier macros */
+#if defined(RTE_ARCH_ARM)
+#define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
+#define dsb_st ({ asm volatile("dsb st" : : : "memory"); })
+#else
+#define dmb_st
+#define dsb_st
+#endif
+
+struct cn10k_ml_req;
+struct cn10k_ml_qp;
+
 /* Job types */
 enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
@@ -358,6 +370,9 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
+	/* Memory to be used for polling in fast-path requests */
+	const char *poll_mem;
+
 	/* Data buffer */
 	uint8_t *data;
 
@@ -393,6 +408,15 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+	/* Poll handling function pointers */
+	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
+	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+
+	/* Memory barrier function pointers to handle synchronization */
+	void (*set_enq_barrier)(void);
+	void (*set_deq_barrier)(void);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c82f3de5c8..4903267c4d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,11 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Scratch register range for poll mode requests */
+#define ML_POLL_REGISTER_SYNC  1023
+#define ML_POLL_REGISTER_START 1024
+#define ML_POLL_REGISTER_END   2047
+
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -76,6 +81,80 @@ print_line(FILE *fp, int len)
 	fprintf(fp, "\n");
 }
 
+static inline void
+cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	PLT_SET_USED(qp);
+	PLT_SET_USED(idx);
+
+	req->compl_W1 = PLT_U64_CAST(&req->status);
+}
+
+static inline void
+cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	return plt_read64(req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	return roc_ml_reg_read64(roc_ml, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
+{
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		req->compl_W1 = PLT_U64_CAST(&req->status);
+	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
+}
+
+static inline void
+cn10k_ml_enq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_deq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_enq_barrier_register(void)
+{
+	dmb_st;
+}
+
+static inline void
+cn10k_ml_deq_barrier_register(void)
+{
+	dsb_st;
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -163,6 +242,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
+	qp->block_size =
+		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
+	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -341,7 +423,7 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	mldev = dev->data->dev_private;
 
 	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
@@ -549,7 +631,11 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
+	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
 
@@ -717,6 +803,26 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
+	/* Set polling function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
+	}
+
+	/* Set barrier function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
+	}
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -1999,13 +2105,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
+	mldev->set_poll_addr(qp, req, head);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
+	mldev->set_enq_barrier();
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2031,6 +2139,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		       uint16_t nb_ops)
 {
 	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2038,6 +2147,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
+	mldev = dev->data->dev_private;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2050,7 +2160,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = plt_read64(&req->status);
+	status = mldev->get_poll_ptr(&mldev->roc, req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2058,6 +2168,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
+	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2115,13 +2226,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
+	cn10k_ml_set_sync_addr(mldev, req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2141,7 +2253,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fb82af414a..995ed27e4e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -26,6 +26,9 @@ struct cn10k_ml_req {
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
 
+	/* Job completion W1 */
+	uint64_t compl_W1;
+
 	/* Timeout cycle */
 	uint64_t timeout;
 
@@ -61,6 +64,12 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
+
+	/* Register block start for polling */
+	uint32_t block_start;
+
+	/* Register block size for polling */
+	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.17.1



* [PATCH v3 38/38] ml/cnxk: add user guide for marvell cnxk ml driver
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (36 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 37/38] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
@ 2022-12-20 19:26     ` Srikanth Yalavarthi
  2022-12-20 21:23     ` [PATCH v3 00/38] Implementation of ML CNXK driver Stephen Hemminger
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2022-12-20 19:26 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added user guide for the Marvell cnxk ML driver for the Marvell
Octeon cnxk SoC family. Added details about device initialization,
debug options and runtime device args supported by the driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 MAINTAINERS                 |   1 +
 doc/guides/index.rst        |   1 +
 doc/guides/mldevs/cnxk.rst  | 238 ++++++++++++++++++++++++++++++++++++
 doc/guides/mldevs/index.rst |  14 +++
 4 files changed, 254 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index ba4c97e802..537acb8c84 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1443,6 +1443,7 @@ M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
 F: drivers/ml/cnxk/
+F: doc/guides/mldevs/cnxk.rst
 
 
 Packet processing
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 5eb5bd9c9a..0bd729530a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -26,6 +26,7 @@ DPDK documentation
    eventdevs/index
    rawdevs/index
    mempool/index
+   mldevs/index
    platform/index
    contributing/index
    rel_notes/index
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
new file mode 100644
index 0000000000..da40336299
--- /dev/null
+++ b/doc/guides/mldevs/cnxk.rst
@@ -0,0 +1,238 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Marvell cnxk Machine Learning Poll Mode Driver
+==============================================
+
+The cnxk ML poll mode driver provides support for offloading Machine
+Learning inference operations to Machine Learning accelerator units
+on the **Marvell OCTEON cnxk** SoC family.
+
+The cnxk ML PMD code is organized into multiple files with all file names
+starting with cn10k, providing support for CN106XX and CN106XXS.
+
+More information about OCTEON cnxk SoCs may be obtained from `<https://www.marvell.com>`_
+
+Supported OCTEON cnxk SoCs
+--------------------------
+
+- CN106XX
+- CN106XXS
+
+Features
+--------
+
+The OCTEON cnxk ML PMD provides support for the following set of operations:
+
+Slow-path device and ML model handling:
+
+* ``Device probing, configuration and close``
+* ``Device start / stop``
+* ``Model loading and unloading``
+* ``Model start / stop``
+* ``Data quantization and dequantization``
+
+Fast-path Inference:
+
+* ``Inference execution``
+* ``Error handling``
+
+
+Installation
+------------
+
+The OCTEON cnxk ML PMD may be compiled natively on an OCTEON cnxk platform
+or cross-compiled on an x86 platform.
+
+Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
+application.
+
+
+Initialization
+--------------
+
+``CN10K Initialization``
+
+List the ML PF devices available on the cn10k platform:
+
+.. code-block:: console
+
+    lspci -d:a092
+
+``a092`` is the ML device PF id. You should see output similar to:
+
+.. code-block:: console
+
+    0000:00:10.0 System peripheral: Cavium, Inc. Device a092
+
+Bind the ML PF device to the ``vfio-pci`` driver:
+
+.. code-block:: console
+
+    cd <dpdk directory>
+    ./usertools/dpdk-devbind.py -u 0000:00:10.0
+    ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
+
+Runtime Config Options
+----------------------
+
+- ``Firmware file path`` (default ``/lib/firmware/mlip-fw.bin``)
+
+   Path to the firmware binary to be loaded during device configuration.
+   The ``fw_path`` ``devargs`` parameter can be used by the user to load
+   ML firmware from a custom path.
+
+   For example::
+
+      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
+
+   With the above configuration, the driver loads the firmware from the path
+   "/home/user/ml_fw.bin".
+
+- ``Enable DPE warnings`` (default ``1``)
+
+   ML firmware can be configured during load to handle the DPE errors reported
+   by the ML inference engine. When enabled, the firmware masks non-fatal DPE
+   hardware errors as warnings. The ``enable_dpe_warnings`` ``devargs``
+   parameter is used for this configuration.
+
+   For example::
+
+      -a 0000:00:10.0,enable_dpe_warnings=0
+
+   With the above configuration, non-fatal DPE errors reported by the hardware
+   are treated as errors.
+
+
+- ``Model data caching`` (default ``1``)
+
+   Enable caching of model data on the ML ACC cores. Enabling this option executes
+   a dummy inference request in synchronous mode during the model start stage.
+   Caching model data improves the inference throughput and latency of the model.
+   The ``cache_model_data`` ``devargs`` parameter is used to enable data caching.
+
+   For example::
+
+      -a 0000:00:10.0,cache_model_data=0
+
+   With the above configuration, model data caching is disabled.
+
+
+- ``OCM allocation mode`` (default ``lowest``)
+
+   Option to specify the method to be used while allocating OCM memory for a
+   model during model start. Two modes are supported by the driver. The
+   parameter ``ocm_alloc_mode`` ``devargs`` is used to select the OCM
+   allocation mode.
+
+   * ``lowest`` - Allocate OCM for the model from the first available free slot,
+     searching from the lowest tile ID and lowest page ID.
+   * ``largest`` - Allocate OCM for the model from the slot with the largest
+     amount of free space.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_alloc_mode=lowest
+
+   With the above configuration, OCM allocation for the model is done from
+   the first available free slot, starting from the lowest possible tile ID.
+
+
+- ``Enable hardware queue lock`` (default ``0``)
+
+   Option to select the job request enqueue function used to queue requests to
+   the hardware queue. The ``hw_queue_lock`` ``devargs`` parameter is used to
+   select the enqueue function.
+
+   * ``0`` - Disable (default). Use the lock-free version of the hardware enqueue
+     function for job queuing in the enqueue burst operation. To avoid a race
+     condition when queuing requests to hardware, disabling ``hw_queue_lock``
+     restricts the number of queue-pairs supported by the cnxk driver to 1.
+   * ``1`` - Enable. Use the spinlock version of the hardware enqueue function
+     for job queuing. Enabling the spinlock version removes the restriction on
+     the number of queue-pairs that can be supported by the driver.
+
+   For example::
+
+      -a 0000:00:10.0,hw_queue_lock=1
+
+   With the above configuration, the spinlock version of the hardware enqueue
+   function is used in the fast-path enqueue burst operation.
+
+
+- ``Polling memory location`` (default ``ddr``)
+
+   The ML cnxk driver provides an option to select the memory location used
+   for polling to check inference request completion. The driver supports using
+   either the DDR address space (``ddr``) or ML registers (``register``) as the
+   polling location. The ``poll_mem`` ``devargs`` parameter is used to specify
+   the poll location.
+
+   For example::
+
+      -a 0000:00:10.0,poll_mem="register"
+
+   With the above configuration, the ML cnxk driver is configured to use ML
+   registers for polling in fast-path requests.
+
+
+Debugging Options
+-----------------
+
+.. _table_octeon_cnxk_ml_debug_options:
+
+.. table:: OCTEON cnxk ML PMD debug options
+
+    +---+------------+-------------------------------------------------------+
+    | # | Component  | EAL log command                                       |
+    +===+============+=======================================================+
+    | 1 | ML         | --log-level='pmd\.ml\.cnxk,8'                         |
+    +---+------------+-------------------------------------------------------+
+
+
+Extended stats
+--------------
+
+The Marvell cnxk ML PMD supports reporting inference latencies through extended
+stats. The PMD supports the following 6 extended stats types for each model.
+The total number of extended stats is 6 x the number of models loaded.
+
+.. _table_octeon_cnxk_ml_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD xstats names
+
+    +---+---------------------+----------------------------------------------+
+    | # | Type                | Description                                  |
+    +===+=====================+==============================================+
+    | 1 | Avg-HW-Latency      | Average hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 2 | Min-HW-Latency      | Minimum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 3 | Max-HW-Latency      | Maximum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 4 | Avg-FW-Latency      | Average firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 5 | Min-FW-Latency      | Minimum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 6 | Max-FW-Latency      | Maximum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+
+Latency values reported by the PMD through xstats are in units of either
+cycles or nanoseconds. The unit is determined during DPDK initialization
+and depends on the availability of SCLK. Latencies are reported in
+nanoseconds when SCLK is available and in cycles otherwise. The
+application needs to initialize at least one RVU for the clock to be available.
+
+xstats names are generated dynamically by the PMD and have the format
+"Model-<model_id>-Type-<units>".
+
+For example::
+
+   Model-1-Avg-FW-Latency-ns
+
+This name reports the average firmware latency, in nanoseconds, for model ID 1.
+
+The number of xstats made available by the PMD changes dynamically. It
+increases when a model is loaded and decreases when a model is unloaded.
+The application needs to update its xstats map after a model is either loaded
+or unloaded.
diff --git a/doc/guides/mldevs/index.rst b/doc/guides/mldevs/index.rst
new file mode 100644
index 0000000000..f201e54175
--- /dev/null
+++ b/doc/guides/mldevs/index.rst
@@ -0,0 +1,14 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Machine Learning Device Driver
+==============================
+
+The following is a list of ML device PMDs, which can be used from an
+application through the ML device API.
+
+.. toctree::
+    :maxdepth: 2
+    :numbered:
+
+    cnxk
-- 
2.17.1
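
The xstats naming format documented in cnxk.rst above can be illustrated with
a small helper. This is a minimal sketch, not the driver's code: the function
name xstat_name_format() and its parameters are hypothetical, and only the
"ns" suffix is taken from the documented example (the suffix used for cycles
is not shown in the guide, so it is not assumed here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of the cnxk driver: builds an xstat name in
 * the documented "Model-<model_id>-Type-<units>" format. Returns the value of
 * snprintf(), i.e. the length the full name needs (excluding the NUL).
 */
static int
xstat_name_format(char *buf, size_t len, uint16_t model_id, const char *type,
		  const char *units)
{
	assert(buf != NULL && type != NULL && units != NULL);

	return snprintf(buf, len, "Model-%u-%s-%s", (unsigned int)model_id, type, units);
}
```

Calling xstat_name_format(buf, sizeof(buf), 1, "Avg-FW-Latency", "ns") fills
buf with the name used in the guide's example. Since the set of names changes
on every model load and unload, an application would rebuild its name table
at those points.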



* Re: [PATCH v3 00/38] Implementation of ML CNXK driver
  2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
                       ` (37 preceding siblings ...)
  2022-12-20 19:26     ` [PATCH v3 38/38] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
@ 2022-12-20 21:23     ` Stephen Hemminger
  2022-12-21  4:44       ` Jerin Jacob
  38 siblings, 1 reply; 253+ messages in thread
From: Stephen Hemminger @ 2022-12-20 21:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

On Tue, 20 Dec 2022 11:26:07 -0800
Srikanth Yalavarthi <syalavarthi@marvell.com> wrote:

> Marvell ML CNXK Driver
> ----------------------
> 
> This patch series implements common Machine Learning (ML) ROC code
> and driver for Marvell Octeon 10 (cnxk) platform. ML inferencing is
> supported on cnxk platform through an integrated ML inferencing
> processor. The current driver supports programming the ML hardware
> engine through offload mode.
> 
> All APIs proposed in the DPDK ML device specification are supported on
> the cnxk platform.


Is this hardware in the DPDK CI lab?
How can the project make sure this isn't broken in future?


* Re: [PATCH v3 00/38] Implementation of ML CNXK driver
  2022-12-20 21:23     ` [PATCH v3 00/38] Implementation of ML CNXK driver Stephen Hemminger
@ 2022-12-21  4:44       ` Jerin Jacob
  0 siblings, 0 replies; 253+ messages in thread
From: Jerin Jacob @ 2022-12-21  4:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Srikanth Yalavarthi, dev, sshankarnara, jerinj, aprabhu

On Wed, Dec 21, 2022 at 2:54 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 20 Dec 2022 11:26:07 -0800
> Srikanth Yalavarthi <syalavarthi@marvell.com> wrote:
>
> > Marvell ML CNXK Driver
> > ----------------------
> >
> > This patch series implements common Machine Learning (ML) ROC code
> > and driver for Marvell Octeon 10 (cnxk) platform. ML inferencing is
> > supported on cnxk platform through an integrated ML inferencing
> > processor. The current driver supports programming the ML hardware
> > engine through offload mode.
> >
> > All APIs proposed in the DPDK ML device specification are supported on
> > the cnxk platform.
>
>
> Is this hardware in the DPDK CI lab?

No

> How can the project make sure this isn't broken in future?

It will be like the rest of 95% HWs of DPDK drivers. i.e. Vendor will
make sure it is not broken.


* [PATCH v4 00/39] Implementation of ML CNXK driver
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (37 preceding siblings ...)
  2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
@ 2023-02-01  9:22 ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
                     ` (38 more replies)
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
  40 siblings, 39 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  Cc: dev, sshankarnara, jerinj, aprabhu, Srikanth Yalavarthi

Marvell ML CNXK Driver
----------------------

This patch series implements common Machine Learning (ML) ROC code
and driver for Marvell Octeon 10 (cnxk) platform. ML inferencing is
supported on cnxk platform through an integrated ML inferencing
processor. The current driver supports programming the ML hardware
engine through offload mode.

All APIs proposed in the DPDK ML device specification are supported on
the cnxk platform.

v4:
* Update function names of ML common code
* Added support for configurable OCM page size
* Minor typo fixes

v3:
* Skip installation of internal headers
* Update internal comments and code cleanup

v2:
* Typo and formatting fixes

Srikanth Yalavarthi (39):
  common/cnxk: add ML headers and ROC code for cnxk
  ml/cnxk: add skeleton for ML cnxk driver
  ml/cnxk: enable probe and remove of ML device
  ml/cnxk: add driver support to get device info
  ml/cnxk: add support for configure and close
  ml/cnxk: parse ML firmware path from device args
  ml/cnxk: enable firmware load and device reset
  ml/cnxk: enable support for simulator environment
  ml/cnxk: enable support for device start and stop
  ml/cnxk: add support to create device queue-pairs
  ml/cnxk: add functions to load and unload models
  ml/cnxk: enable validity checks for model metadata
  ml/cnxk: add internal structures for derived info
  ml/cnxk: add internal structures for tiles and OCM
  ml/cnxk: add structures for slow and fast path JDs
  ml/cnxk: find OCM mask and page slots for a model
  ml/cnxk: add support to reserve and free OCM pages
  ml/cnxk: enable support to start an ML model
  ml/cnxk: enable support to stop an ML models
  ml/cnxk: enable support to get model information
  ml/cnxk: enable support to update model params
  ml/cnxk: add support to get IO buffer sizes
  ml/cnxk: enable quantization and dequantization
  ml/cnxk: enable support to dump device debug info
  ml/cnxk: add driver support for device selftest
  ml/cnxk: enqueue a burst of inference requests
  ml/cnxk: dequeue a burst of inference requests
  ml/cnxk: add internal function for sync mode run
  ml/cnxk: enable support for firmware error codes
  ml/cnxk: add support to get and reset device stats
  ml/cnxk: add support to handle extended dev stats
  ml/cnxk: enable support to get xstats in cycles
  ml/cnxk: add support to report DPE FW warnings
  ml/cnxk: add support to enable model data caching
  ml/cnxk: add support to select OCM allocation mode
  ml/cnxk: add support to use lock during jcmd enq
  ml/cnxk: add support to select poll memory region
  ml/cnxk: add user guide for marvell cnxk ml driver
  ml/cnxk: enable support for configurable ocm page

 MAINTAINERS                         |   11 +
 doc/guides/index.rst                |    1 +
 doc/guides/mldevs/cnxk.rst          |  254 +++
 doc/guides/mldevs/index.rst         |   14 +
 drivers/common/cnxk/hw/ml.h         |  170 ++
 drivers/common/cnxk/meson.build     |    1 +
 drivers/common/cnxk/roc_api.h       |    4 +
 drivers/common/cnxk/roc_constants.h |    2 +
 drivers/common/cnxk/roc_dev_priv.h  |    1 +
 drivers/common/cnxk/roc_ml.c        |  626 ++++++++
 drivers/common/cnxk/roc_ml.h        |  152 ++
 drivers/common/cnxk/roc_ml_priv.h   |   24 +
 drivers/common/cnxk/roc_platform.c  |    1 +
 drivers/common/cnxk/roc_platform.h  |    2 +
 drivers/common/cnxk/roc_priv.h      |    3 +
 drivers/common/cnxk/version.map     |   29 +
 drivers/meson.build                 |    1 +
 drivers/ml/cnxk/cn10k_ml_dev.c      |  870 ++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h      |  429 +++++
 drivers/ml/cnxk/cn10k_ml_model.c    |  413 +++++
 drivers/ml/cnxk/cn10k_ml_model.h    |  508 ++++++
 drivers/ml/cnxk/cn10k_ml_ocm.c      |  519 ++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h      |   85 +
 drivers/ml/cnxk/cn10k_ml_ops.c      | 2316 +++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h      |   94 ++
 drivers/ml/cnxk/meson.build         |   32 +
 drivers/ml/meson.build              |    8 +
 27 files changed, 6570 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst
 create mode 100644 drivers/common/cnxk/hw/ml.h
 create mode 100644 drivers/common/cnxk/roc_ml.c
 create mode 100644 drivers/common/cnxk/roc_ml.h
 create mode 100644 drivers/common/cnxk/roc_ml_priv.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

--
2.17.1



* [PATCH v4 01/39] common/cnxk: add ML headers and ROC code for cnxk
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
                     ` (37 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao
  Cc: dev, sshankarnara, jerinj, aprabhu

Added ML cnxk headers for register, structure definitions and
ROC layer. Implemented ROC functions, registered logtype for
ML module with the name pmd.ml.cnxk and defined ML hardware ID.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: series-26731 ("Implementation of ML common code")
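
Note on the scratch-register handshake implemented by roc_ml_scratch_enqueue()
and roc_ml_scratch_dequeue() in this patch: the sketch below models only the
valid/done logic on a plain struct. It is an illustration under stated
assumptions, not the ROC code. The struct and function names are hypothetical,
the bit positions are read off the field layout of union
ml_scratch_fw_ctrl_s, and the step where the done bit gets set is assumed to
be performed by firmware. The spinlock, clock forcing, DMA stall control and
AP-to-MLIP address translation done by the real functions are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow union ml_scratch_fw_ctrl_s in hw/ml.h:
 * bits 0..15 reserved, bit 16 valid, bit 17 done.
 */
#define FW_CTRL_VALID (UINT64_C(1) << 16)
#define FW_CTRL_DONE  (UINT64_C(1) << 17)

/* Hypothetical stand-in for the ML_SCRATCH_WORK_PTR and ML_SCRATCH_FW_CTRL
 * registers; the ROC code accesses them with roc_ml_reg_read64() and
 * roc_ml_reg_write64().
 */
struct scratch_model {
	uint64_t work_ptr;
	uint64_t fw_ctrl;
};

/* Accept a job only when valid == done (both clear, or both set). */
static bool
scratch_model_enqueue(struct scratch_model *s, uint64_t work_ptr)
{
	assert(s != NULL);

	bool valid = (s->fw_ctrl & FW_CTRL_VALID) != 0;
	bool done = (s->fw_ctrl & FW_CTRL_DONE) != 0;

	if (valid != done)
		return false; /* a job is still in flight */

	s->work_ptr = work_ptr;
	s->fw_ctrl = FW_CTRL_VALID; /* valid = 1, done = 0 */
	return true;
}

/* Assumed firmware side: mark the in-flight job as complete. */
static void
scratch_model_fw_complete(struct scratch_model *s)
{
	s->fw_ctrl |= FW_CTRL_DONE;
}

/* Release the slot only when valid and done are both set and the stored
 * pointer matches the job the caller is waiting for.
 */
static bool
scratch_model_dequeue(struct scratch_model *s, uint64_t work_ptr)
{
	assert(s != NULL);

	bool valid = (s->fw_ctrl & FW_CTRL_VALID) != 0;
	bool done = (s->fw_ctrl & FW_CTRL_DONE) != 0;

	if (!valid || !done || s->work_ptr != work_ptr)
		return false;

	s->work_ptr = 0;
	s->fw_ctrl = 0;
	return true;
}
```

As in the patch, a slot with both bits set passes the valid == done check and
is therefore also accepted for enqueue, so a caller that needs the completed
job has to dequeue it before another enqueue is attempted.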

 MAINTAINERS                         |   9 +
 drivers/common/cnxk/hw/ml.h         | 170 ++++++++
 drivers/common/cnxk/meson.build     |   1 +
 drivers/common/cnxk/roc_api.h       |   4 +
 drivers/common/cnxk/roc_constants.h |   2 +
 drivers/common/cnxk/roc_dev_priv.h  |   1 +
 drivers/common/cnxk/roc_ml.c        | 626 ++++++++++++++++++++++++++++
 drivers/common/cnxk/roc_ml.h        | 152 +++++++
 drivers/common/cnxk/roc_ml_priv.h   |  24 ++
 drivers/common/cnxk/roc_platform.c  |   1 +
 drivers/common/cnxk/roc_platform.h  |   2 +
 drivers/common/cnxk/roc_priv.h      |   3 +
 drivers/common/cnxk/version.map     |  29 ++
 13 files changed, 1024 insertions(+)
 create mode 100644 drivers/common/cnxk/hw/ml.h
 create mode 100644 drivers/common/cnxk/roc_ml.c
 create mode 100644 drivers/common/cnxk/roc_ml.h
 create mode 100644 drivers/common/cnxk/roc_ml_priv.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 36f6e43470..265f5b9a3d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1434,6 +1434,15 @@ F: drivers/raw/dpaa2_cmdif/
 F: doc/guides/rawdevs/dpaa2_cmdif.rst


+ML Device Drivers
+-----------------
+
+Marvell ML CNXK
+M: Srikanth Yalavarthi <syalavarthi@marvell.com>
+F: drivers/common/cnxk/hw/ml.h
+F: drivers/common/cnxk/roc_ml*
+
+
 Packet processing
 -----------------

diff --git a/drivers/common/cnxk/hw/ml.h b/drivers/common/cnxk/hw/ml.h
new file mode 100644
index 0000000000..3ead42b807
--- /dev/null
+++ b/drivers/common/cnxk/hw/ml.h
@@ -0,0 +1,170 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef __ML_HW_H__
+#define __ML_HW_H__
+
+#include <stdint.h>
+
+/* Constants */
+#define ML_ANBX_NR 0x3
+
+/* Base offsets */
+#define ML_MLAB_BLK_OFFSET 0x20000000 /* CNF10KB */
+#define ML_AXI_START_ADDR  0x800000000
+
+/* MLW register offsets / ML_PF_BAR0 */
+#define ML_CFG			 0x10000
+#define ML_MLR_BASE		 0x10008
+#define ML_AXI_BRIDGE_CTRL(a)	 (0x10020 | (uint64_t)(a) << 3)
+#define ML_JOB_MGR_CTRL		 0x10060
+#define ML_CORE_INT_LO		 0x10140
+#define ML_CORE_INT_HI		 0x10160
+#define ML_JCMDQ_IN(a)		 (0x11000 | (uint64_t)(a) << 3) /* CN10KA */
+#define ML_JCMDQ_STATUS		 0x11010			/* CN10KA */
+#define ML_STGX_STATUS(a)	 (0x11020 | (uint64_t)(a) << 3) /* CNF10KB */
+#define ML_STG_CONTROL		 0x11100			/* CNF10KB */
+#define ML_PNB_CMD_TYPE		 0x113a0			/* CNF10KB */
+#define ML_SCRATCH(a)		 (0x14000 | (uint64_t)(a) << 3)
+#define ML_ANBX_BACKP_DISABLE(a) (0x18000 | (uint64_t)(a) << 12) /* CN10KA */
+#define ML_ANBX_NCBI_P_OVR(a)	 (0x18010 | (uint64_t)(a) << 12) /* CN10KA */
+#define ML_ANBX_NCBI_NP_OVR(a)	 (0x18020 | (uint64_t)(a) << 12) /* CN10KA */
+
+/* MLIP configuration register offsets / ML_PF_BAR0 */
+#define ML_SW_RST_CTRL		      0x12084000
+#define ML_A35_0_RST_VECTOR_BASE_W(a) (0x12084014 + (a) * (0x04))
+#define ML_A35_1_RST_VECTOR_BASE_W(a) (0x1208401c + (a) * (0x04))
+
+/* MLW scratch register offsets */
+#define ML_SCRATCH_WORK_PTR	      (ML_SCRATCH(0))
+#define ML_SCRATCH_FW_CTRL	      (ML_SCRATCH(1))
+#define ML_SCRATCH_DBG_BUFFER_HEAD_C0 (ML_SCRATCH(2))
+#define ML_SCRATCH_DBG_BUFFER_TAIL_C0 (ML_SCRATCH(3))
+#define ML_SCRATCH_DBG_BUFFER_HEAD_C1 (ML_SCRATCH(4))
+#define ML_SCRATCH_DBG_BUFFER_TAIL_C1 (ML_SCRATCH(5))
+#define ML_SCRATCH_EXCEPTION_SP_C0    (ML_SCRATCH(6))
+#define ML_SCRATCH_EXCEPTION_SP_C1    (ML_SCRATCH(7))
+
+/* ML job completion structure */
+struct ml_jce_s {
+	/* WORD 0 */
+	union ml_jce_w0 {
+		struct {
+			uint64_t rsvd_0_3 : 4;
+
+			/* Reserved for future architecture */
+			uint64_t ggrp_h : 2;
+
+			/* Tag type */
+			uint64_t ttype : 2;
+
+			/* Physical function number */
+			uint64_t pf_func : 16;
+
+			/* Unused [7] + Guest Group [6:0] */
+			uint64_t ggrp : 8;
+
+			/* Tag */
+			uint64_t tag : 32;
+		} s;
+		uint64_t u64;
+	} w0;
+
+	/* WORD 1 */
+	union ml_jce_w1 {
+		struct {
+			/* Work queue pointer */
+			uint64_t wqp : 53;
+			uint64_t rsvd_53_63 : 11;
+
+		} s;
+		uint64_t u64;
+	} w1;
+};
+
+/* ML job command structure */
+struct ml_job_cmd_s {
+	/* WORD 0 */
+	union ml_job_cmd_w0 {
+		struct {
+			uint64_t rsvd_0_63;
+		} s;
+		uint64_t u64;
+	} w0;
+
+	/* WORD 1 */
+	union ml_job_cmd_w1 {
+		struct {
+			/* Job pointer */
+			uint64_t jobptr : 53;
+			uint64_t rsvd_53_63 : 11;
+		} s;
+		uint64_t u64;
+	} w1;
+};
+
+/* ML A35 0 RST vector base structure */
+union ml_a35_0_rst_vector_base_s {
+	struct {
+		/* Base address */
+		uint64_t addr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+
+	struct {
+		/* WORD 0 */
+		uint32_t w0;
+
+		/* WORD 1 */
+		uint32_t w1;
+	} w;
+
+	uint64_t u64;
+};
+
+/* ML A35 1 RST vector base structure */
+union ml_a35_1_rst_vector_base_s {
+	struct {
+		/* Base address */
+		uint64_t addr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+
+	struct {
+		/* WORD 0 */
+		uint32_t w0;
+
+		/* WORD 1 */
+		uint32_t w1;
+	} w;
+
+	uint64_t u64;
+};
+
+/* Work pointer scratch register */
+union ml_scratch_work_ptr_s {
+	struct {
+		/* Work pointer */
+		uint64_t work_ptr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+	uint64_t u64;
+};
+
+/* Firmware control scratch register */
+union ml_scratch_fw_ctrl_s {
+	struct {
+		uint64_t rsvd_0_15 : 16;
+
+		/* Valid job bit */
+		uint64_t valid : 1;
+
+		/* Done status bit */
+		uint64_t done : 1;
+		uint64_t rsvd_18_63 : 46;
+	} s;
+	uint64_t u64;
+};
+
+#endif /* __ML_HW_H__ */
diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 849735921c..b4aa0a050c 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -26,6 +26,7 @@ sources = files(
         'roc_irq.c',
         'roc_ie_ot.c',
         'roc_mbox.c',
+        'roc_ml.c',
         'roc_model.c',
         'roc_nix.c',
         'roc_nix_bpf.c',
diff --git a/drivers/common/cnxk/roc_api.h b/drivers/common/cnxk/roc_api.h
index 14a11321e0..06accf247d 100644
--- a/drivers/common/cnxk/roc_api.h
+++ b/drivers/common/cnxk/roc_api.h
@@ -34,6 +34,7 @@
 /* HW structure definition */
 #include "hw/cpt.h"
 #include "hw/dpi.h"
+#include "hw/ml.h"
 #include "hw/nix.h"
 #include "hw/npa.h"
 #include "hw/npc.h"
@@ -107,4 +108,7 @@
 /* NIX Inline dev */
 #include "roc_nix_inl.h"

+/* ML */
+#include "roc_ml.h"
+
 #endif /* _ROC_API_H_ */
diff --git a/drivers/common/cnxk/roc_constants.h b/drivers/common/cnxk/roc_constants.h
index 0495965daa..ddaef133b8 100644
--- a/drivers/common/cnxk/roc_constants.h
+++ b/drivers/common/cnxk/roc_constants.h
@@ -50,6 +50,8 @@
 #define PCI_DEVID_CN10K_RVU_CPT_PF 0xA0F2
 #define PCI_DEVID_CN10K_RVU_CPT_VF 0xA0F3

+#define PCI_DEVID_CN10K_ML_PF 0xA092
+
 #define PCI_SUBSYSTEM_DEVID_CN10KA  0xB900
 #define PCI_SUBSYSTEM_DEVID_CN10KAS 0xB900
 #define PCI_SUBSYSTEM_DEVID_CNF10KA 0xBA00
diff --git a/drivers/common/cnxk/roc_dev_priv.h b/drivers/common/cnxk/roc_dev_priv.h
index 4217ec4af8..40af5e0f0b 100644
--- a/drivers/common/cnxk/roc_dev_priv.h
+++ b/drivers/common/cnxk/roc_dev_priv.h
@@ -90,6 +90,7 @@ struct dev {
 	void *roc_nix;
 	void *roc_cpt;
 	void *roc_tim;
+	void *roc_ml;
 	bool disable_shared_lmt; /* false(default): shared lmt mode enabled */
 	const struct plt_memzone *lmt_mz;
 } __plt_cache_aligned;
diff --git a/drivers/common/cnxk/roc_ml.c b/drivers/common/cnxk/roc_ml.c
new file mode 100644
index 0000000000..7390697b1d
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml.c
@@ -0,0 +1,626 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "roc_api.h"
+#include "roc_priv.h"
+
+#define TIME_SEC_IN_MS 1000
+
+static int
+roc_ml_reg_wait_to_clear(struct roc_ml *roc_ml, uint64_t offset, uint64_t mask)
+{
+	uint64_t start_cycle;
+	uint64_t wait_cycles;
+	uint64_t reg_val;
+
+	wait_cycles = (ROC_ML_TIMEOUT_MS * plt_tsc_hz()) / TIME_SEC_IN_MS;
+	start_cycle = plt_tsc_cycles();
+	do {
+		reg_val = roc_ml_reg_read64(roc_ml, offset);
+
+		if (!(reg_val & mask))
+			return 0;
+	} while (plt_tsc_cycles() - start_cycle < wait_cycles);
+
+	return -ETIME;
+}
+
+uint64_t
+roc_ml_reg_read64(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	return plt_read64(PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_write64(struct roc_ml *roc_ml, uint64_t val, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	plt_write64(val, PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+uint32_t
+roc_ml_reg_read32(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	return plt_read32(PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_write32(struct roc_ml *roc_ml, uint32_t val, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	plt_write32(val, PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_save(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (offset == ML_MLR_BASE) {
+		ml->ml_mlr_base =
+			FIELD_GET(ROC_ML_MLR_BASE_BASE, roc_ml_reg_read64(roc_ml, offset));
+		ml->ml_mlr_base_saved = true;
+	}
+}
+
+void *
+roc_ml_addr_ap2mlip(struct roc_ml *roc_ml, void *addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+	uint64_t ml_mlr_base;
+
+	ml_mlr_base = (ml->ml_mlr_base_saved) ? ml->ml_mlr_base :
+						FIELD_GET(ROC_ML_MLR_BASE_BASE,
+							  roc_ml_reg_read64(roc_ml, ML_MLR_BASE));
+	return PLT_PTR_ADD(addr, ML_AXI_START_ADDR - ml_mlr_base);
+}
+
+void *
+roc_ml_addr_mlip2ap(struct roc_ml *roc_ml, void *addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+	uint64_t ml_mlr_base;
+
+	ml_mlr_base = (ml->ml_mlr_base_saved) ? ml->ml_mlr_base :
+						FIELD_GET(ROC_ML_MLR_BASE_BASE,
+							  roc_ml_reg_read64(roc_ml, ML_MLR_BASE));
+	return PLT_PTR_ADD(addr, ml_mlr_base - ML_AXI_START_ADDR);
+}
+
+uint64_t
+roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (roc_model_is_cn10ka())
+		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr;
+	else
+		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr - ML_MLAB_BLK_OFFSET;
+}
+
+uint64_t
+roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (roc_model_is_cn10ka())
+		return ml->pci_dev->mem_resource[0].phys_addr + offset;
+	else
+		return ml->pci_dev->mem_resource[0].phys_addr + ML_MLAB_BLK_OFFSET + offset;
+}
+
+void
+roc_ml_scratch_write_job(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_work_ptr.u64 = 0;
+	reg_work_ptr.s.work_ptr = PLT_U64_CAST(roc_ml_addr_ap2mlip(roc_ml, work_ptr));
+
+	reg_fw_ctrl.u64 = 0;
+	reg_fw_ctrl.s.valid = 1;
+
+	roc_ml_reg_write64(roc_ml, reg_work_ptr.u64, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(roc_ml, reg_fw_ctrl.u64, ML_SCRATCH_FW_CTRL);
+}
+
+bool
+roc_ml_scratch_is_valid_bit_set(struct roc_ml *roc_ml)
+{
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_fw_ctrl.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_FW_CTRL);
+
+	if (reg_fw_ctrl.s.valid == 1)
+		return true;
+
+	return false;
+}
+
+bool
+roc_ml_scratch_is_done_bit_set(struct roc_ml *roc_ml)
+{
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_fw_ctrl.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_FW_CTRL);
+
+	if (reg_fw_ctrl.s.done == 1)
+		return true;
+
+	return false;
+}
+
+bool
+roc_ml_scratch_enqueue(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+	bool ret = false;
+
+	reg_work_ptr.u64 = 0;
+	reg_work_ptr.s.work_ptr = PLT_U64_CAST(roc_ml_addr_ap2mlip(roc_ml, work_ptr));
+
+	reg_fw_ctrl.u64 = 0;
+	reg_fw_ctrl.s.valid = 1;
+
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		bool valid = roc_ml_scratch_is_valid_bit_set(roc_ml);
+		bool done = roc_ml_scratch_is_done_bit_set(roc_ml);
+
+		if (valid == done) {
+			roc_ml_clk_force_on(roc_ml);
+			roc_ml_dma_stall_off(roc_ml);
+
+			roc_ml_reg_write64(roc_ml, reg_work_ptr.u64, ML_SCRATCH_WORK_PTR);
+			roc_ml_reg_write64(roc_ml, reg_fw_ctrl.u64, ML_SCRATCH_FW_CTRL);
+
+			ret = true;
+		}
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+
+	return ret;
+}
+
+bool
+roc_ml_scratch_dequeue(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	bool ret = false;
+
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		bool valid = roc_ml_scratch_is_valid_bit_set(roc_ml);
+		bool done = roc_ml_scratch_is_done_bit_set(roc_ml);
+
+		if (valid && done) {
+			reg_work_ptr.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_WORK_PTR);
+			if (work_ptr ==
+			    roc_ml_addr_mlip2ap(roc_ml, PLT_PTR_CAST(reg_work_ptr.u64))) {
+				roc_ml_dma_stall_on(roc_ml);
+				roc_ml_clk_force_off(roc_ml);
+
+				roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+				roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_FW_CTRL);
+				ret = true;
+			}
+		}
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+
+	return ret;
+}
+
+void
+roc_ml_scratch_queue_reset(struct roc_ml *roc_ml)
+{
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		roc_ml_dma_stall_on(roc_ml);
+		roc_ml_clk_force_off(roc_ml);
+		roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_FW_CTRL);
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+}
+
+bool
+roc_ml_jcmdq_enqueue_lf(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd)
+{
+	bool ret = false;
+
+	if (FIELD_GET(ROC_ML_JCMDQ_STATUS_AVAIL_COUNT,
+		      roc_ml_reg_read64(roc_ml, ML_JCMDQ_STATUS)) != 0) {
+		roc_ml_reg_write64(roc_ml, job_cmd->w0.u64, ML_JCMDQ_IN(0));
+		roc_ml_reg_write64(roc_ml, job_cmd->w1.u64, ML_JCMDQ_IN(1));
+		ret = true;
+	}
+
+	return ret;
+}
+
+bool
+roc_ml_jcmdq_enqueue_sl(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd)
+{
+	bool ret = false;
+
+	if (plt_spinlock_trylock(&roc_ml->fp_spinlock) != 0) {
+		if (FIELD_GET(ROC_ML_JCMDQ_STATUS_AVAIL_COUNT,
+			      roc_ml_reg_read64(roc_ml, ML_JCMDQ_STATUS)) != 0) {
+			roc_ml_reg_write64(roc_ml, job_cmd->w0.u64, ML_JCMDQ_IN(0));
+			roc_ml_reg_write64(roc_ml, job_cmd->w1.u64, ML_JCMDQ_IN(1));
+			ret = true;
+		}
+		plt_spinlock_unlock(&roc_ml->fp_spinlock);
+	}
+
+	return ret;
+}
+
+void
+roc_ml_clk_force_on(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+	reg_val |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+}
+
+void
+roc_ml_clk_force_off(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+}
+
+void
+roc_ml_dma_stall_on(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_JOB_MGR_CTRL);
+	reg_val |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_JOB_MGR_CTRL);
+}
+
+void
+roc_ml_dma_stall_off(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_JOB_MGR_CTRL);
+	reg_val &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_JOB_MGR_CTRL);
+}
+
+bool
+roc_ml_mlip_is_enabled(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+
+	if ((reg_val & ROC_ML_CFG_MLIP_ENA) != 0)
+		return true;
+
+	return false;
+}
+
+int
+roc_ml_mlip_reset(struct roc_ml *roc_ml, bool force)
+{
+	uint64_t reg_val;
+
+	/* Force reset */
+	if (force) {
+		/* Set ML(0)_CFG[ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* Set ML(0)_CFG[MLIP_ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* Clear ML_MLR_BASE */
+		roc_ml_reg_write64(roc_ml, 0, ML_MLR_BASE);
+	}
+
+	if (roc_model_is_cn10ka()) {
+		/* Wait for all active jobs to finish.
+		 * ML_CFG[ENA] : When set, MLW will accept job commands. This
+		 * bit can be cleared at any time. If [BUSY] is set, software
+		 * must wait until [BUSY] == 0 before setting this bit.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_CFG, ROC_ML_CFG_BUSY);
+
+		/* (1) Set ML(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] = 1 to instruct
+		 * the AXI bridge not to accept any new transactions from MLIP.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		/* (2) Wait until ML(0)_AXI_BRIDGE_CTRL(0..1)[BUSY] = 0 which
+		 * indicates that there are no outstanding transactions on
+		 * AXI-NCB paths.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(0),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(1),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		/* (3) Wait until ML(0)_JOB_MGR_CTRL[BUSY] = 0 which indicates
+		 * that there are no pending jobs in the MLW's job manager.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_JOB_MGR_CTRL, ROC_ML_JOB_MGR_CTRL_BUSY);
+
+		/* (4) Set ML(0)_CFG[ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (5) Set ML(0)_CFG[MLIP_ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (6) Set ML(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+	}
+
+	if (roc_model_is_cnf10kb()) {
+		/* (1) Clear MLAB(0)_CFG[ENA]. Any new jobs will bypass the job
+		 * execution stages and their completions will be returned to
+		 * PSM.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (2) Quiesce the ACC and DMA AXI interfaces: For each of the
+		 * two MLAB(0)_AXI_BRIDGE_CTRL(0..1) registers:
+		 *
+		 * (a) Set MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] to block new AXI
+		 * commands from MLIP.
+		 *
+		 * (b) Poll MLAB(0)_AXI_BRIDGE_CTRL(0..1)[BUSY] == 0.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(0),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(1),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		/* (3) Clear MLAB(0)_CFG[MLIP_ENA] to reset MLIP.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+cnf10kb_mlip_reset_stage_4a:
+		/* (4) Flush any outstanding jobs in MLAB's job execution
+		 * stages:
+		 *
+		 * (a) Wait for completion stage to clear:
+		 *   - Poll MLAB(0)_STG(0..2)_STATUS[VALID] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(0), ROC_ML_STG_STATUS_VALID);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(1), ROC_ML_STG_STATUS_VALID);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(2), ROC_ML_STG_STATUS_VALID);
+
+cnf10kb_mlip_reset_stage_4b:
+		/* (4b) Clear job run stage: Poll
+		 * MLAB(0)_STG_CONTROL[RUN_TO_COMP] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL, ROC_ML_STG_CONTROL_RUN_TO_COMP);
+
+		/* (4b) Clear job run stage: If MLAB(0)_STG(1)_STATUS[VALID] ==
+		 * 1:
+		 *     - Set MLAB(0)_STG_CONTROL[RUN_TO_COMP].
+		 *     - Poll MLAB(0)_STG_CONTROL[RUN_TO_COMP] == 0.
+		 *     - Repeat step (a) to clear job completion stage.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(1));
+		if (reg_val & ROC_ML_STG_STATUS_VALID) {
+			reg_val = roc_ml_reg_read64(roc_ml, ML_STG_CONTROL);
+			reg_val |= ROC_ML_STG_CONTROL_RUN_TO_COMP;
+			roc_ml_reg_write64(roc_ml, reg_val, ML_STG_CONTROL);
+
+			roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL,
+						 ROC_ML_STG_CONTROL_RUN_TO_COMP);
+
+			goto cnf10kb_mlip_reset_stage_4a;
+		}
+
+		/* (4c) Clear job fetch stage: Poll
+		 * MLAB(0)_STG_CONTROL[FETCH_TO_RUN] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL, ROC_ML_STG_CONTROL_FETCH_TO_RUN);
+
+		/* (4c) Clear job fetch stage: If
+		 * MLAB(0)_STG(0..2)_STATUS[VALID] == 1:
+		 *     - Set MLAB(0)_STG_CONTROL[FETCH_TO_RUN].
+		 *     - Poll MLAB(0)_STG_CONTROL[FETCH_TO_RUN] == 0.
+		 *     - Repeat step (b) to clear job run and completion stages.
+		 */
+		reg_val = (roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(0)) |
+			   roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(1)) |
+			   roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(2)));
+
+		if (reg_val & ROC_ML_STG_STATUS_VALID) {
+			reg_val = roc_ml_reg_read64(roc_ml, ML_STG_CONTROL);
+			reg_val |= ROC_ML_STG_CONTROL_FETCH_TO_RUN;
+			roc_ml_reg_write64(roc_ml, reg_val, ML_STG_CONTROL);
+
+			roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL,
+						 ROC_ML_STG_CONTROL_FETCH_TO_RUN);
+
+			goto cnf10kb_mlip_reset_stage_4b;
+		}
+
+		/* (5) Reset the ACC and DMA AXI interfaces: For each of the two
+		 * MLAB(0)_AXI_BRIDGE_CTRL(0..1) registers:
+		 *
+		 * (5a) Set and then clear
+		 * MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FLUSH_WRITE_DATA].
+		 *
+		 * (5b) Clear MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FENCE].
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+	}
+
+	return 0;
+}
+
+int
+roc_ml_dev_init(struct roc_ml *roc_ml)
+{
+	struct plt_pci_device *pci_dev;
+	struct dev *dev;
+	struct ml *ml;
+
+	if (roc_ml == NULL || roc_ml->pci_dev == NULL)
+		return -EINVAL;
+
+	PLT_STATIC_ASSERT(sizeof(struct ml) <= ROC_ML_MEM_SZ);
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+	memset(ml, 0, sizeof(*ml));
+	pci_dev = roc_ml->pci_dev;
+	dev = &ml->dev;
+
+	ml->pci_dev = pci_dev;
+	dev->roc_ml = roc_ml;
+
+	ml->ml_reg_addr = ml->pci_dev->mem_resource[0].addr;
+	ml->ml_mlr_base = 0;
+	ml->ml_mlr_base_saved = false;
+
+	plt_ml_dbg("ML: PCI Physical Address : 0x%016lx", ml->pci_dev->mem_resource[0].phys_addr);
+	plt_ml_dbg("ML: PCI Virtual Address : 0x%016lx",
+		   PLT_U64_CAST(ml->pci_dev->mem_resource[0].addr));
+
+	plt_spinlock_init(&roc_ml->sp_spinlock);
+	plt_spinlock_init(&roc_ml->fp_spinlock);
+
+	return 0;
+}
+
+int
+roc_ml_dev_fini(struct roc_ml *roc_ml)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (ml == NULL)
+		return -EINVAL;
+
+	return 0;
+}
+
+int
+roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
+{
+	struct dev *dev;
+	struct ml *ml;
+
+	if ((roc_ml == NULL) || (roc_bphy == NULL))
+		return -EINVAL;
+
+	PLT_STATIC_ASSERT(sizeof(struct ml) <= ROC_ML_MEM_SZ);
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+	memset(ml, 0, sizeof(*ml));
+
+	dev = &ml->dev;
+
+	ml->pci_dev = roc_bphy->pci_dev;
+	dev->roc_ml = roc_ml;
+
+	plt_ml_dbg(
+		"MLAB: Physical Address : 0x%016lx",
+		PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].phys_addr, ML_MLAB_BLK_OFFSET));
+	plt_ml_dbg("MLAB: Virtual Address : 0x%016lx",
+		   PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET));
+
+	ml->ml_reg_addr = PLT_PTR_ADD(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET);
+	ml->ml_mlr_base = 0;
+	ml->ml_mlr_base_saved = false;
+
+	plt_spinlock_init(&roc_ml->sp_spinlock);
+	plt_spinlock_init(&roc_ml->fp_spinlock);
+
+	return 0;
+}
+
+int
+roc_ml_blk_fini(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
+{
+	struct ml *ml;
+
+	if ((roc_ml == NULL) || (roc_bphy == NULL))
+		return -EINVAL;
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (ml == NULL)
+		return -EINVAL;
+
+	return 0;
+}
+
+uint16_t
+roc_ml_sso_pf_func_get(void)
+{
+	return idev_sso_pffunc_get();
+}
diff --git a/drivers/common/cnxk/roc_ml.h b/drivers/common/cnxk/roc_ml.h
new file mode 100644
index 0000000000..3cd82be6a6
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _ROC_ML_H_
+#define _ROC_ML_H_
+
+#include "roc_api.h"
+
+#define ROC_ML_MEM_SZ	  (6 * 1024)
+#define ROC_ML_TIMEOUT_MS 10000
+
+/* ML_CFG */
+#define ROC_ML_CFG_JD_SIZE	  GENMASK_ULL(1, 0)
+#define ROC_ML_CFG_MLIP_ENA	  BIT_ULL(2)
+#define ROC_ML_CFG_BUSY		  BIT_ULL(3)
+#define ROC_ML_CFG_WRAP_CLK_FORCE BIT_ULL(4)
+#define ROC_ML_CFG_MLIP_CLK_FORCE BIT_ULL(5)
+#define ROC_ML_CFG_ENA		  BIT_ULL(6)
+
+/* ML_MLR_BASE */
+#define ROC_ML_MLR_BASE_BASE GENMASK_ULL(51, 0)
+
+/* ML_STG_STATUS */
+#define ROC_ML_STG_STATUS_VALID		BIT_ULL(0)
+#define ROC_ML_STG_STATUS_ADDR_ERR	BIT_ULL(1)
+#define ROC_ML_STG_STATUS_DMA_ERR	BIT_ULL(2)
+#define ROC_ML_STG_STATUS_TIMEOUT	BIT_ULL(3)
+#define ROC_ML_STG_STATUS_NFAT_ERR	BIT_ULL(4)
+#define ROC_ML_STG_STATUS_JOB_ERR	BIT_ULL(5)
+#define ROC_ML_STG_STATUS_ELAPSED_TICKS GENMASK_ULL(47, 6)
+
+/* ML_STG_CONTROL */
+#define ROC_ML_STG_CONTROL_FETCH_TO_RUN BIT_ULL(0)
+#define ROC_ML_STG_CONTROL_RUN_TO_COMP	BIT_ULL(1)
+
+/* ML_AXI_BRIDGE */
+#define ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL	      BIT_ULL(0)
+#define ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE	      BIT_ULL(1)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_AXI_ID	      GENMASK_ULL(11, 2)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_WR_BLK	      BIT_ULL(13)
+#define ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK	      BIT_ULL(14)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_RD_BLK	      BIT_ULL(15)
+#define ROC_ML_AXI_BRIDGE_CTRL_NCB_RD_BLK	      BIT_ULL(16)
+#define ROC_ML_AXI_BRIDGE_CTRL_FENCE		      BIT_ULL(17)
+#define ROC_ML_AXI_BRIDGE_CTRL_BUSY		      BIT_ULL(18)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK	      BIT_ULL(19)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK	      BIT_ULL(20)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_FORCE_CMPLT	      BIT_ULL(21)
+#define ROC_ML_AXI_BRIDGE_CTRL_WR_CNT_GEAR	      GENMASK_ULL(25, 22)
+#define ROC_ML_AXI_BRIDGE_CTRL_RD_GEAR		      GENMASK_ULL(28, 26)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_CUTTHROUGH_MODE    BIT_ULL(29)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_WRITE_CREDITS      GENMASK_ULL(33, 30)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_READ_CREDITS	      GENMASK_ULL(37, 34)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_LOAD_WRITE_CREDITS BIT_ULL(38)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_LOAD_READ_CREDITS  BIT_ULL(39)
+#define ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA	      BIT_ULL(40)
+
+/* ML_JOB_MGR_CTRL */
+#define ROC_ML_JOB_MGR_CTRL_STALL_ON_ERR     BIT_ULL(0)
+#define ROC_ML_JOB_MGR_CTRL_PF_OVERRIDE	     BIT_ULL(1)
+#define ROC_ML_JOB_MGR_CTRL_PF_FUNC_OVERRIDE GENMASK_ULL(19, 4)
+#define ROC_ML_JOB_MGR_CTRL_BUSY	     BIT_ULL(20)
+#define ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE    BIT_ULL(21)
+
+/* ML_JCMDQ_STATUS */
+#define ROC_ML_JCMDQ_STATUS_AVAIL_COUNT GENMASK_ULL(4, 0)
+
+/* ML_ANBX_BACKP_DISABLE */
+#define ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE BIT_ULL(0)
+#define ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE BIT_ULL(1)
+
+/* ML_ANBX_NCBI_P_OVR */
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MSH_DST_OVR_VLD	 BIT_ULL(0)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MSH_DST_OVR	 GENMASK_ULL(11, 1)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD	 BIT_ULL(12)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR		 BIT_ULL(13)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_PADDR_OVR_VLD	 BIT_ULL(14)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_PADDR_OVR		 BIT_ULL(15)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD	 BIT_ULL(16)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR		 BIT_ULL(17)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPADID_VAL_OVR_VLD BIT_ULL(18)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPADID_VAL_OVR	 BIT_ULL(19)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPAMDID_OVR_VLD	 BIT_ULL(20)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPAMDID_OVR	 BIT_ULL(21)
+
+/* ML_ANBX_NCBI_NP_OVR */
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MSH_DST_OVR_VLD	   BIT_ULL(0)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MSH_DST_OVR	   GENMASK_ULL(11, 1)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD	   BIT_ULL(12)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR		   BIT_ULL(13)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_PADDR_OVR_VLD	   BIT_ULL(14)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_PADDR_OVR	   BIT_ULL(15)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_RO_OVR_VLD	   BIT_ULL(16)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_RO_OVR		   BIT_ULL(17)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPADID_VAL_OVR_VLD BIT_ULL(18)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPADID_VAL_OVR	   BIT_ULL(19)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPAMDID_OVR_VLD	   BIT_ULL(20)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPAMDID_OVR	   BIT_ULL(21)
+
+/* ML_SW_RST_CTRL */
+#define ROC_ML_SW_RST_CTRL_ACC_RST  BIT_ULL(0)
+#define ROC_ML_SW_RST_CTRL_CMPC_RST BIT_ULL(1)
+
+struct roc_ml {
+	struct plt_pci_device *pci_dev;
+	plt_spinlock_t sp_spinlock;
+	plt_spinlock_t fp_spinlock;
+	uint8_t reserved[ROC_ML_MEM_SZ] __plt_cache_aligned;
+} __plt_cache_aligned;
+
+/* Register read and write functions */
+uint64_t __roc_api roc_ml_reg_read64(struct roc_ml *roc_ml, uint64_t offset);
+void __roc_api roc_ml_reg_write64(struct roc_ml *roc_ml, uint64_t val, uint64_t offset);
+uint32_t __roc_api roc_ml_reg_read32(struct roc_ml *roc_ml, uint64_t offset);
+void __roc_api roc_ml_reg_write32(struct roc_ml *roc_ml, uint32_t val, uint64_t offset);
+void __roc_api roc_ml_reg_save(struct roc_ml *roc_ml, uint64_t offset);
+
+/* Address translation functions */
+uint64_t __roc_api roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr);
+uint64_t __roc_api roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset);
+void *__roc_api roc_ml_addr_ap2mlip(struct roc_ml *roc_ml, void *addr);
+void *__roc_api roc_ml_addr_mlip2ap(struct roc_ml *roc_ml, void *addr);
+
+/* Scratch and JCMDQ functions */
+void __roc_api roc_ml_scratch_write_job(struct roc_ml *roc_ml, void *jd);
+bool __roc_api roc_ml_scratch_is_valid_bit_set(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_scratch_is_done_bit_set(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_scratch_enqueue(struct roc_ml *roc_ml, void *work_ptr);
+bool __roc_api roc_ml_scratch_dequeue(struct roc_ml *roc_ml, void *work_ptr);
+void __roc_api roc_ml_scratch_queue_reset(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_jcmdq_enqueue_lf(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+bool __roc_api roc_ml_jcmdq_enqueue_sl(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+/* Device management functions */
+void __roc_api roc_ml_clk_force_on(struct roc_ml *roc_ml);
+void __roc_api roc_ml_clk_force_off(struct roc_ml *roc_ml);
+void __roc_api roc_ml_dma_stall_on(struct roc_ml *roc_ml);
+void __roc_api roc_ml_dma_stall_off(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_mlip_is_enabled(struct roc_ml *roc_ml);
+int __roc_api roc_ml_mlip_reset(struct roc_ml *roc_ml, bool force);
+
+/* Device / block functions */
+int __roc_api roc_ml_dev_init(struct roc_ml *roc_ml);
+int __roc_api roc_ml_dev_fini(struct roc_ml *roc_ml);
+int __roc_api roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml);
+int __roc_api roc_ml_blk_fini(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml);
+
+/* Utility functions */
+uint16_t __roc_api roc_ml_sso_pf_func_get(void);
+
+#endif /*_ROC_ML_H_*/
diff --git a/drivers/common/cnxk/roc_ml_priv.h b/drivers/common/cnxk/roc_ml_priv.h
new file mode 100644
index 0000000000..ad5fe90bab
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml_priv.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _ROC_ML_PRIV_H_
+#define _ROC_ML_PRIV_H_
+
+#include "roc_api.h"
+
+struct ml {
+	struct plt_pci_device *pci_dev;
+	struct dev dev;
+	uint8_t *ml_reg_addr;
+	uint64_t ml_mlr_base;
+	bool ml_mlr_base_saved;
+} __plt_cache_aligned;
+
+static inline struct ml *
+roc_ml_to_ml_priv(struct roc_ml *roc_ml)
+{
+	return (struct ml *)&roc_ml->reserved[0];
+}
+
+#endif /* _ROC_ML_PRIV_H_ */
diff --git a/drivers/common/cnxk/roc_platform.c b/drivers/common/cnxk/roc_platform.c
index ce0f9b870c..f91b95ceab 100644
--- a/drivers/common/cnxk/roc_platform.c
+++ b/drivers/common/cnxk/roc_platform.c
@@ -63,6 +63,7 @@ roc_plt_init(void)
 RTE_LOG_REGISTER(cnxk_logtype_base, pmd.cnxk.base, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_mbox, pmd.cnxk.mbox, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_cpt, pmd.crypto.cnxk, NOTICE);
+RTE_LOG_REGISTER(cnxk_logtype_ml, pmd.ml.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_npa, pmd.mempool.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_nix, pmd.net.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_npc, pmd.net.cnxk.flow, NOTICE);
diff --git a/drivers/common/cnxk/roc_platform.h b/drivers/common/cnxk/roc_platform.h
index 8ba28e69fa..59a422eb9d 100644
--- a/drivers/common/cnxk/roc_platform.h
+++ b/drivers/common/cnxk/roc_platform.h
@@ -234,6 +234,7 @@
 extern int cnxk_logtype_base;
 extern int cnxk_logtype_mbox;
 extern int cnxk_logtype_cpt;
+extern int cnxk_logtype_ml;
 extern int cnxk_logtype_npa;
 extern int cnxk_logtype_nix;
 extern int cnxk_logtype_npc;
@@ -261,6 +262,7 @@ extern int cnxk_logtype_ree;
 #define plt_base_dbg(fmt, ...)	plt_dbg(base, fmt, ##__VA_ARGS__)
 #define plt_cpt_dbg(fmt, ...)	plt_dbg(cpt, fmt, ##__VA_ARGS__)
 #define plt_mbox_dbg(fmt, ...)	plt_dbg(mbox, fmt, ##__VA_ARGS__)
+#define plt_ml_dbg(fmt, ...)	plt_dbg(ml, fmt, ##__VA_ARGS__)
 #define plt_npa_dbg(fmt, ...)	plt_dbg(npa, fmt, ##__VA_ARGS__)
 #define plt_nix_dbg(fmt, ...)	plt_dbg(nix, fmt, ##__VA_ARGS__)
 #define plt_npc_dbg(fmt, ...)	plt_dbg(npc, fmt, ##__VA_ARGS__)
diff --git a/drivers/common/cnxk/roc_priv.h b/drivers/common/cnxk/roc_priv.h
index 122d411fe7..14fe2e452a 100644
--- a/drivers/common/cnxk/roc_priv.h
+++ b/drivers/common/cnxk/roc_priv.h
@@ -47,4 +47,7 @@
 /* REE */
 #include "roc_ree_priv.h"

+/* ML */
+#include "roc_ml_priv.h"
+
 #endif /* _ROC_PRIV_H_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index ee283d2392..c1c8542b1a 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -8,6 +8,7 @@ INTERNAL {
 	cnxk_logtype_base;
 	cnxk_logtype_cpt;
 	cnxk_logtype_mbox;
+	cnxk_logtype_ml;
 	cnxk_logtype_nix;
 	cnxk_logtype_npa;
 	cnxk_logtype_npc;
@@ -96,6 +97,34 @@ INTERNAL {
 	roc_idev_npa_nix_get;
 	roc_idev_num_lmtlines_get;
 	roc_idev_nix_inl_meta_aura_get;
+	roc_ml_reg_read64;
+	roc_ml_reg_write64;
+	roc_ml_reg_read32;
+	roc_ml_reg_write32;
+	roc_ml_reg_save;
+	roc_ml_addr_ap2mlip;
+	roc_ml_addr_mlip2ap;
+	roc_ml_addr_pa_to_offset;
+	roc_ml_addr_offset_to_pa;
+	roc_ml_scratch_write_job;
+	roc_ml_scratch_is_valid_bit_set;
+	roc_ml_scratch_is_done_bit_set;
+	roc_ml_scratch_enqueue;
+	roc_ml_scratch_dequeue;
+	roc_ml_scratch_queue_reset;
+	roc_ml_jcmdq_enqueue_lf;
+	roc_ml_jcmdq_enqueue_sl;
+	roc_ml_clk_force_on;
+	roc_ml_clk_force_off;
+	roc_ml_dma_stall_on;
+	roc_ml_dma_stall_off;
+	roc_ml_mlip_is_enabled;
+	roc_ml_mlip_reset;
+	roc_ml_dev_init;
+	roc_ml_dev_fini;
+	roc_ml_blk_init;
+	roc_ml_blk_fini;
+	roc_ml_sso_pf_func_get;
 	roc_model;
 	roc_se_auth_key_set;
 	roc_se_ciph_key_set;
--
2.17.1

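Reader's note on the register helpers in roc_ml.c: most of them follow the same
read-modify-write shape (read the register, set or clear one field, write it
back), and the reset sequence polls a status bit until it clears. The sketch
below illustrates that shape against a mock register. It is not the driver
code: struct mock_reg, mock_read64(), mock_write64(), reg_set_bits(),
reg_clear_bits() and the poll-count bound are assumptions made for
illustration, and the real roc_ml_reg_wait_to_clear() is not shown in this
part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Bit positions taken from the ML_CFG definitions in roc_ml.h. */
#define CFG_MLIP_ENA	   BIT_ULL(2)
#define CFG_BUSY	   BIT_ULL(3)
#define CFG_MLIP_CLK_FORCE BIT_ULL(5)

/* Hypothetical stand-in for one memory-mapped 64-bit register. */
struct mock_reg {
	uint64_t val;
	unsigned int busy_reads; /* mock hardware drops BUSY after this many reads */
};

static uint64_t
mock_read64(struct mock_reg *reg)
{
	uint64_t v;

	assert(reg != NULL);
	v = reg->val;
	/* Emulate hardware finishing its work after a few status reads. */
	if (reg->busy_reads > 0 && --reg->busy_reads == 0)
		reg->val &= ~CFG_BUSY;
	return v;
}

static void
mock_write64(struct mock_reg *reg, uint64_t val)
{
	assert(reg != NULL);
	reg->val = val;
}

/* Read-modify-write: set the bits in mask, leave every other bit untouched. */
static void
reg_set_bits(struct mock_reg *reg, uint64_t mask)
{
	mock_write64(reg, mock_read64(reg) | mask);
}

/* Read-modify-write: clear the bits in mask, leave every other bit untouched. */
static void
reg_clear_bits(struct mock_reg *reg, uint64_t mask)
{
	mock_write64(reg, mock_read64(reg) & ~mask);
}

/* Poll until all bits in mask read back as zero, giving up after max_polls reads. */
static bool
reg_wait_to_clear(struct mock_reg *reg, uint64_t mask, unsigned int max_polls)
{
	while (max_polls-- > 0) {
		if ((mock_read64(reg) & mask) == 0)
			return true;
	}
	return false;
}
```

The sketch bounds the poll by a read count only to stay self-contained; a
driver would more plausibly bound it by elapsed time, for example against
ROC_ML_TIMEOUT_MS.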

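Reader's note on the JCMDQ enqueue helpers: both roc_ml_jcmdq_enqueue_lf() and
roc_ml_jcmdq_enqueue_sl() extract the AVAIL_COUNT field (bits 4:0 of
ML_JCMDQ_STATUS) and write the two command words only when the count is
non-zero; the _sl variant additionally wraps the check in a trylock. A minimal
sketch of the field extraction and the availability check follows. The
GENMASK_ULL and FIELD_GET definitions here are local equivalents written for
this sketch, and struct mock_jcmdq, struct job_cmd and jcmdq_enqueue() are
hypothetical names that model the queue in plain memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local equivalents of the bitfield helpers the patch relies on. */
#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
/* Mask the value, then divide by the mask's lowest set bit to shift it down. */
#define FIELD_GET(mask, val) (((val) & (mask)) / ((mask) & (~(mask) + 1)))

/* Field layout taken from roc_ml.h: free-slot count in bits 4:0. */
#define JCMDQ_STATUS_AVAIL_COUNT GENMASK_ULL(4, 0)

/* Hypothetical in-memory model of the job command queue registers. */
struct mock_jcmdq {
	uint64_t status; /* bits 4:0 hold the number of free slots */
	uint64_t in[2];	 /* last pair of command words written */
};

/* Hypothetical two-word job command. */
struct job_cmd {
	uint64_t w0;
	uint64_t w1;
};

/* Enqueue one command if the queue reports a free slot; no locking. */
static bool
jcmdq_enqueue(struct mock_jcmdq *q, const struct job_cmd *cmd)
{
	uint64_t avail;

	assert(q != NULL && cmd != NULL);

	avail = FIELD_GET(JCMDQ_STATUS_AVAIL_COUNT, q->status);
	if (avail == 0)
		return false;

	q->in[0] = cmd->w0;
	q->in[1] = cmd->w1;

	/* Real hardware updates the count itself; the mock does it here. */
	q->status = (q->status & ~JCMDQ_STATUS_AVAIL_COUNT) | (avail - 1);
	return true;
}
```

Returning false instead of spinning leaves the retry policy to the caller,
which matches the bool return of the two helpers in the patch.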

* [PATCH v4 02/39] ml/cnxk: add skeleton for ML cnxk driver
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
                     ` (36 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added initial source files and build files for the ML cnxk driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 MAINTAINERS                    |  1 +
 drivers/meson.build            |  1 +
 drivers/ml/cnxk/cn10k_ml_dev.c |  8 ++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  8 ++++++++
 drivers/ml/cnxk/meson.build    | 26 ++++++++++++++++++++++++++
 drivers/ml/meson.build         |  8 ++++++++
 6 files changed, 52 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 265f5b9a3d..147f2bd8ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1441,6 +1441,7 @@ Marvell ML CNXK
 M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
+F: drivers/ml/cnxk/
 
 
 Packet processing
diff --git a/drivers/meson.build b/drivers/meson.build
index c6d619200f..546a5f409d 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -14,6 +14,7 @@ subdirs = [
         'mempool',        # depends on common and bus.
         'dma',            # depends on common and bus.
         'net',            # depends on common, bus, mempool
+        'ml',             # depends on common, bus, mempool
         'raw',            # depends on common, bus, dma and net.
         'crypto',         # depends on common, bus and mempool (net in future).
         'compress',       # depends on common, bus, mempool.
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
new file mode 100644
index 0000000000..cc96a7bdb3
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
new file mode 100644
index 0000000000..049ac13fcd
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_DEV_H_
+#define _CN10K_ML_DEV_H_
+
+#endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
new file mode 100644
index 0000000000..2ec6a88e3f
--- /dev/null
+++ b/drivers/ml/cnxk/meson.build
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
+    build = false
+    reason = 'only supported on 64-bit Linux'
+    subdir_done()
+endif
+
+driver_sdk_headers = files(
+        'cn10k_ml_dev.h',
+)
+
+sources = files(
+        'cn10k_ml_dev.c',
+)
+
+deps += ['mldev', 'common_cnxk']
+
+if get_option('buildtype').contains('debug')
+        cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
+else
+        cflags += [ '-UCNXK_ML_DEV_DEBUG' ]
+endif
+
+pmd_supports_disable_iova_as_pa = true
diff --git a/drivers/ml/meson.build b/drivers/ml/meson.build
new file mode 100644
index 0000000000..54bc394c47
--- /dev/null
+++ b/drivers/ml/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+drivers = [
+        'cnxk',
+]
+
+std_deps = ['mldev']
-- 
2.17.1



* [PATCH v4 03/39] ml/cnxk: enable probe and remove of ML device
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
                     ` (35 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Anatoly Burakov; +Cc: dev, sshankarnara, jerinj, aprabhu

The ML inference engine on the cn10k platform is a PCI-based device.
Added driver support to probe and remove the device in the cn10k poll
mode driver. The PMD is registered under the name "ml_cn10k".

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 114 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  11 ++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  10 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  11 ++++
 drivers/ml/cnxk/meson.build    |   2 +
 5 files changed, 148 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index cc96a7bdb3..c2e93c9a1a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -2,7 +2,121 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_common.h>
+#include <rte_dev.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
+#include <rte_pci.h>
+
+#include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ops.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+static int
+cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	PLT_SET_USED(pci_drv);
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+
+	ret = roc_plt_init();
+	if (ret < 0) {
+		plt_err("Failed to initialize platform model");
+		return ret;
+	}
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+	dev = rte_ml_dev_pmd_create(name, &pci_dev->device, &init_params);
+	if (dev == NULL) {
+		ret = -ENODEV;
+		goto error_exit;
+	}
+
+	/* Get private data space allocated */
+	mldev = dev->data->dev_private;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev->roc.pci_dev = pci_dev;
+
+		ret = roc_ml_dev_init(&mldev->roc);
+		if (ret) {
+			plt_err("Failed to initialize ML ROC, ret = %d", ret);
+			goto pmd_destroy;
+		}
+
+		dev->dev_ops = &cn10k_ml_ops;
+	} else {
+		plt_err("CN10K ML Ops are not supported on secondary process");
+		dev->dev_ops = &ml_dev_dummy_ops;
+	}
+
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	return 0;
+
+pmd_destroy:
+	rte_ml_dev_pmd_destroy(dev);
+
+error_exit:
+	plt_err("Could not create device (vendor_id: 0x%x device_id: 0x%x)", pci_dev->id.vendor_id,
+		pci_dev->id.device_id);
+
+	return ret;
+}
+
+static int
+cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&mldev->roc);
+		if (ret)
+			return ret;
+	}
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_pci_id pci_id_ml_table[] = {
+	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
+	/* sentinel */
+	{},
+};
+
+static struct rte_pci_driver cn10k_mldev_pmd = {
+	.id_table = pci_id_ml_table,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA,
+	.probe = cn10k_ml_pci_probe,
+	.remove = cn10k_ml_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
+RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 049ac13fcd..833a09791a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -5,4 +5,15 @@
 #ifndef _CN10K_ML_DEV_H_
 #define _CN10K_ML_DEV_H_
 
+#include <roc_api.h>
+
+/* Marvell OCTEON CN10K ML PMD device name */
+#define MLDEV_NAME_CN10K_PMD ml_cn10k
+
+/* Device private data */
+struct cn10k_ml_dev {
+	/* Device ROC */
+	struct roc_ml roc;
+};
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
new file mode 100644
index 0000000000..39843e3ee5
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_ops.h"
+
+struct rte_ml_dev_ops cn10k_ml_ops = {0};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
new file mode 100644
index 0000000000..b14221d02c
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OPS_H_
+#define _CN10K_ML_OPS_H_
+
+/* Device ops */
+extern struct rte_ml_dev_ops cn10k_ml_ops;
+
+#endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 2ec6a88e3f..caed62a9f3 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,10 +9,12 @@ endif
 
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
+        'cn10k_ml_ops.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
+        'cn10k_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk']
-- 
2.17.1

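Reader's note on the probe path: cn10k_ml_pci_probe() creates the device, then
initializes the ROC layer, and on failure unwinds in reverse order through the
pmd_destroy and error_exit labels. The sketch below shows that unwind shape in
isolation. Every name in it (struct sketch_dev, sketch_dev_create(),
sketch_hw_init(), sketch_probe(), the inject_failure flag) is hypothetical and
stands in for the mldev and ROC calls; it is not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device object standing in for struct rte_ml_dev. */
struct sketch_dev {
	int hw_ready;
};

static int live_devices; /* devices created and not yet destroyed */

static struct sketch_dev *
sketch_dev_create(struct sketch_dev *storage)
{
	if (storage == NULL)
		return NULL;
	storage->hw_ready = 0;
	live_devices++;
	return storage;
}

static void
sketch_dev_destroy(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->hw_ready = 0;
	live_devices--;
}

/* Stand-in for hardware init; inject_failure forces the error path. */
static int
sketch_hw_init(struct sketch_dev *dev, int inject_failure)
{
	if (inject_failure)
		return -EIO;
	dev->hw_ready = 1;
	return 0;
}

static int
sketch_probe(struct sketch_dev *storage, int inject_failure)
{
	struct sketch_dev *dev;
	int ret;

	dev = sketch_dev_create(storage);
	if (dev == NULL) {
		ret = -ENODEV;
		goto error_exit;
	}

	ret = sketch_hw_init(dev, inject_failure);
	if (ret != 0)
		goto dev_destroy;

	return 0;

dev_destroy:
	/* Undo only what succeeded, in reverse order of acquisition. */
	sketch_dev_destroy(dev);
error_exit:
	return ret;
}
```

Ordering the labels so that later failures fall through earlier cleanup keeps
each resource released exactly once, whichever step fails.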

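Reader's note on pci_id_ml_table: the table ends with an empty "sentinel"
entry, so a matcher can walk it without knowing its length. The sketch below
shows one way such a walk can look, assuming that a zero vendor ID marks the
end of the table. The struct, the ID values and id_table_match() are
hypothetical; they are not the rte_pci types and not the real CN10K IDs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ID entry (not struct rte_pci_id). */
struct pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Placeholder IDs for illustration only. */
#define VENDOR_EXAMPLE	 0x1234
#define DEVID_EXAMPLE_ML 0xabcd

static const struct pci_id id_table[] = {
	{VENDOR_EXAMPLE, DEVID_EXAMPLE_ML},
	/* sentinel: vendor_id == 0 terminates the table */
	{0, 0},
};

/* Return the first entry matching vendor/device, or NULL if none does. */
static const struct pci_id *
id_table_match(const struct pci_id *table, uint16_t vendor, uint16_t device)
{
	assert(table != NULL);

	for (; table->vendor_id != 0; table++) {
		if (table->vendor_id == vendor && table->device_id == device)
			return table;
	}
	return NULL;
}
```

Forgetting the sentinel would let the loop run past the end of the array,
which is why the empty entry in the patch is needed.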

* [PATCH v4 04/39] ml/cnxk: add driver support to get device info
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
                     ` (34 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to get the cn10k ML device information. This is the
driver implementation of the RTE function rte_ml_dev_info_get.
The ML device on cn10k supports one queue-pair in lock-free mode
and does not support segmented input/output data.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
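Reader note: a minimal standalone sketch of how an application might
consume the limits reported by this patch. The names sketch_dev_info,
sketch_dev_info_fill and sketch_align_up are hypothetical; the struct
only mirrors the fields set in cn10k_ml_dev_info_get and is not the
rte_mldev API itself. The numeric values are the macros added below.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the limit fields filled in by the driver. */
struct sketch_dev_info {
	uint16_t max_models;
	uint16_t max_queue_pairs;
	uint16_t max_desc;
	uint16_t max_segments;
	uint16_t min_align_size;
};

/* Fill the limits with the values defined in this patch. */
static int
sketch_dev_info_fill(struct sketch_dev_info *info)
{
	if (info == NULL)
		return -EINVAL;

	memset(info, 0, sizeof(*info));
	info->max_models = 16;       /* ML_CN10K_MAX_MODELS */
	info->max_queue_pairs = 1;   /* ML_CN10K_MAX_QP_PER_DEVICE */
	info->max_desc = 1024;       /* ML_CN10K_MAX_DESC_PER_QP */
	info->max_segments = 1;      /* ML_CN10K_MAX_SEGMENTS */
	info->min_align_size = 128;  /* ML_CN10K_ALIGN_SIZE */

	return 0;
}

/* Round size up to a multiple of align; align must be a power of two. */
static size_t
sketch_align_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}
```

A caller could size an I/O buffer with
sketch_align_up(len, info.min_align_size) before handing it to the
device; whether the real driver requires this for every buffer is an
assumption of the sketch.
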
 drivers/ml/cnxk/cn10k_ml_dev.h | 15 +++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 23 ++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 833a09791a..13d26373e4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,21 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Device alignment size */
+#define ML_CN10K_ALIGN_SIZE 128
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Maximum number of queue-pairs per device */
+#define ML_CN10K_MAX_QP_PER_DEVICE 1
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_CN10K_MAX_DESC_PER_QP 1024
+
+/* Maximum number of segments for IO data */
+#define ML_CN10K_MAX_SEGMENTS 1
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 39843e3ee5..bad5ad4713 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,27 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-struct rte_ml_dev_ops cn10k_ml_ops = {0};
+static int
+cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	if (dev_info == NULL)
+		return -EINVAL;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
+	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
+
+	return 0;
+}
+
+struct rte_ml_dev_ops cn10k_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+};
-- 
2.17.1



* [PATCH v4 05/39] ml/cnxk: add support for configure and close
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented driver functions to configure and close ML devices.
Added skeleton code with support for reconfiguring the ML device.
The PCI device is removed as part of device close.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
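Reader note: the configure/close state handling in this patch can be
sketched as a small standalone state machine. All sketch_* names are
hypothetical; the limits are the values of ML_CN10K_MAX_MODELS and
ML_CN10K_MAX_QP_PER_DEVICE, and the return codes follow the driver
code below. Firmware handling and PCI removal are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MODELS 16
#define SKETCH_MAX_QP     1

/* Same ordering as enum cn10k_ml_dev_state. */
enum sketch_state {
	SKETCH_PROBED = 0,
	SKETCH_CONFIGURED,
	SKETCH_STARTED,
	SKETCH_CLOSED
};

struct sketch_dev {
	enum sketch_state state;
};

/* Validate the request first, then check whether the state allows it. */
static int
sketch_configure(struct sketch_dev *dev, uint16_t nb_models, uint16_t nb_queue_pairs)
{
	if (dev == NULL)
		return -EINVAL;

	if (nb_models > SKETCH_MAX_MODELS || nb_queue_pairs > SKETCH_MAX_QP)
		return -EINVAL;

	/* Only probed or already-configured devices may be (re)configured. */
	if (dev->state == SKETCH_STARTED || dev->state == SKETCH_CLOSED)
		return -ENOTSUP;

	dev->state = SKETCH_CONFIGURED;
	return 0;
}

/* Close is terminal: no later configure call succeeds. */
static int
sketch_close(struct sketch_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;

	dev->state = SKETCH_CLOSED;
	return 0;
}
```
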
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 ++
 drivers/ml/cnxk/cn10k_ml_dev.h | 21 ++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 60 ++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index c2e93c9a1a..fd45226add 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -65,6 +65,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+
 	return 0;
 
 pmd_destroy:
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 13d26373e4..e7fb5fc2e2 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -25,10 +25,31 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
+/* ML command timeout in seconds */
+#define ML_CN10K_CMD_TIMEOUT 5
+
+/* Device configuration state enum */
+enum cn10k_ml_dev_state {
+	/* Probed and not configured */
+	ML_CN10K_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CN10K_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CN10K_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CN10K_DEV_STATE_CLOSED
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
+
+	/* Configuration state */
+	enum cn10k_ml_dev_state state;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index bad5ad4713..32d38569a3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -25,7 +25,67 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL || conf == NULL)
+		return -EINVAL;
+
+	/* Get CN10K device handle */
+	mldev = dev->data->dev_private;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state");
+		return -ENOTSUP;
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close");
+		return -ENOTSUP;
+	}
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	mldev = dev->data->dev_private;
+
+	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
 };
-- 
2.17.1



* [PATCH v4 06/39] ml/cnxk: parse ML firmware path from device args
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled parsing of the ML firmware path from device args for cn10k.
The default path "/lib/firmware/mlip-fw.bin" is used when the
fw_path argument is not provided. Added internal structures for ML
firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
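Reader note: the fallback behaviour described above can be sketched
without DPDK. The helper below is a hypothetical, simplified stand-in
for the rte_kvargs based parsing in this patch: it scans a
"key=value,key=value" string for fw_path and otherwise copies the
default. Only the key name and the default path come from the patch;
sketch_fw_path_get and its buffer-based interface are assumptions.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_FW_PATH_KEY     "fw_path"
#define SKETCH_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"

/* Copy the firmware path selected by args into out. Falls back to the
 * default when args is NULL or the key is absent. Returns 0 on success
 * and -1 if out is too small.
 */
static int
sketch_fw_path_get(const char *args, char *out, size_t out_len)
{
	const size_t key_len = strlen(SKETCH_FW_PATH_KEY);
	const char *p = args;
	int n;

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t tok_len = (end != NULL) ? (size_t)(end - p) : strlen(p);

		/* Match "fw_path=" at the start of the current token. */
		if (tok_len > key_len && strncmp(p, SKETCH_FW_PATH_KEY, key_len) == 0 &&
		    p[key_len] == '=') {
			size_t val_len = tok_len - key_len - 1;

			if (val_len >= out_len)
				return -1;
			memcpy(out, p + key_len + 1, val_len);
			out[val_len] = '\0';
			return 0;
		}

		p = (end != NULL) ? end + 1 : NULL;
	}

	/* Key not found: use the default path. */
	n = snprintf(out, out_len, "%s", SKETCH_FW_PATH_DEFAULT);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Unlike the driver, the sketch copies into a caller-provided buffer
instead of calling strdup, so there is nothing to free afterwards.
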
 drivers/ml/cnxk/cn10k_ml_dev.c | 71 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 12 ++++++
 drivers/ml/cnxk/meson.build    |  2 +-
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fd45226add..117cac43aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -4,6 +4,8 @@
 
 #include <rte_common.h>
 #include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
@@ -13,9 +15,70 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+#define CN10K_ML_FW_PATH "fw_path"
+
+#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
+static int
+parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*(char **)extra_args = strdup(value);
+
+	if (!*(char **)extra_args)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+{
+	struct rte_kvargs *kvlist = NULL;
+	bool fw_path_set = false;
+	char *fw_path = NULL;
+	int ret = 0;
+
+	if (devargs == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(devargs->args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing devargs");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_PATH) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_PATH, &parse_string_arg, &fw_path);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s", CN10K_ML_FW_PATH);
+			ret = -EINVAL;
+			goto exit;
+		}
+		fw_path_set = true;
+	}
+
+check_args:
+	if (!fw_path_set)
+		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+	else
+		mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
 static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
@@ -49,6 +112,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
 		mldev->roc.pci_dev = pci_dev;
 
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		if (ret) {
+			plt_err("Failed to parse devargs ret = %d", ret);
+			goto pmd_destroy;
+		}
+
 		ret = roc_ml_dev_init(&mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
@@ -122,3 +191,5 @@ static struct rte_pci_driver cn10k_mldev_pmd = {
 RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index e7fb5fc2e2..5333566cff 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,15 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML firmware structure */
+struct cn10k_ml_fw {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Firmware file path */
+	const char *path;
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -50,6 +59,9 @@ struct cn10k_ml_dev {
 
 	/* Configuration state */
 	enum cn10k_ml_dev_state state;
+
+	/* Firmware */
+	struct cn10k_ml_fw fw;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index caed62a9f3..7dc8a29a80 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,7 +17,7 @@ sources = files(
         'cn10k_ml_ops.c',
 )
 
-deps += ['mldev', 'common_cnxk']
+deps += ['mldev', 'common_cnxk', 'kvargs']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1



* [PATCH v4 07/39] ml/cnxk: enable firmware load and device reset
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to load ML firmware on the cn10ka ROC model. The
MLIP device is reset during the dev_close driver operation, and the
device can't be reconfigured after a call to close. Job execution
is left disabled after firmware load; it is enabled when the device
is started. Added internal request structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
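Reader note: the firmware load handshake in this patch polls a status
word until the firmware reports completion or a cycle deadline passes.
A minimal standalone sketch of that loop follows. The sketch_* names
are hypothetical; the cycle callback stands in for plt_tsc_cycles(),
and the scratch register done bit checked by the driver is omitted.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_POLL_JOB_START  0
#define SKETCH_POLL_JOB_FINISH 1

/* Hypothetical cycle-counter callback standing in for plt_tsc_cycles(). */
typedef uint64_t (*sketch_cycles_fn)(void);

/* Poll status until it reads FINISH or timeout_cycles have elapsed.
 * The status is checked at least once, even with a zero timeout.
 */
static int
sketch_wait_for_finish(const volatile uint64_t *status, sketch_cycles_fn cycles,
		       uint64_t timeout_cycles)
{
	const uint64_t deadline = cycles() + timeout_cycles;

	do {
		if (*status == SKETCH_POLL_JOB_FINISH)
			return 0;
	} while (cycles() < deadline);

	return -ETIME;
}

/* Fake counter for demonstration: advances by one on every read. */
static uint64_t sketch_fake_cycles_val;

static uint64_t
sketch_fake_cycles(void)
{
	return sketch_fake_cycles_val++;
}
```

In the driver the deadline is ML_CN10K_CMD_TIMEOUT seconds expressed
in TSC cycles; the fake counter here only makes the loop deterministic.
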
 drivers/ml/cnxk/cn10k_ml_dev.c | 327 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 156 ++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  21 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  14 ++
 4 files changed, 518 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 117cac43aa..90fca45ddd 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -12,6 +12,8 @@
 
 #include <roc_api.h>
 
+#include <eal_firmware.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
@@ -19,6 +21,15 @@
 
 #define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
 
+/* ML firmware macros */
+#define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
+#define FW_STACK_BUFFER_SIZE	 0x40000
+#define FW_DEBUG_BUFFER_SIZE	 (2 * 0x20000)
+#define FW_EXCEPTION_BUFFER_SIZE 0x400
+#define FW_LINKER_OFFSET	 0x80000
+#define FW_WAIT_CYCLES		 100
+#define FW_LOAD_FLAGS		 0x1
+
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
 
 /* Dummy operations for ML device */
@@ -175,6 +186,322 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 	return rte_ml_dev_pmd_destroy(dev);
 }
 
+static void
+cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
+{
+	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+	plt_ml_dbg("exception_state_size = %u bytes",
+		   fw->req->jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+}
+
+uint64_t
+cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
+{
+	PLT_SET_USED(fw);
+
+	return FW_LOAD_FLAGS;
+}
+
+static int
+cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
+{
+	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
+	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	uint32_t reg_val32;
+	uint64_t offset;
+	bool timeout;
+	int ret = 0;
+	uint8_t i;
+
+	mldev = fw->mldev;
+
+	/* Reset debug buffer HEAD/TAIL and exception SP scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
+	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
+
+	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
+	 * bridge.
+	 */
+	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
+		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
+		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
+		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+
+	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
+	 * bridges.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
+			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+	}
+
+	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
+	 * signal all ML transactions as non-secure.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
+			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+
+		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
+			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+	}
+
+	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
+	 * when there is no job in the command queue.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
+	 * keeping the job manager disabled.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (9) Wait at least 70 coprocessor clock cycles (delay below is in microseconds). */
+	plt_delay_us(FW_WAIT_CYCLES);
+
+	/* (10) Write ML outbound addresses pointing to the firmware images written in step 1 to the
+	 * following registers: ML(0)_A35_0_RST_VECTOR_BASE_W(0..1) for core 0,
+	 * ML(0)_A35_1_RST_VECTOR_BASE_W(0..1) for core 1. The value written to each register is the
+	 * AXI outbound address divided by 4. Read after write.
+	 */
+	offset = PLT_PTR_ADD_U64_CAST(
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
+	 * MLIP components out of reset. The cores will execute firmware from the ML region as
+	 * written in step 1.
+	 */
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
+	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
+
+	/* (12) Wait for notification from firmware that ML is ready for job execution. */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
+	 * clock when there are no more jobs to process.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
+	 * activities.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
+			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+	}
+
+	return ret;
+}
+
+int
+cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_fw *fw;
+	void *fw_buffer = NULL;
+	uint64_t mz_size = 0;
+	uint64_t fw_size = 0;
+	int ret = 0;
+
+	fw = &mldev->fw;
+	fw->mldev = mldev;
+
+	/* Read firmware image to a buffer */
+	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+	if (ret < 0) {
+		plt_err("Can't read firmware data: %s", fw->path);
+		return ret;
+	}
+
+	/* Reserve memzone for firmware load completion and data */
+	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+		return -ENOMEM;
+	}
+	fw->req = mz->addr;
+
+	/* Reset firmware load completion structure */
+	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+
+	/* Reset device, if in active state */
+	if (roc_ml_mlip_is_enabled(&mldev->roc))
+		roc_ml_mlip_reset(&mldev->roc, true);
+
+	/* Load firmware */
+	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+	if (fw_buffer != NULL)
+		free(fw_buffer);
+	if (ret < 0)
+		cn10k_ml_fw_unload(mldev);
+
+	return ret;
+}
+
+void
+cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	uint64_t reg_val;
+
+	/* Disable and reset device */
+	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&mldev->roc, true);
+
+	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
+	if (mz != NULL)
+		plt_memzone_free(mz);
+}
+
 static struct rte_pci_id pci_id_ml_table[] = {
 	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
 	/* sentinel */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 5333566cff..00d23eb3ca 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,9 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
+
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -28,6 +31,19 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* Poll mode job state */
+#define ML_CN10K_POLL_JOB_START	 0
+#define ML_CN10K_POLL_JOB_FINISH 1
+
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
+
 /* Device configuration state enum */
 enum cn10k_ml_dev_state {
 	/* Probed and not configured */
@@ -43,6 +59,136 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* Firmware stats */
+struct cn10k_ml_fw_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
+
+	/* Firmware end cycle */
+	uint64_t fw_end;
+
+	/* Hardware start cycle */
+	uint64_t hw_start;
+
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Firmware stats */
+	struct cn10k_ml_fw_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
+
+		/* Batch execution */
+		uint64_t batch_run : 1;
+
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
+
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
+
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
+
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
+
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
+
+	/* Exception state dump size */
+	uint32_t exception_state_size;
+};
+
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
+
+			/* Flags to control error handling */
+			uint64_t flags;
+
+			uint8_t rsvd[8];
+		} fw_load;
+	};
+};
+
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -50,6 +196,12 @@ struct cn10k_ml_fw {
 
 	/* Firmware file path */
 	const char *path;
+
+	/* Data buffer */
+	uint8_t *data;
+
+	/* Firmware load / handshake request structure */
+	struct cn10k_ml_req *req;
 };
 
 /* Device private data */
@@ -64,4 +216,8 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_fw fw;
 };
 
+uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
+int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
+void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 32d38569a3..11e1cdb7cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -30,6 +30,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	int ret;
 
 	if (dev == NULL || conf == NULL)
 		return -EINVAL;
@@ -51,6 +52,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(mldev);
+		if (ret != 0)
+			return ret;
 	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -77,6 +83,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload firmware */
+	cn10k_ml_fw_unload(mldev);
+
+	/* Clear scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+
+	/* Reset ML_MLR_BASE */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+
 	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index b14221d02c..fe18730aca 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,6 +5,20 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include "cn10k_ml_dev.h"
+
+/* ML request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job result */
+	struct cn10k_ml_result result;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+} __rte_aligned(ROC_ALIGN);
+
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1



* [PATCH v4 08/39] ml/cnxk: enable support for simulator environment
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled device initialization and firmware load on the simulator
platform. On the simulator, the firmware load stage only launches
a firmware handshake request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 119 +++++++++++++++++++++++++++++----
 1 file changed, 107 insertions(+), 12 deletions(-)
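
Note: the handshake added here comes down to marking a status word as
"started", handing the job to the device, and polling that word until it
reads "finished" or a deadline passes. A minimal sketch of that polling
pattern follows. Everything in it is hypothetical and only for
illustration: JOB_START / JOB_FINISH stand in for the driver's
ML_CN10K_POLL_JOB_* values, and ticks_now() is a fake counter used in
place of plt_tsc_cycles() so that the example is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values, standing in for ML_CN10K_POLL_JOB_START / _FINISH. */
enum { JOB_START = 0, JOB_FINISH = 1 };

/* Hypothetical tick source: advances by one on every call (not a real TSC read). */
static uint64_t fake_ticks;

static uint64_t
ticks_now(void)
{
	return fake_ticks++;
}

/* Poll *status until it reads JOB_FINISH or timeout_ticks have elapsed.
 * Returns true when the job completed, false on timeout.
 */
static bool
poll_job_done(const volatile uint64_t *status, uint64_t timeout_ticks)
{
	uint64_t deadline;

	assert(status != NULL);
	deadline = ticks_now() + timeout_ticks;

	do {
		if (*status == JOB_FINISH)
			return true;
	} while (ticks_now() < deadline);

	return false;
}
```

In the driver itself the completion test also checks the scratch done
bit, and a timeout is reported as -ETIME.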

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 90fca45ddd..837f006bf0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -213,6 +213,89 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	return FW_LOAD_FLAGS;
 }
 
+static int
+cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	bool timeout;
+	int ret = 0;
+
+	mldev = fw->mldev;
+
+	/* Reset HEAD and TAIL debug pointer registers and exception SP registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = rte_eal_get_baseaddr();
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* Update FW load completion structure */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	return ret;
+}
+
 static int
 cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
@@ -447,16 +530,22 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	fw = &mldev->fw;
 	fw->mldev = mldev;
 
-	/* Read firmware image to a buffer */
-	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
-	if (ret < 0) {
-		plt_err("Can't read firmware data: %s\n", fw->path);
-		return ret;
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		/* Read firmware image to a buffer */
+		ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+		if (ret < 0) {
+			plt_err("Can't read firmware data: %s", fw->path);
+			return ret;
+		}
+
+		/* Reserve memzone for firmware load completion and data */
+		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	} else if (roc_env_is_asim()) {
+		/* Reserve memzone for firmware load completion */
+		mz_size = sizeof(struct cn10k_ml_req);
 	}
 
-	/* Reserve memzone for firmware load completion and data */
-	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
-		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
@@ -475,10 +564,16 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 		roc_ml_mlip_reset(&mldev->roc, true);
 
 	/* Load firmware */
-	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
-	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
-	if (fw_buffer != NULL)
-		free(fw_buffer);
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+	} else if (roc_env_is_asim()) {
+		fw->data = NULL;
+		ret = cn10k_ml_fw_load_asim(fw);
+	}
+
 	if (ret < 0)
 		cn10k_ml_fw_unload(mldev);
 
-- 
2.17.1



* [PATCH v4 09/39] ml/cnxk: enable support for device start and stop
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented ML driver functions to start and stop the ML device.
Start and stop respectively enable and disable the device's
ability to accept inference requests.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 42 +++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)
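
Note: start and stop are both read-modify-write updates of a single
enable bit in the ML_CFG register, leaving the other bits untouched.
A minimal sketch of that pattern on a plain variable follows. The bit
position CFG_ENA and the struct fake_dev are assumptions made for this
example; the real ROC_ML_CFG_ENA value and register accessors come from
the cnxk common code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position, standing in for ROC_ML_CFG_ENA. */
#define CFG_ENA (UINT64_C(1) << 0)

/* Hypothetical device model: one config register and a started flag. */
struct fake_dev {
	uint64_t cfg;
	int started;
};

/* Set the enable bit without disturbing the other config bits. */
static void
fake_dev_start(struct fake_dev *d)
{
	assert(d != NULL);
	d->cfg |= CFG_ENA;
	d->started = 1;
}

/* Clear the enable bit without disturbing the other config bits. */
static void
fake_dev_stop(struct fake_dev *d)
{
	assert(d != NULL);
	d->cfg &= ~CFG_ENA;
	d->started = 0;
}
```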

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11e1cdb7cd..3fea763caf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -104,9 +104,45 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
+static int
+cn10k_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
+	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
 };
-- 
2.17.1



* [PATCH v4 10/39] ml/cnxk: add support to create device queue-pairs
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to create and destroy device queue-pairs. Updated
the configure stage to create an array that stores queue-pair handles.
Added internal structures for queue-pairs, queues and ML inference
requests.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |  33 +++++-
 2 files changed, 237 insertions(+), 3 deletions(-)
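
Note: queue-pair setup allocates one descriptor more than requested
(unless the request already equals the maximum), because a head/tail
ring that tells "full" apart from "empty" by keeping one slot unused can
hold only size - 1 entries. The enqueue and dequeue paths are not part
of this patch, so the sketch below assumes that convention; struct ring
and its helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring: head is the next slot to fill, tail is the next slot to drain. */
struct ring {
	uint64_t head;
	uint64_t tail;
	uint64_t nb_desc;
};

/* Size to allocate so that 'requested' entries are usable, capped at max_desc. */
static uint64_t
ring_size_for(uint64_t requested, uint64_t max_desc)
{
	return (requested == max_desc) ? max_desc : requested + 1;
}

/* The ring is full when advancing head would make it equal to tail. */
static bool
ring_full(const struct ring *r)
{
	return (r->head + 1) % r->nb_desc == r->tail;
}

/* Claim one slot; returns false when the ring is full. */
static bool
ring_push(struct ring *r)
{
	assert(r != NULL && r->nb_desc > 1);
	if (ring_full(r))
		return false;
	r->head = (r->head + 1) % r->nb_desc;
	return true;
}

/* Release one slot; returns false when the ring is empty. */
static bool
ring_pop(struct ring *r)
{
	assert(r != NULL && r->nb_desc > 1);
	if (r->head == r->tail)
		return false;
	r->tail = (r->tail + 1) % r->nb_desc;
	return true;
}
```

With this convention, a request for the maximum size yields one usable
descriptor fewer than the maximum.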

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3fea763caf..7c9c49ffda 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -8,6 +8,97 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cn10k_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cn10k_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cn10k_ml_qp *
+cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cn10k_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -30,6 +121,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint32_t mz_size;
+	uint16_t qp_id;
 	int ret;
 
 	if (dev == NULL || conf == NULL)
@@ -68,21 +162,83 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -ENOTSUP;
 	}
 
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
+
+error:
+	if (dev->data->queue_pairs != NULL)
+		rte_free(dev->data->queue_pairs);
+
+	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint16_t qp_id;
 
 	if (dev == NULL)
 		return -EINVAL;
 
 	mldev = dev->data->dev_private;
 
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	if (dev->data->queue_pairs)
+		rte_free(dev->data->queue_pairs);
+
 	/* Unload firmware */
 	cn10k_ml_fw_unload(mldev);
 
@@ -140,9 +296,56 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* The number of usable descriptors is one less than the size of the queue being
+	 * created, so the queue is made one entry larger than requested, except when the
+	 * requested size equals the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fe18730aca..289c7c5587 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,9 +5,13 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 
-/* ML request */
+/* Request structure */
 struct cn10k_ml_req {
 	/* Job descriptor */
 	struct cn10k_ml_jd jd;
@@ -19,6 +23,33 @@ struct cn10k_ml_req {
 	volatile uint64_t status;
 } __rte_aligned(ROC_ALIGN);
 
+/* Request queue */
+struct cn10k_ml_queue {
+	/* Array of requests */
+	struct cn10k_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cn10k_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cn10k_ml_queue queue;
+};
+
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1



* [PATCH v4 11/39] ml/cnxk: add functions to load and unload models
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added cnxk driver implementations to load and unload ML models.
Enabled support in the configure stage to allocate the model handles
array. A model ID is assigned and resources are allocated for each
model during load, and the resources are released during unload.
Added internal structures to handle ML models.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.c |   5 +
 drivers/ml/cnxk/cn10k_ml_model.h |  40 ++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 154 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   5 +
 drivers/ml/cnxk/meson.build      |   2 +
 6 files changed, 209 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h
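
Note: the model ID handed back by load is the index of the first unused
entry in the model handles array, and unload returns that entry to the
pool by resetting it to NULL. A minimal sketch of that slot search
follows; find_free_slot() is a hypothetical helper written for this
note, not a function in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first NULL entry in slots[], or -1 if every slot is in use. */
static int
find_free_slot(void *const *slots, uint16_t nb_slots)
{
	uint16_t idx;

	assert(slots != NULL || nb_slots == 0);

	for (idx = 0; idx < nb_slots; idx++) {
		if (slots[idx] == NULL)
			return idx;
	}

	return -1;
}
```

In the driver, the "no free slot" case is reported as -ENOMEM.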

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 00d23eb3ca..7cf6268115 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -214,6 +214,9 @@ struct cn10k_ml_dev {
 
 	/* Firmware */
 	struct cn10k_ml_fw fw;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
new file mode 100644
index 0000000000..39ed707396
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_model.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
new file mode 100644
index 0000000000..912fdb9758
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_MODEL_H_
+#define _CN10K_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* Model state */
+enum cn10k_ml_model_state {
+	ML_CN10K_MODEL_STATE_LOADED,
+	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
+	ML_CN10K_MODEL_STATE_STARTED,
+	ML_CN10K_MODEL_STATE_UNKNOWN,
+};
+
+/* Model Object */
+struct cn10k_ml_model {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* ID */
+	int16_t model_id;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+
+	/* State */
+	enum cn10k_ml_model_state state;
+};
+
+#endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7c9c49ffda..d177d0e3e4 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -6,8 +6,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+/* ML model macros */
+#define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -120,9 +124,11 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	int16_t model_id;
 	uint16_t qp_id;
 	int ret;
 
@@ -203,6 +209,48 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
 
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %d", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
@@ -211,14 +259,19 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (dev->data->queue_pairs != NULL)
 		rte_free(dev->data->queue_pairs);
 
+	if (dev->data->models != NULL)
+		rte_free(dev->data->models);
+
 	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	int16_t model_id;
 	uint16_t qp_id;
 
 	if (dev == NULL)
@@ -226,6 +279,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %d", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	if (dev->data->models)
+		rte_free(dev->data->models);
+
 	/* Destroy all queue pairs */
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
@@ -337,6 +405,88 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+int
+cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t mz_size;
+	uint16_t idx;
+	bool found;
+
+	PLT_SET_USED(params);
+
+	mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (idx = 0; idx < dev->data->nb_models; idx++) {
+		if (dev->data->models[idx] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+
+	/* Allocate memzone for model object and model data */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->mldev = mldev;
+	model->model_id = idx;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	dev->data->models[idx] = model;
+	mldev->nb_models_loaded++;
+
+	*model_id = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	dev->data->models[model_id] = NULL;
+	mldev->nb_models_loaded--;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%d", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -348,4 +498,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 289c7c5587..8a939cabc7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -53,4 +53,9 @@ struct cn10k_ml_qp {
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
+/* Slow-path ops */
+int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
+			int16_t *model_id);
+int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 7dc8a29a80..bf7a9c0225 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -10,11 +10,13 @@ endif
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
+        'cn10k_ml_model.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
+        'cn10k_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs']
-- 
2.17.1



* [PATCH v4 12/39] ml/cnxk: enable validity checks for model metadata
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added model metadata structure and enabled metadata checks
during model load. Mapped cnxk IO types to RTE IO types.
The model metadata is stored and updated in the model structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 211 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 312 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  14 +-
 drivers/ml/cnxk/meson.build      |   2 +-
 4 files changed, 537 insertions(+), 2 deletions(-)
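
Note: the header and payload checks follow the same rule: a stored CRC
of zero means "no checksum recorded" and the check is skipped, otherwise
the CRC is recomputed over the data and compared. A minimal sketch of
that rule follows, using a bitwise software CRC-32C written for this
note. It applies the conventional initial and final inversion, so its
values are not claimed to match rte_hash_crc() called with an initial
value of 0; crc32c_sw() and check_crc() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78,
 * initial value and final XOR of 0xFFFFFFFF.
 */
static uint32_t
crc32c_sw(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = UINT32_C(0xFFFFFFFF);
	size_t i;
	int bit;

	assert(data != NULL || len == 0);

	for (i = 0; i < len; i++) {
		crc ^= p[i];
		for (bit = 0; bit < 8; bit++) {
			if (crc & 1)
				crc = (crc >> 1) ^ UINT32_C(0x82F63B78);
			else
				crc >>= 1;
		}
	}

	return crc ^ UINT32_C(0xFFFFFFFF);
}

/* Return 0 when the stored CRC is zero (check skipped) or matches, -1 on mismatch. */
static int
check_crc(uint32_t stored, const void *data, size_t len)
{
	if (stored == 0)
		return 0;

	return (crc32c_sw(data, len) == stored) ? 0 : -1;
}
```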

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 39ed707396..0fefab9daa 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -2,4 +2,215 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_hash_crc.h>
+
+#include <mldev_utils.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+
+static enum rte_ml_io_type
+cn10k_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case 1:
+		return RTE_ML_IO_TYPE_INT8;
+	case 2:
+		return RTE_ML_IO_TYPE_UINT8;
+	case 3:
+		return RTE_ML_IO_TYPE_INT16;
+	case 4:
+		return RTE_ML_IO_TYPE_UINT16;
+	case 5:
+		return RTE_ML_IO_TYPE_INT32;
+	case 6:
+		return RTE_ML_IO_TYPE_UINT32;
+	case 7:
+		return RTE_ML_IO_TYPE_FP16;
+	case 8:
+		return RTE_ML_IO_TYPE_FP32;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+int
+cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+	uint8_t version[4];
+	uint8_t i;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+
+	/* Header CRC check */
+	if (metadata->metadata_header.header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			buffer, sizeof(metadata->metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata->metadata_header.header_crc32c) {
+			plt_err("Invalid model, Header CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata->metadata_header.payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->metadata_header),
+					      size - sizeof(metadata->metadata_header), 0);
+
+		if (payload_crc32c != metadata->metadata_header.payload_crc32c) {
+			plt_err("Invalid model, Payload CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Model magic string */
+	if (strncmp((char *)metadata->metadata_header.magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid model, magic = %.4s", (char *)metadata->metadata_header.magic);
+		return -EINVAL;
+	}
+
+	/* Target architecture */
+	if (metadata->metadata_header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
+		plt_err("Model target architecture (%u) not supported",
+			metadata->metadata_header.target_architecture);
+		return -ENOTSUP;
+	}
+
+	/* Header version */
+	rte_memcpy(version, metadata->metadata_header.version, 4 * sizeof(uint8_t));
+	if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
+		plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0],
+			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION / 1000) % 10,
+			(MRVL_ML_MODEL_VERSION / 100) % 10, (MRVL_ML_MODEL_VERSION / 10) % 10,
+			MRVL_ML_MODEL_VERSION % 10);
+		return -ENOTSUP;
+	}
+
+	/* Init section */
+	if (metadata->init_model.file_size == 0) {
+		plt_err("Invalid metadata, init_model.file_size = %u",
+			metadata->init_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Main section */
+	if (metadata->main_model.file_size == 0) {
+		plt_err("Invalid metadata, main_model.file_size = %u",
+			metadata->main_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Finish section */
+	if (metadata->finish_model.file_size == 0) {
+		plt_err("Invalid metadata, finish_model.file_size = %u",
+			metadata->finish_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Weights and Bias */
+	if (metadata->weights_bias.file_size == 0) {
+		plt_err("Invalid metadata, weights_bias.file_size = %u",
+			metadata->weights_bias.file_size);
+		return -EINVAL;
+	}
+
+	if (metadata->weights_bias.relocatable != 1) {
+		plt_err("Model not supported, non-relocatable weights and bias");
+		return -ENOTSUP;
+	}
+
+	/* Check input count */
+	if (metadata->model.num_input > MRVL_ML_INPUT_OUTPUT_SIZE) {
+		plt_err("Invalid metadata, num_input = %d (> %d)", metadata->model.num_input,
+			MRVL_ML_INPUT_OUTPUT_SIZE);
+		return -EINVAL;
+	}
+
+	/* Check output count */
+	if (metadata->model.num_output > MRVL_ML_INPUT_OUTPUT_SIZE) {
+		plt_err("Invalid metadata, num_output = %d (> %d)", metadata->model.num_output,
+			MRVL_ML_INPUT_OUTPUT_SIZE);
+		return -EINVAL;
+	}
+
+	/* Inputs */
+	for (i = 0; i < metadata->model.num_input; i++) {
+		if (rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input[i].input_type)) <=
+		    0) {
+			plt_err("Invalid metadata, input[%u] : input_type = %u", i,
+				metadata->input[i].input_type);
+			return -EINVAL;
+		}
+
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->input[i].model_input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : model_input_type = %u", i,
+				metadata->input[i].model_input_type);
+			return -EINVAL;
+		}
+
+		if (metadata->input[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable input: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	/* Outputs */
+	for (i = 0; i < metadata->model.num_output; i++) {
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : output_type = %u", i,
+				metadata->output[i].output_type);
+			return -EINVAL;
+		}
+
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].model_output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : model_output_type = %u", i,
+				metadata->output[i].model_output_type);
+			return -EINVAL;
+		}
+
+		if (metadata->output[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable output: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	return 0;
+}
+
+void
+cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
+{
+	uint8_t i;
+
+	for (i = 0; i < metadata->model.num_input; i++) {
+		metadata->input[i].input_type = cn10k_ml_io_type_map(metadata->input[i].input_type);
+		metadata->input[i].model_input_type =
+			cn10k_ml_io_type_map(metadata->input[i].model_input_type);
+
+		if (metadata->input[i].shape.w == 0)
+			metadata->input[i].shape.w = 1;
+
+		if (metadata->input[i].shape.x == 0)
+			metadata->input[i].shape.x = 1;
+
+		if (metadata->input[i].shape.y == 0)
+			metadata->input[i].shape.y = 1;
+
+		if (metadata->input[i].shape.z == 0)
+			metadata->input[i].shape.z = 1;
+	}
+
+	for (i = 0; i < metadata->model.num_output; i++) {
+		metadata->output[i].output_type =
+			cn10k_ml_io_type_map(metadata->output[i].output_type);
+		metadata->output[i].model_output_type =
+			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 912fdb9758..e25d6780e9 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -19,6 +19,309 @@ enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_UNKNOWN,
 };
 
+/* Model Metadata : v 2.1.0.2 */
+#define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
+#define MRVL_ML_MODEL_TARGET_ARCH  128
+#define MRVL_ML_MODEL_VERSION	   2100
+#define MRVL_ML_MODEL_NAME_LEN	   64
+#define MRVL_ML_INPUT_NAME_LEN	   16
+#define MRVL_ML_OUTPUT_NAME_LEN	   16
+#define MRVL_ML_INPUT_OUTPUT_SIZE  8
+
+/* Model file metadata structure */
+struct cn10k_ml_model_metadata {
+	/* Header (256-byte) */
+	struct {
+		/* Magic string ('M', 'R', 'V', 'L') */
+		uint8_t magic[4];
+
+		/* Metadata version */
+		uint8_t version[4];
+
+		/* Metadata size */
+		uint32_t metadata_size;
+
+		/* Unique ID */
+		uint8_t uuid[128];
+
+		/* Model target architecture
+		 * 0 = Undefined
+		 * 1 = M1K
+		 * 128 = MLIP
+		 * 256 = Experimental
+		 */
+		uint32_t target_architecture;
+		uint8_t reserved[104];
+
+		/* CRC of data after metadata_header (i.e. after first 256 bytes) */
+		uint32_t payload_crc32c;
+
+		/* CRC of first 252 bytes of metadata_header, after payload_crc calculation */
+		uint32_t header_crc32c;
+	} metadata_header;
+
+	/* Model information (256-byte) */
+	struct {
+		/* Model name string */
+		uint8_t name[MRVL_ML_MODEL_NAME_LEN];
+
+		/* Model version info (xx.xx.xx.xx) */
+		uint8_t version[4];
+
+		/* Model code size (Init + Main + Finish) */
+		uint32_t code_size;
+
+		/* Model data size (Weights and Bias) */
+		uint32_t data_size;
+
+		/* OCM start offset, set to ocm_wb_range_start */
+		uint32_t ocm_start;
+
+		/* OCM end offset, set to max OCM size */
+		uint32_t ocm_end;
+
+		/* Relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t ocm_relocatable;
+
+		/* Tile relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t tile_relocatable;
+
+		/* Start tile (Always 0) */
+		uint8_t tile_start;
+
+		/* End tile (num_tiles - 1) */
+		uint8_t tile_end;
+
+		/* Inference batch size */
+		uint8_t batch_size;
+
+		/* Number of input tensors (Max 8) */
+		uint8_t num_input;
+
+		/* Number of output tensors (Max 8) */
+		uint8_t num_output;
+		uint8_t reserved1;
+
+		/* Total input size in bytes */
+		uint32_t input_size;
+
+		/* Total output size in bytes */
+		uint32_t output_size;
+
+		/* Table size in bytes */
+		uint32_t table_size;
+
+		/* Number of layers in the network */
+		uint32_t num_layers;
+		uint32_t reserved2;
+
+		/* Floor of absolute OCM region */
+		uint64_t ocm_tmp_range_floor;
+
+		/* Relative OCM start address of WB data block */
+		uint64_t ocm_wb_range_start;
+
+		/* Relative OCM end address of WB data block */
+		uint64_t ocm_wb_range_end;
+
+		/* Relative DDR start address of WB data block */
+		uint64_t ddr_wb_range_start;
+
+		/* Relative DDR end address of WB data block */
+		uint64_t ddr_wb_range_end;
+
+		/* Relative DDR start address of all inputs */
+		uint64_t ddr_input_range_start;
+
+		/* Relative DDR end address of all inputs */
+		uint64_t ddr_input_range_end;
+
+		/* Relative DDR start address of all outputs */
+		uint64_t ddr_output_range_start;
+
+		/* Relative DDR end address of all outputs */
+		uint64_t ddr_output_range_end;
+
+		/* Compiler version */
+		uint8_t compiler_version[8];
+
+		/* CDK version */
+		uint8_t cdk_version[4];
+
+		/* Lower batch optimization support
+		 * 0 - No,
+		 * 1 - Yes
+		 */
+		uint8_t supports_lower_batch_size_optimization;
+		uint8_t reserved3[59];
+	} model;
+
+	/* Init section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} init_model;
+
+	/* Main section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} main_model;
+
+	/* Finish section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} finish_model;
+
+	uint8_t reserved1[512]; /* End of 2k bytes */
+
+	/* Weights and Bias (64-byte) */
+	struct {
+		/* Memory offset, set to ddr_wb_range_start */
+		uint64_t mem_offset;
+		uint32_t file_offset;
+		uint32_t file_size;
+
+		/* Relocatable flag for WB
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+		uint8_t reserved[47];
+	} weights_bias;
+
+	/* Input (512-byte, 64-byte per input) provisioned for 8 inputs */
+	struct {
+		/* DDR offset (in OCM absolute addresses for input) */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Input quantization
+		 * 1 = Requires quantization
+		 * 2 = Pre-quantized
+		 */
+		uint8_t quantize;
+
+		/* Type of incoming input
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t input_type;
+
+		/* Type of input required by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_input_type;
+
+		/* float_32 qscale value
+		 * quantized = non-quantized * qscale
+		 */
+		float qscale;
+
+		/* Input shape */
+		struct {
+			/* Input format
+			 * 1 = NCHW
+			 * 2 = NHWC
+			 */
+			uint8_t format;
+			uint8_t reserved[3];
+			uint32_t w;
+			uint32_t x;
+			uint32_t y;
+			uint32_t z;
+		} shape;
+		uint8_t reserved[4];
+
+		/* Name of input */
+		uint8_t input_name[MRVL_ML_INPUT_NAME_LEN];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output (512-byte, 64-byte per output) provisioned for 8 outputs */
+	struct {
+		/* DDR offset in OCM absolute addresses for output */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Output dequantization
+		 * 1 = De-quantization required
+		 * 2 = De-quantization not required
+		 */
+		uint8_t dequantize;
+
+		/* Type of outgoing output
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t output_type;
+
+		/* Type of output produced by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_output_type;
+
+		/* float_32 dscale value
+		 * dequantized = quantized * dscale
+		 */
+		float dscale;
+
+		/* Number of items in the output */
+		uint32_t size;
+		uint8_t reserved[20];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+		uint8_t output_name[MRVL_ML_OUTPUT_NAME_LEN];
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	uint8_t reserved2[1792];
+
+	/* Model data */
+	struct {
+		uint8_t reserved1[4068];
+
+		/* Beta: xx.xx.xx.xx,
+		 * Later: YYYYMM.xx.xx
+		 */
+		uint8_t compiler_version[8];
+
+		/* M1K CDK version (xx.xx.xx.xx) */
+		uint8_t m1k_cdk_version[4];
+	} data;
+
+	/* Hidden 16 bytes of magic code */
+	uint8_t reserved3[16];
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -30,6 +333,12 @@ struct cn10k_ml_model {
 	/* ID */
 	int16_t model_id;
 
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Metadata */
+	struct cn10k_ml_model_metadata metadata;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -37,4 +346,7 @@ struct cn10k_ml_model {
 	enum cn10k_ml_model_state state;
 };
 
+int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
+void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index d177d0e3e4..f7c1d43aee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -416,8 +416,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int ret;
 
-	PLT_SET_USED(params);
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
 	mldev = dev->data->dev_private;
 
@@ -450,6 +453,15 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->mldev = mldev;
 	model->model_id = idx;
 
+	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->metadata);
+
+	/* Enable support for batch_size of 256 */
+	if (model->metadata.model.batch_size == 0)
+		model->batch_size = 256;
+	else
+		model->batch_size = model->metadata.model.batch_size;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index bf7a9c0225..799e8f2470 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -19,7 +19,7 @@ sources = files(
         'cn10k_ml_model.c',
 )
 
-deps += ['mldev', 'common_cnxk', 'kvargs']
+deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1



* [PATCH v4 13/39] ml/cnxk: add internal structures for derived info
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add internal structures to hold derived address fields and compute
the DMA addresses needed for model start. Update the internal model
fields during model load.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
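Note on the load layout: cn10k_ml_model_addr_update() places the init,
main, finish and weights/bias sections back to back from the load base,
and the run base follows the combined size of the four sections. The
sketch below reproduces only that offset arithmetic. The sketch_* names
are hypothetical, and offsets are relative to the load base rather than
real DMA addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for section offsets relative to the load base. */
struct sketch_layout {
	size_t init_off;
	size_t main_off;
	size_t finish_off;
	size_t wb_off;
	size_t run_base_off;
};

/* Sections are packed in order: init, main, finish, weights and bias. */
static struct sketch_layout
sketch_layout_compute(uint32_t init_sz, uint32_t main_sz, uint32_t finish_sz, uint32_t wb_sz)
{
	struct sketch_layout l;

	l.init_off = 0;
	l.main_off = l.init_off + init_sz;
	l.finish_off = l.main_off + main_sz;
	l.wb_off = l.finish_off + finish_sz;

	/* The run copy starts after all four sections of the load copy. */
	l.run_base_off = l.wb_off + wb_sz;

	/* Offsets never move backwards. */
	assert(l.main_off <= l.finish_off && l.finish_off <= l.wb_off);
	return l;
}
```

The run addresses of init, main and finish use the same relative offsets,
added to the run base instead of the load base.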
 drivers/ml/cnxk/cn10k_ml_model.c | 89 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 80 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 18 ++++++-
 3 files changed, 186 insertions(+), 1 deletion(-)
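
Note on input sizes: the element count of an input is the product of its
four shape fields, after cn10k_ml_model_metadata_update() has replaced
zero dimensions with 1. The buffer sizes are then the element count times
the size of the dequantized and quantized types. A standalone sketch with
hypothetical sketch_* helpers, where the per-element size is passed in
directly instead of calling rte_ml_io_type_size_get():

```c
#include <assert.h>
#include <stdint.h>

/* A zero dimension is treated as 1, as in the metadata update step. */
static uint32_t
sketch_dim(uint32_t d)
{
	return d == 0 ? 1 : d;
}

/* Number of elements of a w * x * y * z shaped input. */
static uint32_t
sketch_nb_elements(uint32_t w, uint32_t x, uint32_t y, uint32_t z)
{
	return sketch_dim(w) * sketch_dim(x) * sketch_dim(y) * sketch_dim(z);
}

/* Buffer size in bytes for a given per-element size. */
static uint32_t
sketch_buffer_size(uint32_t nb_elements, uint32_t type_size)
{
	assert(type_size > 0);
	return nb_elements * type_size;
}
```

For a 1 x 3 x 224 x 224 input this gives 150528 elements, so 602112 bytes
for a 4-byte dequantized type and 150528 bytes for a 1-byte quantized
type.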

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 0fefab9daa..dafcae106b 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -214,3 +214,92 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
 	}
 }
+
+void
+cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+	size_t model_data_size;
+	uint8_t *dma_addr_load;
+	uint8_t *dma_addr_run;
+	uint8_t i;
+	int fpos;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+
+	/* Base address */
+	addr->base_dma_addr_load = base_dma_addr;
+	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
+
+	/* Init section */
+	dma_addr_load = addr->base_dma_addr_load;
+	dma_addr_run = addr->base_dma_addr_run;
+	fpos = sizeof(struct cn10k_ml_model_metadata);
+	addr->init_load_addr = dma_addr_load;
+	addr->init_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
+
+	/* Main section */
+	dma_addr_load += metadata->init_model.file_size;
+	dma_addr_run += metadata->init_model.file_size;
+	fpos += metadata->init_model.file_size;
+	addr->main_load_addr = dma_addr_load;
+	addr->main_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
+
+	/* Finish section */
+	dma_addr_load += metadata->main_model.file_size;
+	dma_addr_run += metadata->main_model.file_size;
+	fpos += metadata->main_model.file_size;
+	addr->finish_load_addr = dma_addr_load;
+	addr->finish_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
+
+	/* Weights and Bias section */
+	dma_addr_load += metadata->finish_model.file_size;
+	fpos += metadata->finish_model.file_size;
+	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
+	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
+	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+
+	/* Inputs */
+	addr->total_input_sz_d = 0;
+	addr->total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		addr->input[i].nb_elements =
+			model->metadata.input[i].shape.w * model->metadata.input[i].shape.x *
+			model->metadata.input[i].shape.y * model->metadata.input[i].shape.z;
+		addr->input[i].sz_d = addr->input[i].nb_elements *
+				      rte_ml_io_type_size_get(metadata->input[i].input_type);
+		addr->input[i].sz_q = addr->input[i].nb_elements *
+				      rte_ml_io_type_size_get(metadata->input[i].model_input_type);
+		addr->total_input_sz_d += addr->input[i].sz_d;
+		addr->total_input_sz_q += addr->input[i].sz_q;
+
+		plt_ml_dbg("model_id = %d, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+			   model->model_id, i, metadata->input[i].shape.w,
+			   metadata->input[i].shape.x, metadata->input[i].shape.y,
+			   metadata->input[i].shape.z, addr->input[i].sz_d, addr->input[i].sz_q);
+	}
+
+	/* Outputs */
+	addr->total_output_sz_q = 0;
+	addr->total_output_sz_d = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		addr->output[i].nb_elements = metadata->output[i].size;
+		addr->output[i].sz_d = addr->output[i].nb_elements *
+				       rte_ml_io_type_size_get(metadata->output[i].output_type);
+		addr->output[i].sz_q =
+			addr->output[i].nb_elements *
+			rte_ml_io_type_size_get(metadata->output[i].model_output_type);
+		addr->total_output_sz_q += addr->output[i].sz_q;
+		addr->total_output_sz_d += addr->output[i].sz_d;
+
+		plt_ml_dbg("model_id = %d, output[%u] - sz_d = %u, sz_q = %u", model->model_id, i,
+			   addr->output[i].sz_d, addr->output[i].sz_q);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index e25d6780e9..7e276c3b12 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -322,6 +322,81 @@ struct cn10k_ml_model_metadata {
 	uint8_t reserved3[16];
 };
 
+/* Model address structure */
+struct cn10k_ml_model_addr {
+	/* Base DMA address for load */
+	void *base_dma_addr_load;
+
+	/* Base DMA address for run */
+	void *base_dma_addr_run;
+
+	/* Init section load address */
+	void *init_load_addr;
+
+	/* Init section run address */
+	void *init_run_addr;
+
+	/* Main section load address */
+	void *main_load_addr;
+
+	/* Main section run address */
+	void *main_run_addr;
+
+	/* Finish section load address */
+	void *finish_load_addr;
+
+	/* Finish section run address */
+	void *finish_run_addr;
+
+	/* Weights and Bias base address */
+	void *wb_base_addr;
+
+	/* Weights and bias load address */
+	void *wb_load_addr;
+
+	/* Start tile */
+	uint8_t tile_start;
+
+	/* End tile */
+	uint8_t tile_end;
+
+	/* Input address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized input size */
+		uint32_t sz_d;
+
+		/* Quantized input size */
+		uint32_t sz_q;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized output size */
+		uint32_t sz_d;
+
+		/* Quantized output size */
+		uint32_t sz_q;
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -339,6 +414,9 @@ struct cn10k_ml_model {
 	/* Metadata */
 	struct cn10k_ml_model_metadata metadata;
 
+	/* Address structure */
+	struct cn10k_ml_model_addr addr;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -348,5 +426,7 @@ struct cn10k_ml_model {
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+				uint8_t *base_dma_addr);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f7c1d43aee..20f15ec35d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -408,11 +408,14 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
+	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_data_size;
+	uint8_t *base_dma_addr;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -439,7 +442,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Compute memzone size */
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+	metadata = (struct cn10k_ml_model_metadata *)params->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+		  2 * model_data_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -462,6 +470,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	else
 		model->batch_size = model->metadata.model.batch_size;
 
+	/* Set DMA base address */
+	base_dma_addr = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
+
+	/* Copy data from load to run; the run address is used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-- 
2.17.1



* [PATCH v4 14/39] ml/cnxk: add internal structures for tiles and OCM
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add internal structures to hold tile and OCM information and the
OCM to model memory mapping. Initialize the fields to platform-specific
defaults and compute the OCM / tile requirements for a model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
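Note on the page counts: cn10k_ml_model_ocm_pages_count() rounds the
weights/bias range and the scratch region up to whole OCM pages and
rejects the model when the two together exceed the pages of one tile.
The sketch below shows the same arithmetic with the constants from
cn10k_ml_ocm.h (16 KiB pages, 1 MiB per tile). The sketch_* names are
hypothetical, and the sketch uses a single ceiling division instead of
the two-branch form in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the ML_CN10K_OCM_* defines in this patch. */
#define SKETCH_OCM_PAGESIZE 0x4000u
#define SKETCH_OCM_TILESIZE 0x100000u
#define SKETCH_OCM_NUMPAGES (SKETCH_OCM_TILESIZE / SKETCH_OCM_PAGESIZE)

/* Round a byte count up to whole pages. */
static uint16_t
sketch_pages_for(uint64_t size)
{
	return (uint16_t)((size + SKETCH_OCM_PAGESIZE - 1) / SKETCH_OCM_PAGESIZE);
}

/* The weights/bias range is inclusive on both ends. */
static uint64_t
sketch_wb_size(uint64_t range_start, uint64_t range_end)
{
	assert(range_end >= range_start);
	return range_end - range_start + 1;
}

/* Scratch memory spans from the temporary range floor to the tile end. */
static uint64_t
sketch_scratch_size(uint64_t tmp_range_floor)
{
	assert(tmp_range_floor <= SKETCH_OCM_TILESIZE);
	return SKETCH_OCM_TILESIZE - tmp_range_floor;
}

/* A model fits when both regions together stay within one tile. */
static bool
sketch_fits_on_tile(uint16_t wb_pages, uint16_t scratch_pages)
{
	return (uint32_t)wb_pages + scratch_pages <= SKETCH_OCM_NUMPAGES;
}
```

For example, a weights/bias range of 0x0 to 0x7FFF needs 2 pages, and a
temporary range floor of 0xF0000 leaves 0x10000 bytes of scratch, which
is 4 pages.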
 drivers/ml/cnxk/cn10k_ml_dev.h   |  5 ++
 drivers/ml/cnxk/cn10k_ml_model.c | 53 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  6 +++
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  5 ++
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 79 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 29 ++++++++++++
 drivers/ml/cnxk/meson.build      |  2 +
 7 files changed, 179 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h
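
Note on the OCM mask: each tile keeps ML_CN10K_OCM_MASKWORDS (64 / 8 = 8)
mask bytes, one bit per OCM page. This patch only adds the field, so the
bit order used below (page N at bit N % 8 of byte N / 8) is an assumption
made for illustration, and the sketch_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_OCM_NUMPAGES  64u
#define SKETCH_OCM_MASKWORDS (SKETCH_OCM_NUMPAGES / 8)

/* Hypothetical per-tile state holding only the page mask. */
struct sketch_tile {
	uint8_t ocm_mask[SKETCH_OCM_MASKWORDS];
};

/* Mark one page as used (assumed bit order, see the note above). */
static void
sketch_page_set(struct sketch_tile *tile, unsigned int page)
{
	assert(page < SKETCH_OCM_NUMPAGES);
	tile->ocm_mask[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Check whether a page is marked as used. */
static bool
sketch_page_used(const struct sketch_tile *tile, unsigned int page)
{
	assert(page < SKETCH_OCM_NUMPAGES);
	return (tile->ocm_mask[page / 8] >> (page % 8)) & 1u;
}

/* Count the pages currently marked as used on the tile. */
static unsigned int
sketch_pages_used(const struct sketch_tile *tile)
{
	unsigned int page, count = 0;

	for (page = 0; page < SKETCH_OCM_NUMPAGES; page++)
		count += sketch_page_used(tile, page) ? 1 : 0;
	return count;
}
```

A byte array keeps the mask size tied to the page count, so a different
page size would only change the number of mask bytes.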

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 7cf6268115..02a4496c97 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -7,6 +7,8 @@
 
 #include <roc_api.h>
 
+#include "cn10k_ml_ocm.h"
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -215,6 +217,9 @@ struct cn10k_ml_dev {
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
+	/* OCM info */
+	struct cn10k_ml_ocm ocm;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index dafcae106b..30911b7ffe 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -8,6 +8,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+#include "cn10k_ml_ocm.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -303,3 +304,55 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 			   addr->output[i].sz_d, addr->output[i].sz_q);
 	}
 }
+
+int
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
+			       uint16_t *wb_pages, uint16_t *scratch_pages)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_ocm *ocm;
+	uint64_t scratch_size;
+	uint64_t wb_size;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	ocm = &mldev->ocm;
+
+	/* Assume wb_size is zero for non-relocatable models */
+	if (metadata->model.ocm_relocatable)
+		wb_size = metadata->model.ocm_wb_range_end - metadata->model.ocm_wb_range_start + 1;
+	else
+		wb_size = 0;
+
+	if (wb_size % ocm->page_size)
+		*wb_pages = wb_size / ocm->page_size + 1;
+	else
+		*wb_pages = wb_size / ocm->page_size;
+	plt_ml_dbg("model_id = %d, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+		   *wb_pages);
+
+	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
+	if (metadata->model.ocm_tmp_range_floor % ocm->page_size)
+		*scratch_pages = scratch_size / ocm->page_size + 1;
+	else
+		*scratch_pages = scratch_size / ocm->page_size;
+	plt_ml_dbg("model_id = %d, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+		   scratch_size, *scratch_pages);
+
+	/* Check if the model can be loaded on OCM */
+	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+		plt_err("Cannot create the model, OCM relocatable = %u",
+			metadata->model.ocm_relocatable);
+		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
+			ML_CN10K_OCM_NUMPAGES);
+		return -ENOMEM;
+	}
+
+	/* Update scratch_pages to block the full tile for OCM non-relocatable model. This would
+	 * prevent the library from allocating the remaining space on the tile to other models.
+	 */
+	if (!metadata->model.ocm_relocatable)
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 7e276c3b12..ebd296c609 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -10,6 +10,7 @@
 #include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ocm.h"
 
 /* Model state */
 enum cn10k_ml_model_state {
@@ -417,6 +418,9 @@ struct cn10k_ml_model {
 	/* Address structure */
 	struct cn10k_ml_model_addr addr;
 
+	/* Tile and memory information object */
+	struct cn10k_ml_ocm_model_map model_mem_map;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -428,5 +432,7 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
+				   uint16_t *wb_pages, uint16_t *scratch_pages);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
new file mode 100644
index 0000000000..b1c62f2963
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_ocm.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
new file mode 100644
index 0000000000..44390396f9
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OCM_H_
+#define _CN10K_ML_OCM_H_
+
+#include <rte_mldev.h>
+
+/* Page size in bytes. */
+#define ML_CN10K_OCM_PAGESIZE 0x4000
+
+/* Number of OCM tiles. */
+#define ML_CN10K_OCM_NUMTILES 0x8
+
+/* OCM in bytes, per tile. */
+#define ML_CN10K_OCM_TILESIZE 0x100000
+
+/* OCM pages, per tile. */
+#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
+
+/* Maximum OCM mask words, per tile, 8 bit words. */
+#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
+
+/* OCM and Tile information structure */
+struct cn10k_ml_ocm_tile_info {
+	/* Mask of used / allotted pages on tile's OCM */
+	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+
+	/* Last page in the tile's OCM used for weights and bias, default = -1 */
+	int last_wb_page;
+
+	/* Number of pages used for scratch memory on the tile's OCM */
+	uint16_t scratch_pages;
+};
+
+/* Model OCM map structure */
+struct cn10k_ml_ocm_model_map {
+	/* Status of OCM reservation */
+	bool ocm_reserved;
+
+	/* Mask of OCM tiles for the model */
+	uint64_t tilemask;
+
+	/* Start page for the model load, default = -1 */
+	int wb_page_start;
+
+	/* Number of pages required for weights and bias */
+	uint16_t wb_pages;
+
+	/* Number of pages required for scratch memory */
+	uint16_t scratch_pages;
+};
+
+/* OCM state structure */
+struct cn10k_ml_ocm {
+	/* OCM spinlock, used to update OCM state */
+	rte_spinlock_t lock;
+
+	/* Number of OCM tiles */
+	uint8_t num_tiles;
+
+	/* OCM size per tile */
+	uint64_t size_per_tile;
+
+	/* Size of OCM page */
+	uint64_t page_size;
+
+	/* Number of OCM pages */
+	uint16_t num_pages;
+
+	/* Words per OCM mask */
+	uint16_t mask_words;
+
+	/* OCM memory info and status */
+	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+};
+
+#endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 20f15ec35d..9ccf52332f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -126,8 +126,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	uint16_t tile_id;
 	int16_t model_id;
 	uint16_t qp_id;
 	int ret;
@@ -250,6 +252,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
+	ocm = &mldev->ocm;
+	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
+	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
+	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
+	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+
+	rte_spinlock_init(&ocm->lock);
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -416,6 +430,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	const struct plt_memzone *mz;
 	size_t model_data_size;
 	uint8_t *base_dma_addr;
+	uint16_t scratch_pages;
+	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -441,6 +457,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 		return -ENOMEM;
 	}
 
+	/* Get WB and scratch pages, check if model can be loaded. */
+	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	if (ret < 0)
+		return ret;
+
 	/* Compute memzone size */
 	metadata = (struct cn10k_ml_model_metadata *)params->addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
@@ -478,6 +499,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Copy data from load to run. run address to be used by MLIP */
 	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
 
+	/* Initialize model_mem_map */
+	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
+	model->model_mem_map.ocm_reserved = false;
+	model->model_mem_map.tilemask = 0;
+	model->model_mem_map.wb_page_start = -1;
+	model->model_mem_map.wb_pages = wb_pages;
+	model->model_mem_map.scratch_pages = scratch_pages;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 799e8f2470..393bc629b0 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -11,12 +11,14 @@ driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
+        'cn10k_ml_ocm.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
+        'cn10k_ml_ocm.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.17.1



* [PATCH v4 15/39] ml/cnxk: add structures for slow and fast path JDs
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added JD structures for load, unload and run jobs. Initialize
job command and allocate memory for request structures for slow
path jobs.
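
The memzone layout this patch relies on (aligned model structure, two
copies of the model data, then one aligned request structure) can be
sketched as below. This is only an illustration: align_ceil(),
struct mz_layout and model_mz_layout() are hypothetical names, and the
alignment is passed in as a parameter because the value of
ML_CN10K_ALIGN_SIZE is not shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round v up to the next multiple of align (align must be a power of two). */
static inline size_t
align_ceil(size_t v, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/* Hypothetical helper type: where things land inside a model memzone. */
struct mz_layout {
	size_t req_offset; /* offset of the slow-path request structure */
	size_t total_size; /* number of bytes to reserve for the memzone */
};

/* Layout: [aligned model struct][load copy][run copy][aligned request]. */
static inline struct mz_layout
model_mz_layout(size_t model_struct_sz, size_t model_data_sz, size_t req_struct_sz,
		size_t align)
{
	struct mz_layout l;

	model_data_sz = align_ceil(model_data_sz, align);
	l.req_offset = align_ceil(model_struct_sz, align) + 2 * model_data_sz;
	l.total_size = l.req_offset + align_ceil(req_struct_sz, align);
	return l;
}
```

The request offset is computed from the same terms as the memzone size,
so the request always occupies the last aligned block of the zone.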

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   | 99 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  4 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 19 +++++-
 drivers/ml/cnxk/cn10k_ml_ops.h   |  4 ++
 4 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 02a4496c97..68fcc957fa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -188,6 +188,105 @@ struct cn10k_ml_jd {
 
 			uint8_t rsvd[8];
 		} fw_load;
+
+		struct cn10k_ml_jd_section_model_start {
+			/* Source model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_src_ddr_addr;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
 	};
 };
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index ebd296c609..003f5aba36 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+#include "cn10k_ml_ops.h"
 
 /* Model state */
 enum cn10k_ml_model_state {
@@ -426,6 +427,9 @@ struct cn10k_ml_model {
 
 	/* State */
 	enum cn10k_ml_model_state state;
+
+	/* Slow-path operations request pointer */
+	struct cn10k_ml_req *req;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9ccf52332f..8603cba20e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -12,6 +12,10 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML Job descriptor flags */
+#define ML_FLAGS_POLL_COMPL BIT(0)
+#define ML_FLAGS_SSO_COMPL  BIT(1)
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -65,6 +69,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	struct cn10k_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
+	uint64_t i;
 
 	/* Allocate queue pair */
 	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
@@ -95,6 +100,12 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 
+	/* Initialize job command */
+	for (i = 0; i < qp->nb_desc; i++) {
+		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+	}
+
 	return qp;
 
 qp_free:
@@ -468,7 +479,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size;
+		  2 * model_data_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -507,6 +519,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set slow-path request address and state */
+	model->req = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+				  2 * model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 8a939cabc7..981aa52655 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OPS_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include <roc_api.h>
 
@@ -21,6 +22,9 @@ struct cn10k_ml_req {
 
 	/* Status field for poll mode requests */
 	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
-- 
2.17.1



* [PATCH v4 16/39] ml/cnxk: find OCM mask and page slots for a model
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to compute the OCM tilemask and start page for a
model. The computed tilemask and start page are used during
model start to copy model weights and bias to OCM. The OCM slot
for a model is allocated from the tiles with the maximum amount
of free memory.
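
The slot search can be illustrated with a simplified run-length scan
over a byte-wise page mask. Note that this is not the method used in
the patch, which slides a search mask across the base mask; the sketch
only reproduces the same results for the example mask used in the code
comments. page_used(), first_free_run() and largest_free_run() are
hypothetical names. Unlike the driver functions, the sketch does not
special-case a zero-sized request.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if page 'bit' is marked as used in the byte-wise mask. */
static int
page_used(const uint8_t *mask, int bit)
{
	return (mask[bit / 8] >> (bit % 8)) & 1;
}

/* Lowest index >= start of run_len consecutive free pages, or -1 if none. */
static int
first_free_run(const uint8_t *mask, int nwords, int run_len, int start)
{
	int nbits = nwords * 8;
	int run = 0;
	int i;

	assert(run_len > 0 && start >= 0);
	for (i = start; i < nbits; i++) {
		run = page_used(mask, i) ? 0 : run + 1;
		if (run == run_len)
			return i - run_len + 1;
	}
	return -1;
}

/* Start index of the longest free run with at least min_len pages, or -1.
 * The length of the run that was found (0 if none) is stored in *len.
 */
static int
largest_free_run(const uint8_t *mask, int nwords, int min_len, int *len)
{
	int nbits = nwords * 8;
	int best_idx = -1;
	int best_len = 0;
	int run = 0;
	int i;

	for (i = 0; i < nbits; i++) {
		run = page_used(mask, i) ? 0 : run + 1;
		if (run > best_len) {
			best_len = run;
			best_idx = i - run + 1;
		}
	}

	if (best_len < min_len) {
		best_idx = -1;
		best_len = 0;
	}
	*len = best_len;
	return best_idx;
}
```

With the mask [10111000] [01001001] (word 1, word 0) stored as
{0x49, 0xB8}, the first free run of 2 pages starts at page 1 and the
largest free run is 4 pages long, starting at page 7.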

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 330 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   5 +
 2 files changed, 335 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index b1c62f2963..df2fa4c514 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -2,4 +2,334 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+
+#include "roc_api.h"
+
+/* OCM macros */
+#define BYTE_LEN	  8
+#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
+#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+
+/* Left shift multi-word mask by 1 bit.
+ *
+ * For example, given a mask of two uint8_t words
+ * Input:  [00110101] [00110111]
+ * Output: [01101010] [01101110]
+ */
+static void
+lshift_mask(uint8_t *mask, int nwords)
+{
+	int i;
+	int word_sz;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	for (i = nwords - 1; i >= 0; i--) {
+		mask[i] = mask[i] << 1;
+		if (i != 0)
+			mask[i] = mask[i] | (mask[i - 1] >> (word_sz - 1));
+	}
+}
+
+/* Get the index of the first unused slot at or after start_pos in a multi-word mask (base_mask).
+ * An unused slot is a sequence of slot_sz contiguous unset bits in the multi-word mask.
+ *
+ * The function builds a search_mask with slot_sz bits set and slides it left, one bit at a time,
+ * from start_pos to the end of the mask until it covers only unset bits.
+ *
+ * For example, given the multi-word mask
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When start = 0,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 3 is 7.
+ * Index of the first unused slot of size 2 is 1.
+ * Index of the first unused slot of size 1 is 1.
+ *
+ * When start = 2,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 2 is 4.
+ * Index of the first unused slot of size 1 is 2.
+ *
+ * When unable to find a valid slot, return -1
+ * When slot_sz is zero, return the mask size in bits (nwords * word_sz)
+ */
+static int
+slot_index_lowest(uint8_t *base_mask, int nwords, int slot_sz, int start_pos)
+{
+	uint8_t *search_mask;
+	int word_sz;
+	int end_pos;
+	int min_idx;
+	int max_idx;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	min_idx = 0;
+	max_idx = word_sz * nwords;
+	idx = min_idx - 1;
+
+	if (slot_sz == 0)
+		return max_idx;
+
+	/* Create a mask with slot_sz bits set */
+	search_mask = plt_zmalloc(nwords * sizeof(uint8_t), 0);
+	if (search_mask == NULL)
+		goto error;
+
+	for (i = 0; i < nwords; i++) {
+		if (i < slot_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > slot_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (slot_sz % word_sz)) - 1;
+	}
+
+	/* Shift search mask by start_pos bits */
+	for (i = 0; i < start_pos; i++)
+		lshift_mask(search_mask, nwords);
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - slot_sz + 1;
+	for (j = start_pos; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+
+		lshift_mask(search_mask, nwords);
+	}
+
+found:
+	plt_free(search_mask);
+
+error:
+	return idx;
+}
+
+/* Find the largest possible unused slot, with a minimum size of search_sz in a multi-word mask. The
+ * function returns the start index of the slot and the size of the identified slot (slot_sz).
+ *
+ * For example, in multi-word mask
+ *
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When search_sz > 4, return value = -1, slot_sz = 0
+ * When search_sz <=4, return value = 7, slot_sz = 4
+ */
+static int
+slot_index_largest(uint8_t *base_mask, int nwords, int search_sz, int *slot_sz)
+{
+	uint8_t *search_mask;
+	int mask_sz;
+	int word_sz;
+	int end_pos;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	mask_sz = nwords * word_sz;
+	idx = -1;
+
+	/* Create a mask with mask_sz bits set */
+	search_mask = plt_zmalloc(mask_sz, 0);
+	if (search_mask == NULL)
+		goto error;
+
+start:
+	for (i = 0; i < nwords; i++) {
+		if (i < mask_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > mask_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (mask_sz % word_sz)) - 1;
+	}
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - mask_sz + 1;
+	for (j = 0; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+		lshift_mask(search_mask, nwords);
+	}
+
+	mask_sz--;
+	if (mask_sz >= search_sz)
+		goto start;
+	else
+		mask_sz = 0;
+
+found:
+	plt_free(search_mask);
+	if (search_sz == 0)
+		idx = word_sz * nwords;
+
+error:
+	if (slot_sz)
+		*slot_sz = mask_sz;
+
+	return idx;
+}
+
+/* Count number of bits in a tilemask. Assumes that all set bits are contiguous. */
+int
+cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
+{
+	uint8_t count;
+
+	PLT_ASSERT(tilemask != 0);
+
+	*start = __builtin_ctzl(tilemask);
+	*end = 64 - __builtin_clzl(tilemask) - 1;
+	count = *end - *start + 1;
+
+	PLT_ASSERT(count == __builtin_popcountl(tilemask));
+	return count;
+}
+
+/* Find the tiles and wb_page_start to load the model on given 'num_tiles' tiles with the specified
+ * scratch & WB pages and OCM allocation mode.
+ */
+int
+cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			   uint16_t scratch_pages, uint64_t *tilemask)
+{
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
+	uint16_t used_scratch_pages_max;
+	uint16_t scratch_page_start;
+	int used_last_wb_page_max;
+	uint16_t scratch_page_end;
+	uint8_t search_start_tile;
+	uint8_t search_end_tile;
+	int wb_page_start_curr;
+	int max_slot_sz_curr;
+	uint8_t tile_start;
+	int ocm_alloc_mode;
+	int wb_page_start;
+	uint16_t tile_id;
+	uint16_t word_id;
+	uint8_t tile_idx;
+	int max_slot_sz;
+	int start_tile;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
+		plt_err("Invalid num_tiles = %u (> ML_CN10K_OCM_NUMTILES)", num_tiles);
+		return -1;
+	}
+
+	memset(tilemask, 0, sizeof(uint64_t));
+	wb_page_start = -1;
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	start_tile = -1;
+	max_slot_sz_curr = 0;
+	max_slot_sz = 0;
+	tile_idx = 0;
+	ocm_alloc_mode = 2;
+
+	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
+		plt_err("Invalid start_tile, %d", start_tile);
+		return -1;
+	}
+
+	if (start_tile < 0) {
+		search_start_tile = 0;
+		search_end_tile = ocm->num_tiles - num_tiles;
+	} else {
+		search_start_tile = start_tile;
+		search_end_tile = start_tile;
+	}
+
+	tile_start = search_start_tile;
+start_search:
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		used_scratch_pages_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, used_scratch_pages_max);
+		used_last_wb_page_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
+	}
+
+	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
+	}
+
+	if (used_scratch_pages_max < scratch_pages) { /* Check for extra scratch pages */
+		if (ocm->num_pages - used_last_wb_page_max - 1 >=
+		    scratch_pages) { /* Pages available */
+			scratch_page_start = ocm->num_pages - scratch_pages;
+			scratch_page_end = ocm->num_pages - 1;
+			for (page_id = scratch_page_start; page_id <= scratch_page_end;
+			     page_id++) { /* Mark the extra scratch pages as used */
+				local_ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					SET_BIT(local_ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						page_id % OCM_MAP_WORD_SIZE);
+			}
+		} else { /* Pages not available, check for next set of tiles */
+			goto next_search;
+		}
+	}
+
+	if (ocm_alloc_mode == 1) {
+		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
+		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
+			tile_idx = tile_start;
+			goto found;
+		}
+	} else if (ocm_alloc_mode == 2) {
+		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
+							&max_slot_sz_curr);
+		if (max_slot_sz_curr > max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			max_slot_sz = max_slot_sz_curr;
+			tile_idx = tile_start;
+		} else if (max_slot_sz_curr == max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			if (wb_page_start == ocm->num_pages) {
+				tile_idx = tile_start;
+				goto found;
+			}
+		}
+	}
+
+next_search:
+	tile_start = tile_start + num_tiles;
+	if (tile_start <= search_end_tile)
+		goto start_search;
+
+found:
+	if (wb_page_start != -1)
+		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
+
+	return wb_page_start;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 44390396f9..2e26271a7a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OCM_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 /* Page size in bytes. */
 #define ML_CN10K_OCM_PAGESIZE 0x4000
@@ -76,4 +77,8 @@ struct cn10k_ml_ocm {
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
 };
 
+int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
+int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			       uint16_t scratch_pages, uint64_t *tilemask);
+
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1



* [PATCH v4 17/39] ml/cnxk: add support to reserve and free OCM pages
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to reserve and free OCM pages for a model. OCM
pages are reserved upon completion of model start and are
released after model stop.
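
The bookkeeping for one tile can be sketched as follows: WB pages grow
from the start of the tile, scratch pages are shared and sit at the
top, and freeing a model shrinks the scratch area to what the remaining
models still need. This is a simplified, single-tile illustration with
hypothetical names (struct tile, tile_reserve(), tile_free()); the
caller is assumed to pass the largest scratch requirement of the
remaining models, which the driver derives by walking the model list.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_PAGES 64 /* pages per tile: 0x100000 / 0x4000 */

struct tile {
	uint8_t mask[NUM_PAGES / 8]; /* one bit per page, set = in use */
	int last_wb_page;	     /* -1 when no WB pages are reserved */
	int scratch_pages;	     /* pages reserved at the top of the tile */
};

static void
tile_init(struct tile *t)
{
	memset(t, 0, sizeof(*t));
	t->last_wb_page = -1;
}

/* Set (used = 1) or clear (used = 0) the bits for pages first..last. */
static void
mark(struct tile *t, int first, int last, int used)
{
	int p;

	for (p = first; p <= last; p++) {
		if (used)
			t->mask[p / 8] |= (uint8_t)(1u << (p % 8));
		else
			t->mask[p / 8] &= (uint8_t)~(1u << (p % 8));
	}
}

static void
tile_reserve(struct tile *t, int wb_start, int wb_pages, int scratch_pages)
{
	int wb_end = wb_start + wb_pages - 1;

	assert(wb_start >= 0 && wb_start + wb_pages <= NUM_PAGES - scratch_pages);
	mark(t, NUM_PAGES - scratch_pages, NUM_PAGES - 1, 1);
	if (scratch_pages > t->scratch_pages)
		t->scratch_pages = scratch_pages;

	mark(t, wb_start, wb_end, 1);
	if (wb_pages != 0 && wb_end > t->last_wb_page)
		t->last_wb_page = wb_end;
}

/* other_scratch_max: largest scratch need among models remaining on the tile. */
static void
tile_free(struct tile *t, int wb_start, int wb_pages, int other_scratch_max)
{
	int wb_end = wb_start + wb_pages - 1;

	mark(t, wb_start, wb_end, 0);
	if (wb_end == t->last_wb_page)
		t->last_wb_page = wb_start - 1;

	if (other_scratch_max < t->scratch_pages) {
		mark(t, NUM_PAGES - t->scratch_pages, NUM_PAGES - other_scratch_max - 1, 0);
		t->scratch_pages = other_scratch_max;
	}
}
```

As in the patch, last_wb_page is only moved back to wb_start - 1 when
the freed model owned the last WB page; it is a conservative bound and
not necessarily the true last page in use.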

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 131 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ocm.h |   3 +
 2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index df2fa4c514..034d9546eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -5,14 +5,17 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "roc_api.h"
 
 /* OCM macros */
-#define BYTE_LEN	  8
-#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
-#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+#define BYTE_LEN	   8
+#define OCM_MAP_WORD_SIZE  (sizeof(uint8_t) * BYTE_LEN)
+#define IS_BIT_SET(num, n) ((num) & (1 << (n)))
+#define SET_BIT(num, n)	   ((num) | (1 << (n)))
+#define CLEAR_BIT(num, n)  ((num) & ~(1 << (n)))
 
 /* Left shift multi-word mask by 1 bit.
  *
@@ -333,3 +336,125 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 
 	return wb_page_start;
 }
+
+void
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
+			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_page_start;
+	int scratch_page_end;
+	int wb_page_end;
+	int tile_start;
+	int tile_end;
+	int tile_id;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Get first set bit, tile_start */
+	tile_start = 0;
+	tile_end = 0;
+	cn10k_ml_ocm_tilecount(tilemask, &tile_start, &tile_end);
+	wb_page_end = wb_page_start + wb_pages - 1;
+	scratch_page_start = ocm->num_pages - scratch_pages;
+	scratch_page_end = ocm->num_pages - 1;
+
+	/* Update tile_ocm_info */
+	for (tile_id = tile_start; tile_id <= tile_end; tile_id++) {
+		/* Scratch pages */
+		for (page_id = scratch_page_start; page_id <= scratch_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		ocm->tile_ocm_info[tile_id].scratch_pages =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, scratch_pages);
+
+		/* WB pages */
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		if (wb_pages != 0)
+			ocm->tile_ocm_info[tile_id].last_wb_page =
+				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
+	}
+
+	model->addr.tile_start = tile_start;
+	model->addr.tile_end = tile_end;
+
+	plt_ml_dbg("model_id = %d, tilemask = 0x%016lx", model_id, tilemask);
+	plt_ml_dbg("model_id = %d, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
+		   wb_page_end);
+	plt_ml_dbg("model_id = %d, scratch_page_start = %d, scratch_page_end = %d", model_id,
+		   scratch_page_start, scratch_page_end);
+}
+
+void
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_resize_pages;
+	int wb_page_start;
+	int wb_page_end;
+	int prev_start;
+	int curr_start;
+	int tile_id;
+	int page_id;
+	int16_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Update OCM info for WB memory */
+	wb_page_start = model->model_mem_map.wb_page_start;
+	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
+	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+				CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+						  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+					  page_id % OCM_MAP_WORD_SIZE);
+		}
+
+		/* Update last_wb_page size */
+		if (wb_page_end == ocm->tile_ocm_info[tile_id].last_wb_page)
+			ocm->tile_ocm_info[tile_id].last_wb_page = wb_page_start - 1;
+
+		/* Update scratch page size and clear extra bits */
+		scratch_resize_pages = 0;
+		/* Get max scratch pages required, excluding the current model */
+		for (i = 0; i < dev->data->nb_models; i++) {
+			struct cn10k_ml_model *model = dev->data->models[i];
+
+			if ((i != model_id) && (model != NULL)) {
+				if (IS_BIT_SET(model->model_mem_map.tilemask, tile_id))
+					scratch_resize_pages =
+						PLT_MAX((int)model->model_mem_map.scratch_pages,
+							scratch_resize_pages);
+			}
+		}
+
+		/* Clear extra scratch pages */
+		if (scratch_resize_pages < ocm->tile_ocm_info[tile_id].scratch_pages) {
+			prev_start = ocm->num_pages - ocm->tile_ocm_info[tile_id].scratch_pages;
+			curr_start = ocm->num_pages - scratch_resize_pages;
+			for (page_id = prev_start; page_id < curr_start; page_id++) {
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+							  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						  page_id % OCM_MAP_WORD_SIZE);
+			}
+			ocm->tile_ocm_info[tile_id].scratch_pages = scratch_resize_pages;
+		}
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 2e26271a7a..cd65d1d8fa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -80,5 +80,8 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
+				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1



* [PATCH v4 18/39] ml/cnxk: enable support to start an ML model
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 19/39] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented model start driver function. A model start job
is checked for completion in synchronous mode. Tilemask and
OCM slot are calculated before starting the model. Model start
is enqueued through scratch registers. OCM pages are reserved
after model start completion.
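
Two of the values placed in the model start job descriptor can be
sketched in isolation: the section offsets, which follow from the
init, main and finish section sizes laid out back to back, and the
tilemask, which is a contiguous run of bits starting at the first tile.
The names struct section_layout, model_sections() and tilemask_make()
are hypothetical and only illustrate the arithmetic; the driver itself
fills the descriptor fields directly and builds the mask with
GENMASK_ULL().

```c
#include <assert.h>
#include <stdint.h>

/* Offsets of the model sections, relative to the start of the model data. */
struct section_layout {
	uint32_t init_offset;
	uint32_t main_offset;
	uint32_t finish_offset;
	uint32_t wb_offset;
};

/* Sections are stored back to back: init, main, finish, weights and bias. */
static struct section_layout
model_sections(uint32_t init_size, uint32_t main_size, uint32_t finish_size)
{
	struct section_layout l;

	l.init_offset = 0;
	l.main_offset = init_size;
	l.finish_offset = init_size + main_size;
	l.wb_offset = init_size + main_size + finish_size;
	return l;
}

/* Mask with num_tiles contiguous bits set, starting at bit tile_idx. */
static uint64_t
tilemask_make(unsigned int tile_idx, unsigned int num_tiles)
{
	uint64_t ones;

	assert(num_tiles > 0 && tile_idx + num_tiles <= 64);
	ones = (num_tiles == 64) ? UINT64_MAX : ((UINT64_C(1) << num_tiles) - 1);
	return ones << tile_idx;
}
```

For example, four tiles starting at tile 2 give a tilemask of 0x3C.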

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |   4 +
 3 files changed, 214 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 68fcc957fa..8f6bc24370 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -33,6 +33,9 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* ML slow-path job flags */
+#define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
+
 /* Poll mode job state */
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8603cba20e..65f69ba8fb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -114,6 +114,64 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = model->model_id;
+	req->jd.hdr.job_type = job_type;
+	req->jd.hdr.fp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+
+	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
+		if (!model->metadata.model.ocm_relocatable)
+			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+		else
+			req->jd.hdr.sp_flags = 0x0;
+		req->jd.model_start.model_src_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_load_addr));
+		req->jd.model_start.model_dst_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+		req->jd.model_start.model_init_offset = 0x0;
+		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->jd.model_start.model_finish_offset =
+			metadata->init_model.file_size + metadata->main_model.file_size;
+		req->jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
+						      metadata->main_model.file_size +
+						      metadata->finish_model.file_size;
+		req->jd.model_start.num_layers = metadata->model.num_layers;
+		req->jd.model_start.num_gather_entries = 0;
+		req->jd.model_start.num_scatter_entries = 0;
+		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->jd.model_start.batch_size = model->batch_size;
+		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
+		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
+		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
+			&mldev->roc,
+			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
+		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
+		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
+		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
+		req->jd.model_start.output.s.ddr_range_start =
+			metadata->model.ddr_output_range_start;
+		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -561,6 +619,154 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+int
+cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	uint8_t num_tiles;
+	uint64_t tilemask;
+	int wb_page_start;
+	int tile_start;
+	int tile_end;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				plt_ml_dbg("Model already started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (!model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			wb_page_start = cn10k_ml_ocm_tilemask_find(
+				dev, num_tiles, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages, &tilemask);
+
+			if (wb_page_start == -1) {
+				plt_err("Free pages not available on OCM tiles");
+				plt_err("Failed to start model = 0x%016lx, name = %s",
+					PLT_U64_CAST(model), model->metadata.model.name);
+
+				plt_spinlock_unlock(&ocm->lock);
+				return -ENOMEM;
+			}
+
+			model->model_mem_map.tilemask = tilemask;
+			model->model_mem_map.wb_page_start = wb_page_start;
+
+			cn10k_ml_ocm_reserve_pages(
+				dev, model->model_id, model->model_mem_map.tilemask,
+				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages);
+			model->model_mem_map.ocm_reserved = true;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	/* Update JD */
+	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->jd.model_start.ocm_wb_base_address =
+		model->model_mem_map.wb_page_start * ocm->page_size;
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else { /* Reset scratch registers */
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (ret == 0)
+				model->state = ML_CN10K_MODEL_STATE_STARTED;
+			else
+				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
+		while (model->model_mem_map.ocm_reserved) {
+			if (plt_spinlock_trylock(&ocm->lock) != 0) {
+				cn10k_ml_ocm_free_pages(dev, model->model_id);
+				model->model_mem_map.ocm_reserved = false;
+				model->model_mem_map.tilemask = 0x0;
+				plt_spinlock_unlock(&ocm->lock);
+			}
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -576,4 +782,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 981aa52655..af2ea19dce 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -25,6 +25,9 @@ struct cn10k_ml_req {
 
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
+
+	/* Timeout cycle */
+	uint64_t timeout;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
@@ -61,5 +64,6 @@ extern struct rte_ml_dev_ops cn10k_ml_ops;
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			int16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
+int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v4 19/39] ml/cnxk: enable support to stop an ML model
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented model stop driver function. A model stop job is
enqueued through scratch registers and is checked for
completion by polling in synchronous mode. OCM pages are
released before the stop job is enqueued.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
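The enqueue-then-poll pattern used for slow-path jobs can be sketched as
below. This is an illustration only: struct sp_job_ops, sp_job_run() and
the fake_* hooks are hypothetical stand-ins for the scratch-register
enqueue/dequeue calls and the TSC clock. Unlike the driver, the sketch
fixes the deadline once instead of refreshing it on each enqueue attempt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks standing in for enqueue, dequeue and the clock. */
struct sp_job_ops {
	bool (*enqueue)(void *ctx);
	bool (*dequeue)(void *ctx);
	uint64_t (*now)(void *ctx);
};

/* Returns 0 once the job is dequeued, -1 if the deadline passes first. */
static int
sp_job_run(const struct sp_job_ops *ops, void *ctx, uint64_t timeout_ticks)
{
	uint64_t deadline = ops->now(ctx) + timeout_ticks;
	bool enqueued = false;

	do {
		if (!enqueued)
			enqueued = ops->enqueue(ctx);
		if (enqueued && ops->dequeue(ctx))
			return 0;
	} while (ops->now(ctx) < deadline);

	return -1;
}

/* Fake device: completes after polls_needed unsuccessful dequeue polls. */
struct fake_dev {
	uint64_t ticks;
	unsigned int polls_needed;
};

static bool
fake_enqueue(void *ctx)
{
	(void)ctx;
	return true;
}

static bool
fake_dequeue(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->polls_needed == 0)
		return true;
	d->polls_needed--;
	return false;
}

static uint64_t
fake_now(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->ticks++; /* Every clock read advances time by one tick. */
}
```

On timeout a caller would reset the queue and report an error, as the
stop path does with roc_ml_scratch_queue_reset() and -ETIME.
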
 drivers/ml/cnxk/cn10k_ml_ops.c | 115 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |   1 +
 2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65f69ba8fb..295b7794ec 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -295,10 +295,14 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		/* Re-configure */
 		void **models;
 
-		/* Unload all models */
+		/* Stop and unload all models */
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %d", model_id);
+				}
 				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %d", model_id);
@@ -362,10 +366,14 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
-	/* Unload all models */
+	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %d", model_id);
+			}
 			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %d", model_id);
@@ -767,6 +775,108 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				plt_ml_dbg("Model not started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			cn10k_ml_ocm_free_pages(dev, model->model_id);
+			model->model_mem_map.ocm_reserved = false;
+			model->model_mem_map.tilemask = 0x0;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0x0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else {
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -783,4 +893,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index af2ea19dce..3143c9054c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -65,5 +65,6 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 			int16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
+int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v4 20/39] ml/cnxk: enable support to get model information
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 19/39] ml/cnxk: enable support to stop an ML model Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added driver function to get model information. Added an
internal function to populate the model info at load time.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
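The info buffer holds one header followed by the input and then the
output descriptors. A minimal sketch of that layout follows; struct
io_info and struct model_info are simplified, hypothetical stand-ins for
the rte_ml_* structures, and both helpers are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the rte_ml_* info structures. */
struct io_info {
	char name[32];
	uint32_t nb_elements;
};

struct model_info {
	uint32_t nb_inputs;
	uint32_t nb_outputs;
	struct io_info *input_info;
	struct io_info *output_info;
};

/* Bytes needed for the header plus all input and output descriptors. */
static size_t
model_info_size(uint32_t nb_inputs, uint32_t nb_outputs)
{
	return sizeof(struct model_info) +
	       ((size_t)nb_inputs + nb_outputs) * sizeof(struct io_info);
}

/* buf must be aligned for struct model_info and hold model_info_size() bytes. */
static struct model_info *
model_info_layout(void *buf, uint32_t nb_inputs, uint32_t nb_outputs)
{
	struct model_info *info = buf;

	info->nb_inputs = nb_inputs;
	info->nb_outputs = nb_outputs;
	info->input_info = (struct io_info *)(info + 1); /* Right after header */
	info->output_info = info->input_info + nb_inputs; /* After all inputs */
	return info;
}
```

Note that the info_get op in this patch copies only the header, so the
input_info and output_info pointers returned to the caller still refer
to the driver-owned arrays inside this buffer.
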
 drivers/ml/cnxk/cn10k_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  9 ++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 37 ++++++++++++++++++---
 3 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 30911b7ffe..295b6f0a01 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -356,3 +356,58 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uin
 
 	return 0;
 }
+
+void
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+{
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output =
+		PLT_PTR_ADD(input, model->metadata.model.num_input * sizeof(struct rte_ml_io_info));
+
+	/* Set model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+	rte_memcpy(info->name, model->metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", model->metadata.model.version[0],
+		 model->metadata.model.version[1], model->metadata.model.version[2],
+		 model->metadata.model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = dev->data->dev_id;
+	info->batch_size = model->batch_size;
+	info->nb_inputs = model->metadata.model.num_input;
+	info->input_info = input;
+	info->nb_outputs = model->metadata.model.num_output;
+	info->output_info = output;
+	info->wb_size = model->metadata.weights_bias.file_size;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, model->metadata.input[i].input_name,
+			   MRVL_ML_INPUT_NAME_LEN);
+		input[i].dtype = model->metadata.input[i].input_type;
+		input[i].qtype = model->metadata.input[i].model_input_type;
+		input[i].shape.format = model->metadata.input[i].shape.format;
+		input[i].shape.w = model->metadata.input[i].shape.w;
+		input[i].shape.x = model->metadata.input[i].shape.x;
+		input[i].shape.y = model->metadata.input[i].shape.y;
+		input[i].shape.z = model->metadata.input[i].shape.z;
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, model->metadata.output[i].output_name,
+			   MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].dtype = model->metadata.output[i].output_type;
+		output[i].qtype = model->metadata.output[i].model_output_type;
+		output[i].shape.format = RTE_ML_IO_FORMAT_1D;
+		output[i].shape.w = model->metadata.output[i].size;
+		output[i].shape.x = 1;
+		output[i].shape.y = 1;
+		output[i].shape.z = 1;
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 003f5aba36..dca282a498 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -422,6 +422,14 @@ struct cn10k_ml_model {
 	/* Tile and memory information object */
 	struct cn10k_ml_ocm_model_map model_mem_map;
 
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -438,5 +446,6 @@ void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uint8_t *buffer,
 				   uint16_t *wb_pages, uint16_t *scratch_pages);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 295b7794ec..0d6030d36a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -506,6 +506,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_data_size;
+	size_t model_info_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
 	uint16_t wb_pages;
@@ -544,8 +545,13 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
+			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size +
+		  2 * model_data_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
@@ -585,10 +591,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set model info */
+	model->info = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+	cn10k_ml_model_info_set(dev, model);
+
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-				  2 * model_data_size);
+	model->req = PLT_PTR_ADD(model->info, model_info_size);
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
@@ -877,6 +885,26 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 	return ret;
 }
 
+static int
+cn10k_ml_model_info_get(struct rte_ml_dev *dev, int16_t model_id,
+			struct rte_ml_model_info *model_info)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
+	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -894,4 +922,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
 };
-- 
2.17.1



* [PATCH v4 21/39] ml/cnxk: enable support to update model params
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added cnxk driver function to update model params (weights
and bias) after a model is loaded. Updating model params does
not require reloading the model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
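The update is a two-step copy: new weights and bias go into the load
image, then the whole load image is copied to the run image. A minimal
sketch, assuming the weights-and-bias region directly follows the code
sections in the load image; struct model_image and model_params_update()
are hypothetical names, not driver code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical view of a model's load and run images. */
struct model_image {
	uint8_t *load;    /* Staging copy updated by the host */
	uint8_t *run;     /* Copy consumed by the ML engine */
	size_t code_size; /* init + main + finish sections */
	size_t wb_size;   /* weights and bias section */
};

static void
model_params_update(struct model_image *img, const void *wb)
{
	/* Step 1: overwrite weights and bias in the load image. */
	memcpy(img->load + img->code_size, wb, img->wb_size);

	/* Step 2: refresh the run image from the load image. */
	memcpy(img->run, img->load, img->code_size + img->wb_size);
}
```

The driver only allows this while the model is in the LOADED state, so
the run image is never rewritten while the engine may be using it.
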
 drivers/ml/cnxk/cn10k_ml_ops.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0d6030d36a..8a91f98e50 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -905,6 +905,36 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, int16_t model_id,
 	return 0;
 }
 
+static int
+cn10k_ml_model_params_update(struct rte_ml_dev *dev, int16_t model_id, void *buffer)
+{
+	struct cn10k_ml_model *model;
+	size_t size;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+		return -1;
+	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+		return -EBUSY;
+
+	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
+	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+
+	/* Update model weights & bias */
+	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -923,4 +953,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
 };
-- 
2.17.1



* [PATCH v4 22/39] ml/cnxk: add support to get IO buffer sizes
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added driver functions to get input and output buffer sizes
for a given number of batches. These functions compute the
buffer sizes based on the specific requirements of the device.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
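The size computation rounds the requested batch count up to a whole
number of model runs. A minimal sketch of that arithmetic follows;
io_buffer_size() is a hypothetical helper and assumes batch_size is
non-zero.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed to hold nb_batches when each model
 * run consumes batch_size batches and per_run_size bytes.
 * batch_size must be non-zero.
 */
static uint64_t
io_buffer_size(uint64_t per_run_size, uint32_t nb_batches, uint32_t batch_size)
{
	uint64_t runs = ((uint64_t)nb_batches + batch_size - 1) / batch_size;

	return per_run_size * runs;
}
```

With a model batch size of 4 and 1024 bytes per run, 1 to 4 batches need
1024 bytes and 5 batches need 2048 bytes.
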
 drivers/ml/cnxk/cn10k_ml_ops.c | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8a91f98e50..643688e3d0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -935,6 +935,54 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, int16_t model_id, void *buf
 	return 0;
 }
 
+static int
+cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t nb_batches,
+			   uint64_t *input_qsize, uint64_t *input_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (input_qsize != NULL)
+		*input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (input_dsize != NULL)
+		*input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t nb_batches,
+			    uint64_t *output_qsize, uint64_t *output_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	if (output_qsize != NULL)
+		*output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (output_dsize != NULL)
+		*output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -954,4 +1002,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_input_size_get = cn10k_ml_io_input_size_get,
+	.io_output_size_get = cn10k_ml_io_output_size_get,
 };
-- 
2.17.1



* [PATCH v4 23/39] ml/cnxk: enable quantization and dequantization
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Implemented driver functions to quantize / dequantize input
and output data. Support is enabled for multiple batches.
Quantization / dequantization use the type conversion functions
defined in ML common code.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
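As background for the per-type conversions dispatched in this patch, here
is a minimal sketch of scale-based quantization between float32 and int8.
It is not the implementation behind rte_ml_io_float32_to_int8(); the
function names, the multiply-by-scale convention, the rounding mode and
the saturation are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: out[i] = saturate(round(in[i] * scale)). Inputs must be finite. */
static int
quantize_f32_to_i8(float scale, size_t nb_elements, const float *in, int8_t *out)
{
	size_t i;

	if (in == NULL || out == NULL)
		return -1;

	for (i = 0; i < nb_elements; i++) {
		float v = in[i] * scale;

		/* Round half away from zero without needing libm. */
		v = (v >= 0.0f) ? v + 0.5f : v - 0.5f;

		/* Saturate before the cast so the conversion is always defined. */
		if (v > 127.0f)
			v = 127.0f;
		else if (v < -128.0f)
			v = -128.0f;

		out[i] = (int8_t)v;
	}

	return 0;
}

/* Hypothetical: out[i] = in[i] * scale. */
static int
dequantize_i8_to_f32(float scale, size_t nb_elements, const int8_t *in, float *out)
{
	size_t i;

	if (in == NULL || out == NULL)
		return -1;

	for (i = 0; i < nb_elements; i++)
		out[i] = (float)in[i] * scale;

	return 0;
}
```

In the driver, the quantize path passes the input's qscale and the
dequantize path passes the output's dscale from the model metadata.
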
 drivers/ml/cnxk/cn10k_ml_ops.c | 151 +++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 643688e3d0..88809e6e96 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
@@ -983,6 +985,153 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, int16_t model_id, uint32_t n
 	return 0;
 }
 
+static int
+cn10k_ml_io_quantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_batches, void *dbuffer,
+		     void *qbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		if (model->metadata.input[i].input_type ==
+		    model->metadata.input[i].model_input_type) {
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+		} else {
+			switch (model->metadata.input[i].model_input_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = rte_ml_io_float32_to_int8(model->metadata.input[i].qscale,
+								model->addr.input[i].nb_elements,
+								lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = rte_ml_io_float32_to_uint8(model->metadata.input[i].qscale,
+								 model->addr.input[i].nb_elements,
+								 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = rte_ml_io_float32_to_int16(model->metadata.input[i].qscale,
+								 model->addr.input[i].nb_elements,
+								 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = rte_ml_io_float32_to_uint16(model->metadata.input[i].qscale,
+								  model->addr.input[i].nb_elements,
+								  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
+								   lcl_dbuffer, lcl_qbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_input_type[%u] : %u", i,
+					model->metadata.input[i].model_input_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_dbuffer += model->addr.input[i].sz_d;
+		lcl_qbuffer += model->addr.input[i].sz_q;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_dequantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_batches, void *qbuffer,
+		       void *dbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %d", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		if (model->metadata.output[i].output_type ==
+		    model->metadata.output[i].model_output_type) {
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+		} else {
+			switch (model->metadata.output[i].model_output_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = rte_ml_io_int8_to_float32(model->metadata.output[i].dscale,
+								model->addr.output[i].nb_elements,
+								lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = rte_ml_io_uint8_to_float32(model->metadata.output[i].dscale,
+								 model->addr.output[i].nb_elements,
+								 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = rte_ml_io_int16_to_float32(model->metadata.output[i].dscale,
+								 model->addr.output[i].nb_elements,
+								 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = rte_ml_io_uint16_to_float32(model->metadata.output[i].dscale,
+								  model->addr.output[i].nb_elements,
+								  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = rte_ml_io_float16_to_float32(
+					model->addr.output[i].nb_elements, lcl_qbuffer,
+					lcl_dbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_output_type[%u] : %u", i,
+					model->metadata.output[i].model_output_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_qbuffer += model->addr.output[i].sz_q;
+		lcl_dbuffer += model->addr.output[i].sz_d;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -1006,4 +1155,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* I/O ops */
 	.io_input_size_get = cn10k_ml_io_input_size_get,
 	.io_output_size_get = cn10k_ml_io_output_size_get,
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
 };
-- 
2.17.1



* [PATCH v4 24/39] ml/cnxk: enable support to dump device debug info
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support to dump device debug information. Debug info on a
cn10k device includes model state info, OCM usage info, and the
firmware debug and exception buffers.
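
The firmware debug buffer is read as a circular log. As a rough
illustration only (not the driver code), draining such a buffer can be
sketched as below. The function name ring_log_copy and the head/tail
convention are assumptions made for this sketch; the real offsets come
from the ML scratch registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy the unread part of a circular log buffer into out as a
 * NUL-terminated string and return the number of bytes copied.
 * Assumed convention (hypothetical, not taken from the firmware
 * interface): head indexes the oldest unread byte, tail is one past
 * the newest byte, and head == tail means the buffer is empty.
 */
static size_t
ring_log_copy(const char *buf, uint32_t bufsize, uint32_t head, uint32_t tail,
	      char *out, size_t outsz)
{
	size_t n = 0;

	assert(buf != NULL && out != NULL && outsz > 0);
	assert(bufsize > 0 && head < bufsize && tail < bufsize);

	/* Walk from head to tail, wrapping at the end of the buffer. */
	while (head != tail && n + 1 < outsz) {
		out[n++] = buf[head];
		head = (head + 1) % bufsize;
	}
	out[n] = '\0';

	return n;
}
```

Copying byte by byte keeps the wrap-around handling in one place, at
the cost of speed; a dump path is not performance sensitive.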

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  51 +++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 189 +++++++++++++++++++++++++++++++++
 3 files changed, 241 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 034d9546eb..2083d99f81 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -458,3 +458,54 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id)
 		}
 	}
 }
+
+static void
+cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t nwords, char *str)
+{
+	char *p = str;
+	int word;
+
+	/* add prefix 0x */
+	*p++ = '0';
+	*p++ = 'x';
+
+	/* build one word at a time */
+	for (word = nwords - 1; word >= 0; word--) {
+		sprintf(p, "%02X", tile_info->ocm_mask[word]);
+		p += 2;
+	}
+
+	/* terminate */
+	*p++ = 0;
+}
+
+void
+cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+{
+	char str[ML_CN10K_OCM_NUMPAGES / 4 + 3]; /* nibbles + prefix '0x' + NUL */
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	uint8_t tile_id;
+	uint8_t word_id;
+	int wb_pages;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	fprintf(fp, "OCM State:\n");
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
+
+		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
+		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+			wb_pages +=
+				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+
+		fprintf(fp,
+			"tile = %2u, scratch_pages = %4u,"
+			" wb_pages = %4d, last_wb_page = %4d,"
+			" pagemask = %s\n",
+			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
+			ocm->tile_ocm_info[tile_id].last_wb_page, str);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index cd65d1d8fa..4415bbfb45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,5 +83,6 @@ int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16
 void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, int16_t model_id, uint64_t tilemask,
 				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, int16_t model_id);
+void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 88809e6e96..ad849e7abc 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -14,10 +14,25 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  90
+
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+static void
+print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -116,6 +131,102 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_model_print(struct rte_ml_dev *dev, int16_t model_id, FILE *fp)
+{
+
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Print debug info */
+	print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
+		model->metadata.model.version[1], model->metadata.model.version[2],
+		model->metadata.model.version[3]);
+	if (strlen(model->name) != 0)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
+	fprintf(fp, "%*s : 0x%016" PRIx64 "\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", model->model_id);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+
+	/* Print model state */
+	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
+			1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s  %14s\n", "input", "input_name", "input_type",
+		"model_input_type", "quantize", "format");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.input[i].input_name);
+		rte_ml_io_type_to_str(model->metadata.input[i].input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		rte_ml_io_type_to_str(model->metadata.input[i].model_input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.input[i].quantize == 1 ? "Yes" : "No"));
+		rte_ml_io_format_to_str(model->metadata.input[i].shape.format, str, STR_LEN);
+		fprintf(fp, "%*s", 16, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
+		"model_output_type", "dequantize");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.output[i].output_name);
+		rte_ml_io_type_to_str(model->metadata.output[i].output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		rte_ml_io_type_to_str(model->metadata.output[i].model_output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.output[i].dequantize == 1 ? "Yes" : "No"));
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
+
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -498,6 +609,83 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_fw *fw;
+
+	uint32_t head_loc;
+	uint32_t tail_loc;
+	uint32_t bufsize;
+	char *head_ptr;
+	int model_id;
+	int core_id;
+
+	if (roc_env_is_asim())
+		return 0;
+
+	mldev = dev->data->dev_private;
+	fw = &mldev->fw;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			cn10k_ml_model_print(dev, model_id, fp);
+			fprintf(fp, "\n");
+		}
+	}
+
+	/* Dump OCM state */
+	cn10k_ml_ocm_print(dev, fp);
+
+	/* Dump debug buffer */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		if (core_id == 0) {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		} else {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		}
+		if (head_loc < tail_loc) {
+			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
+		} else if (head_loc >= tail_loc + 1) {
+			fprintf(fp, "%.*s\n", bufsize - head_loc, &head_ptr[head_loc]);
+			fprintf(fp, "%.*s\n", tail_loc, &head_ptr[0]);
+		}
+	}
+
+	/* Dump exception info */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		if ((core_id == 0) &&
+		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016" PRIx64 "\n",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		} else if ((core_id == 1) &&
+			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016" PRIx64 "\n",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		}
+	}
+
+	return 0;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
@@ -1139,6 +1327,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_close = cn10k_ml_dev_close,
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1



* [PATCH v4 25/39] ml/cnxk: add driver support for device selftest
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support for device selftest. Device selftest includes
checking the status of firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ad849e7abc..f8d30ca5a6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -686,6 +686,62 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	uint64_t timeout_cycle;
+	bool timeout;
+	int ret;
+
+	mldev = dev->data->dev_private;
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+					 ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("Could not allocate reserved memzone");
+		return -ENOMEM;
+	}
+	req = mz->addr;
+
+	/* Prepare load completion structure */
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	/* Enqueue firmware selftest request through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware selftest status, clean-up and exit */
+	ret = 0;
+	if (timeout) {
+		ret = -ETIME;
+	} else {
+		if (req->result.error_code != 0)
+			ret = -1;
+	}
+
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, int16_t *model_id)
 {
@@ -1328,6 +1384,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1



* [PATCH v4 26/39] ml/cnxk: enqueue a burst of inference requests
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled driver support to enqueue a burst of inference requests
to the ML device. Enqueue uses the internal ML request structure
to queue the inferences; job completion is detected through polling.
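
The queue accounting behind the burst enqueue can be sketched as
follows. One slot is always left unused so that head == tail
unambiguously means "empty". struct ring and ring_claim are
hypothetical names for this illustration; only the index arithmetic
mirrors the helpers added by this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's request queue indices. */
struct ring {
	uint64_t head;    /* next slot to fill (producer side) */
	uint64_t tail;    /* next slot to complete (consumer side) */
	uint64_t nb_desc; /* total number of slots */
};

/* Number of requests submitted but not yet dequeued. */
static uint64_t
ring_pending(const struct ring *r)
{
	assert(r->nb_desc > 0);
	return (r->nb_desc + r->head - r->tail) % r->nb_desc;
}

/* Number of slots that can still be filled; one slot stays reserved. */
static uint64_t
ring_free(const struct ring *r)
{
	return r->nb_desc - ring_pending(r) - 1;
}

/* Claim up to nb_ops slots, advance head, return how many were claimed. */
static uint16_t
ring_claim(struct ring *r, uint16_t nb_ops)
{
	uint64_t space = ring_free(r);
	uint16_t count;

	if (nb_ops > space)
		nb_ops = (uint16_t)space;

	for (count = 0; count < nb_ops; count++)
		r->head = (r->head + 1) % r->nb_desc;

	return count;
}
```

With nb_desc = 4, at most three requests can be in flight; a burst of
five is clamped to three and the caller retries the remainder later.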

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 96 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  7 +++
 2 files changed, 103 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f8d30ca5a6..1abdf6fad1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -285,6 +285,28 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	}
 }
 
+static __rte_always_inline void
+cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+				struct rte_ml_op *op)
+{
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = op->model_id;
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->jd.hdr.sp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.model_run.input_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr));
+	req->jd.model_run.output_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr));
+	req->jd.model_run.num_batches = op->nb_batches;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -450,6 +472,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -1376,6 +1400,78 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, int16_t model_id, uint16_t nb_bat
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t count;
+	uint64_t head;
+	bool enqueued;
+
+	mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	req = &queue->reqs[head];
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	if (unlikely(!enqueued))
+		goto jcmdq_full;
+
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3143c9054c..d35f91a302 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -28,6 +28,9 @@ struct cn10k_ml_req {
 
 	/* Timeout cycle */
 	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
@@ -67,4 +70,8 @@ int cn10k_ml_model_unload(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 
+/* Fast-path ops */
+__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
+
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v4 27/39] ml/cnxk: dequeue a burst of inference requests
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:22   ` [PATCH v4 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled driver support to dequeue inference requests from
internal queue. Dequeue checks for request completion by
polling the status field of the job request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 61 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 ++
 2 files changed, 63 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 1abdf6fad1..ef3cbadca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -473,6 +473,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -1418,6 +1419,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
 }
 
+static __rte_always_inline void
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
+		       struct rte_ml_op *op)
+{
+	PLT_SET_USED(dev);
+	PLT_SET_USED(qp_id);
+
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0))
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+	else
+		op->status = RTE_ML_OP_STATUS_ERROR;
+
+	op->user_ptr = result->user_ptr;
+}
+
 __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
@@ -1472,6 +1490,49 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot uint16_t
+cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+	req = &queue->reqs[tail];
+	status = plt_read64(&req->status);
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
+		goto empty_or_active;
+
+	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	ops[count] = req->op;
+
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d35f91a302..3178295bba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -73,5 +73,7 @@ int cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id);
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v4 28/39] ml/cnxk: add internal function for sync mode run
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
@ 2023-02-01  9:22   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:22 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added internal function to execute ML inference requests
in synchronous mode. Sync mode inference execution is used
to launch inference requests without using a queue-pair.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 53 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ef3cbadca7..b6a35f9a4f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1533,6 +1533,59 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	bool timeout;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[op->model_id];
+	req = model->req;
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+
+	timeout = true;
+	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	do {
+		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+			req->op = op;
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout) {
+		ret = -EBUSY;
+		goto error_enqueue;
+	}
+
+	timeout = true;
+	do {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout)
+		ret = -ETIME;
+	else
+		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+
+error_enqueue:
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3178295bba..a17a2851b1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,5 +75,6 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v4 29/39] ml/cnxk: enable support for firmware error codes
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-02-01  9:22   ` [PATCH v4 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support for error handling. Added error types and subtypes
supported by ML firmware. Enabled support to get the device-specific
error code and message for a completed ML request.
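
As an illustration of how a type/subtype pair can travel in a single
64-bit error word, consider the sketch below. The bit layout (type in
the low 4 bits, subtype in the remaining bits) and all names here are
assumptions made for the example; they are not taken from the driver
headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error types, mirroring the categories this patch adds. */
enum err_etype {
	ETYPE_NO_ERROR = 0,
	ETYPE_FW_NONFATAL,
	ETYPE_HW_NONFATAL,
	ETYPE_HW_FATAL,
	ETYPE_HW_WARNING,
	ETYPE_DRIVER,
	ETYPE_UNKNOWN,
};

static const char *const etype_str[] = {
	"No error",
	"Firmware non-fatal error",
	"Hardware non-fatal error",
	"Hardware fatal error",
	"Hardware warning",
	"Driver specific error",
	"Unknown error",
};

/* Pack type and subtype into one word (assumed layout, see above). */
static uint64_t
err_pack(uint64_t etype, uint64_t stype)
{
	assert(etype <= 0xF);
	assert(stype <= (UINT64_MAX >> 4));
	return etype | (stype << 4);
}

static uint64_t
err_etype(uint64_t code)
{
	return code & 0xF;
}

static uint64_t
err_stype(uint64_t code)
{
	return code >> 4;
}

/* Map an error word to a type message, defaulting to "Unknown error". */
static const char *
err_etype_name(uint64_t code)
{
	uint64_t e = err_etype(code);

	if (e >= sizeof(etype_str) / sizeof(etype_str[0]))
		return etype_str[ETYPE_UNKNOWN];
	return etype_str[e];
}
```

With such a layout a zero word still means "no error", so existing
checks of the whole 64-bit value against 0 keep working.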

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |   4 +-
 drivers/ml/cnxk/cn10k_ml_dev.h |  50 +++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.c | 117 ++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_ops.h |   2 +
 4 files changed, 160 insertions(+), 13 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 837f006bf0..76ed853a3c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -261,7 +261,7 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -452,7 +452,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 8f6bc24370..604a200e26 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -64,6 +64,54 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* Error types enumeration */
+enum cn10k_ml_error_etype {
+	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
+	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
+	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
+	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
+	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
+	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
+};
+
+/* Firmware non-fatal error sub-type */
+enum cn10k_ml_error_stype_fw_nf {
+	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
+	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during unload */
+	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
+	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
+	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
+	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
+	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
+	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
+	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
+	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+};
+
+/* Driver error sub-type */
+enum cn10k_ml_error_stype_driver {
+	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
+	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+};
+
+/* Error structure */
+union cn10k_ml_error_code {
+	struct {
+		/* Error type */
+		uint64_t etype : 4;
+
+		/* Error sub-type */
+		uint64_t stype : 60;
+	} s;
+
+	/* WORD 0 */
+	uint64_t u64;
+};
+
 /* Firmware stats */
 struct cn10k_ml_fw_stats {
 	/* Firmware start cycle */
@@ -82,7 +130,7 @@ struct cn10k_ml_fw_stats {
 /* Result structure */
 struct cn10k_ml_result {
 	/* Job error code */
-	uint64_t error_code;
+	union cn10k_ml_error_code error_code;
 
 	/* Firmware stats */
 	struct cn10k_ml_fw_stats stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b6a35f9a4f..8de5f9705a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,49 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Error message length */
+#define ERRMSG_LEN 32
+
+/* Error type database */
+static const struct cn10k_ml_etype_db {
+	enum cn10k_ml_error_etype etype;
+	char name[ERRMSG_LEN];
+} ml_etype_db[] = {
+	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
+
+/* Hardware non-fatal error subtype database */
+static const struct cn10k_ml_stype_db_hw_nf {
+	enum cn10k_ml_error_stype_fw_nf stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_hw_nf[] = {
+	{ML_FW_ERR_NOERR, "NO ERROR"},
+	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+};
+
+/* Driver error subtype database */
+static const struct cn10k_ml_stype_db_driver {
+	enum cn10k_ml_error_stype_driver stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_driver[] = {
+	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+};
+
 static void
 print_line(FILE *fp, int len)
 {
@@ -474,6 +517,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
+	dev->op_error_get = cn10k_ml_op_error_get;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -758,7 +802,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code != 0)
+		if (req->result.error_code.u64 != 0)
 			ret = -1;
 	}
 
@@ -936,7 +980,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1017,7 +1061,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0)
+			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1079,7 +1123,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1134,7 +1178,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, int16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0x0)
+			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1426,12 +1470,30 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 	PLT_SET_USED(dev);
 	PLT_SET_USED(qp_id);
 
-	op->impl_opaque = result->error_code;
+	struct cn10k_ml_dev *mldev;
 
-	if (likely(result->error_code == 0))
+	if (likely(result->error_code.u64 == 0)) {
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
-	else
+	} else {
+		/* Handle driver error */
+		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+			mldev = dev->data->dev_private;
+
+			/* Check for exception */
+			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
+			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+			else
+				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+		}
+
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
 
 	op->user_ptr = result->user_ptr;
 }
@@ -1468,6 +1530,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1515,8 +1578,12 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 dequeue_req:
 	req = &queue->reqs[tail];
 	status = plt_read64(&req->status);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
-		goto empty_or_active;
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+	}
 
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
@@ -1533,6 +1600,35 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
+{
+	union cn10k_ml_error_code *error_code;
+	char msg[RTE_ML_STR_MAX];
+
+	PLT_SET_USED(dev);
+
+	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
+
+	/* Copy error message */
+	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
+
+	/* Copy sub error message */
+	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+	}
+
+	if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+	}
+
+	plt_strlcpy(error->message, msg, sizeof(error->message));
+
+	return 0;
+}
+
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
@@ -1549,6 +1645,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a17a2851b1..560310f835 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,6 +75,8 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
+				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1
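
Note on message lookup: cn10k_ml_op_error_get() indexes the message
tables directly with the type and sub-type fields, which are 4 and 60
bits wide and can therefore hold values beyond the table sizes. A
bounds-checked variant could look like the sketch below. This is only an
illustration under stated assumptions: ml_err_format, etype_name,
driver_msg and ETYPE_DRIVER are hypothetical names, the tables are
trimmed copies of the ones in this patch, and snprintf() is used in
place of strcat() so that the output length is always bounded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Mirrors ML_ETYPE_DRIVER (0x5) from the patch; hypothetical local name. */
enum { ETYPE_DRIVER = 5 };

/* Trimmed copies of the message tables from the patch. */
static const char *const etype_name[] = {
	"NO_ERROR",   "FW_NON_FATAL", "HW_NON_FATAL",  "HW_FATAL",
	"HW_WARNING", "DRIVER_ERROR", "UNKNOWN_ERROR",
};

static const char *const driver_msg[] = {
	"NO ERROR", "UNKNOWN ERROR", "FW EXCEPTION", "UNKNOWN FIRMWARE ERROR",
};

/* Format "<type>" or "<type> : <sub-type>" into buf, never reading
 * past the end of either table. Returns the snprintf() result. */
static int
ml_err_format(uint64_t etype, uint64_t stype, char *buf, size_t len)
{
	const char *name;

	assert(buf != NULL && len > 0);

	/* Out-of-range types fall back to the generic name. */
	name = etype < DIM(etype_name) ? etype_name[etype] : "UNKNOWN_ERROR";

	if (etype == ETYPE_DRIVER && stype < DIM(driver_msg))
		return snprintf(buf, len, "%s : %s", name, driver_msg[stype]);

	return snprintf(buf, len, "%s", name);
}
```

The same range check would apply to the firmware sub-type table; it is
left out here to keep the sketch short.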


* [PATCH v4 30/39] ml/cnxk: add support to get and reset device stats
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support to get and reset ML device stats. Device stats
include the number of requests enqueued and dequeued and the
enqueue and dequeue error counts.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
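Note on aggregation: the device-level stats are the sum of the
per-queue-pair counters. The sketch below shows that summation in
isolation. The struct and function names (ml_stats, ml_stats_sum) are
hypothetical stand-ins for rte_ml_dev_stats and cn10k_ml_dev_stats_get.
Unlike the patch, the sketch zeroes the output before accumulating; that
is an assumption made here so the result does not depend on what the
caller passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the four counters used in the patch. */
struct ml_stats {
	uint64_t enqueued_count;
	uint64_t dequeued_count;
	uint64_t enqueue_err_count;
	uint64_t dequeue_err_count;
};

/* Sum the counters of nb_qp queue pairs into out. */
static void
ml_stats_sum(const struct ml_stats *qp, size_t nb_qp, struct ml_stats *out)
{
	size_t i;

	assert(out != NULL);

	/* Assumption: start from zero rather than trusting the caller. */
	memset(out, 0, sizeof(*out));

	for (i = 0; i < nb_qp; i++) {
		out->enqueued_count += qp[i].enqueued_count;
		out->dequeued_count += qp[i].dequeued_count;
		out->enqueue_err_count += qp[i].enqueue_err_count;
		out->dequeue_err_count += qp[i].dequeue_err_count;
	}
}
```
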
 drivers/ml/cnxk/cn10k_ml_ops.c | 55 ++++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8de5f9705a..3149e4153e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -159,6 +159,10 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -678,6 +682,38 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -1467,15 +1503,23 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	PLT_SET_USED(dev);
-	PLT_SET_USED(qp_id);
-
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
 
 	if (likely(result->error_code.u64 == 0)) {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeued_count++;
+		}
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeue_err_count++;
+		}
+
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
 			mldev = dev->data->dev_private;
@@ -1549,6 +1593,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 jcmdq_full:
 	queue->head = head;
+	qp->stats.enqueued_count += count;
 
 	return count;
 }
@@ -1697,6 +1742,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 560310f835..fb82af414a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -58,6 +58,9 @@ struct cn10k_ml_qp {
 
 	/* Request queue */
 	struct cn10k_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
 };
 
 /* Device ops */
-- 
2.17.1


* [PATCH v4 31/39] ml/cnxk: add support to handle extended dev stats
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added support for ML device extended stats (xstats). Support
is enabled to get xstats names and values and to reset xstats.
Supported xstats are the average, minimum and maximum hardware
and firmware latency.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
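Note on the latency bookkeeping: each queue pair keeps a running total,
a min, a max, a dequeued count and a reset index, and the average is
taken over the samples seen since the last reset. The sketch below
reproduces that scheme for a single latency kind. All names (lat_stats,
lat_update, lat_avg, lat_reset_avg) are hypothetical, and the average
returns 0 when no samples have been seen since the reset, which is an
assumption made here to avoid dividing by zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue-pair stats for one latency kind. */
struct lat_stats {
	uint64_t tot;      /* Sum of latencies since the last reset */
	uint64_t min;      /* Minimum latency */
	uint64_t max;      /* Maximum latency */
	uint64_t dequeued; /* Total samples ever recorded */
	uint64_t reset;    /* Value of dequeued at the last reset */
};

/* Record one sample. */
static void
lat_update(struct lat_stats *s, uint64_t latency)
{
	/* First sample after a reset: restart min/max tracking. */
	if (s->dequeued == s->reset) {
		s->min = UINT64_MAX;
		s->max = 0;
	}

	s->tot += latency;
	if (latency < s->min)
		s->min = latency;
	if (latency > s->max)
		s->max = latency;
	s->dequeued++;
}

/* Average over all queue pairs, 0 if there are no samples. */
static uint64_t
lat_avg(const struct lat_stats *s, size_t nb_qp)
{
	uint64_t tot = 0;
	uint64_t count = 0;
	size_t i;

	for (i = 0; i < nb_qp; i++) {
		tot += s[i].tot;
		count += s[i].dequeued - s[i].reset;
	}

	return count != 0 ? tot / count : 0;
}

/* Reset the average: clear totals and remember the current count. */
static void
lat_reset_avg(struct lat_stats *s, size_t nb_qp)
{
	size_t i;

	for (i = 0; i < nb_qp; i++) {
		s[i].tot = 0;
		s[i].reset = s[i].dequeued;
	}
}
```

Keeping a reset index instead of clearing the dequeued count lets the
total number of dequeued jobs stay monotonic across resets.
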
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.h |  57 +++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 356 ++++++++++++++++++++++++++++++-
 3 files changed, 415 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 604a200e26..b7ff369ba8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -372,6 +372,9 @@ struct cn10k_ml_dev {
 
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
+
+	/* xstats status */
+	bool xstats_enabled;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index dca282a498..137f63bddc 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -399,6 +399,57 @@ struct cn10k_ml_model_addr {
 	uint32_t total_output_sz_d;
 };
 
+/* Extended stats types enum */
+enum cn10k_ml_model_xstats_type {
+	/* Average hardware latency */
+	avg_hw_latency = 0,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+};
+
+/* Model fast-path stats */
+struct cn10k_ml_model_stats {
+	/* Total hardware latency, sum of all inferences */
+	uint64_t hw_latency_tot;
+
+	/* Minimum hardware latency */
+	uint64_t hw_latency_min;
+
+	/* Maximum hardware latency */
+	uint64_t hw_latency_max;
+
+	/* Total firmware latency, sum of all inferences */
+	uint64_t fw_latency_tot;
+
+	/* Minimum firmware latency */
+	uint64_t fw_latency_min;
+
+	/* Maximum firmware latency */
+	uint64_t fw_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t hw_reset_count;
+
+	/* Firmware stats reset index */
+	uint64_t fw_reset_count;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -438,6 +489,12 @@ struct cn10k_ml_model {
 
 	/* Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
+
+	/* Stats for burst ops */
+	struct cn10k_ml_model_stats *burst_stats;
+
+	/* Stats for sync ops */
+	struct cn10k_ml_model_stats *sync_stats;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3149e4153e..b53c88557a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -354,6 +354,134 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
+#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value += model->burst_stats[qp_id].str##_latency_tot;                      \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		value = (count != 0) ? value / count : 0;                                          \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+static uint64_t
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
+			 enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint64_t count = 0;
+	uint64_t value;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+	if (model == NULL)
+		return 0;
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case min_hw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case max_hw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case avg_fw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case min_fw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case max_fw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
+			model->burst_stats[qp_id].str##_reset_count =                              \
+				model->burst_stats[qp_id].dequeued_count;                          \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+	} while (0)
+
+static void
+cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
+			   enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -519,6 +647,13 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	/* Check firmware stats */
+	if ((mldev->fw.req->jd.fw_load.cap.s.hw_stats) &&
+	    (mldev->fw.req->jd.fw_load.cap.s.fw_stats))
+		mldev->xstats_enabled = true;
+	else
+		mldev->xstats_enabled = false;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -714,6 +849,170 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+/* Model xstats names */
+struct rte_ml_dev_xstats_map cn10k_ml_model_xstats_table[] = {
+	{avg_hw_latency, "Avg-HW-Latency"}, {min_hw_latency, "Min-HW-Latency"},
+	{max_hw_latency, "Max-HW-Latency"}, {avg_fw_latency, "Avg-FW-Latency"},
+	{min_fw_latency, "Min-FW-Latency"}, {max_fw_latency, "Max-FW-Latency"},
+};
+
+static int
+cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_map *xstats_map,
+			      uint32_t size)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	if (xstats_map == NULL)
+		return PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+
+	/* Model xstats names */
+	count = 0;
+	cn10k_ml_dev_info_get(dev, &dev_info);
+
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		xstats_map[count].id = id;
+		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+
+		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+
+		count++;
+		if (count == size)
+			break;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				uint64_t *value)
+{
+	struct rte_ml_dev_xstats_map *xstats_map;
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+	uint32_t num_xstats;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	num_xstats = PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+	xstats_map = rte_zmalloc("cn10k_ml_xstats_map",
+				 sizeof(struct rte_ml_dev_xstats_map) * num_xstats, 0);
+	cn10k_ml_dev_xstats_names_get(dev, xstats_map, num_xstats);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		if (strncmp(name, xstats_map[id].name, strlen(name)) == 0) {
+			*stat_id = id;
+			rte_free(xstats_map);
+			break;
+		}
+	}
+
+	if (id == PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models)
+		return -EINVAL;
+
+	model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+	type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+	*value = cn10k_ml_model_xstat_get(dev, model_id, type);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint64_t *values,
+			uint16_t nb_ids)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	count = 0;
+	for (i = 0; i < nb_ids; i++) {
+		model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+		values[i] = cn10k_ml_model_xstat_get(dev, model_id, type);
+		count++;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint16_t nb_ids)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (stat_ids == NULL) {
+		for (i = 0; i < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; i++) {
+			model_id = i / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = i % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	} else {
+		for (i = 0; i < nb_ids; i++) {
+			model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	}
+
+	return 0;
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -856,6 +1155,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_stats_size;
 	size_t model_data_size;
 	size_t model_info_size;
 	uint8_t *base_dma_addr;
@@ -864,6 +1164,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int qp_id;
 	int ret;
 
 	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
@@ -900,10 +1201,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -949,6 +1252,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Set slow-path request address and state */
 	model->req = PLT_PTR_ADD(model->info, model_info_size);
 
+	/* Reset burst and sync stats */
+	model->burst_stats = PLT_PTR_ADD(
+		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
+		model->burst_stats[qp_id].hw_latency_tot = 0;
+		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].hw_latency_max = 0;
+		model->burst_stats[qp_id].fw_latency_tot = 0;
+		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].fw_latency_max = 0;
+		model->burst_stats[qp_id].hw_reset_count = 0;
+		model->burst_stats[qp_id].fw_reset_count = 0;
+		model->burst_stats[qp_id].dequeued_count = 0;
+	}
+	model->sync_stats =
+		PLT_PTR_ADD(model->burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
@@ -1503,15 +1824,44 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
+	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	uint64_t hw_latency;
+	uint64_t fw_latency;
 
 	if (likely(result->error_code.u64 == 0)) {
+		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
+			stats = &model->burst_stats[qp_id];
+		} else {
+			stats = model->sync_stats;
+		}
+
+		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
+			stats->hw_latency_min = UINT64_MAX;
+			stats->hw_latency_max = 0;
 		}
 
+		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
+			stats->fw_latency_min = UINT64_MAX;
+			stats->fw_latency_max = 0;
+		}
+
+		hw_latency = result->stats.hw_end - result->stats.hw_start;
+		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
+
+		stats->hw_latency_tot += hw_latency;
+		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
+		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
+		stats->fw_latency_tot += fw_latency;
+		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
+		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
+		stats->dequeued_count++;
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
@@ -1745,6 +2095,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
 	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
-- 
2.17.1



* [PATCH v4 32/39] ml/cnxk: enable support to get xstats in cycles
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enable support to retrieve xstats in either cycles or nanoseconds.
Access to sclk is available only if an RVU device is probed
during initialization. The driver returns xstats in nanoseconds
when an RVU device is probed, and falls back to cycles otherwise.
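
A minimal sketch of the unit selection described above, assuming sclk
is reported in MHz as in the driver code. The helper names xstat_scale
and xstat_unit are hypothetical and not part of the driver; a frequency
of 0 stands for "sclk not available".

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds when sclk (in MHz) is known.
 * A frequency of 0 means sclk is unavailable, so the raw cycle count
 * is returned unchanged. Hypothetical helper, for illustration only.
 */
uint64_t
xstat_scale(uint64_t cycles, uint16_t sclk_mhz)
{
	if (sclk_mhz == 0)
		return cycles;

	/* One cycle lasts 1000 / sclk_mhz ns; guard the multiplication. */
	assert(cycles <= UINT64_MAX / 1000ULL);
	return (cycles * 1000ULL) / sclk_mhz;
}

/* Unit suffix that matches the value returned by xstat_scale(). */
const char *
xstat_unit(uint16_t sclk_mhz)
{
	return (sclk_mhz == 0) ? "cycles" : "ns";
}
```

For example, 1600 cycles at an sclk of 800 MHz correspond to 2000 ns,
while the same count with no sclk available is reported as 1600 cycles.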

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b53c88557a..eabb91d507 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -394,6 +394,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 			 enum cn10k_ml_model_xstats_type type)
 {
 	struct cn10k_ml_model *model;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
 	uint64_t value;
 	uint32_t qp_id;
@@ -425,6 +427,10 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 		value = 0;
 	}
 
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
 	return value;
 }
 
@@ -863,6 +869,8 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
 	uint32_t model_id;
 	uint32_t count;
 	uint32_t type;
@@ -878,6 +886,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	/* Model xstats names */
 	count = 0;
 	cn10k_ml_dev_info_get(dev, &dev_info);
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 
 	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
 		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
@@ -889,8 +898,14 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 		xstats_map[count].id = id;
 		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
 
-		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
-			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+		if (sclk_freq == 0)
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
+		else
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-ns",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
 
 		count++;
 		if (count == size)
-- 
2.17.1



* [PATCH v4 33/39] ml/cnxk: add support to report DPE FW warnings
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add support to enable and report DPE warnings from the ML
firmware. The firmware load flags are configured from the device
arguments.
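
A minimal sketch of how the two device arguments map to firmware load
flags, following the bit positions used in the diff (bit 0 enables the
warnings, bit 1 reports them). The function name fw_flags_build is
hypothetical; the driver's own helper takes the firmware structure
instead of two integers.

```c
#include <assert.h>
#include <stdint.h>

#define FW_ENABLE_DPE_WARNING_BITMASK (UINT64_C(1) << 0)
#define FW_REPORT_DPE_WARNING_BITMASK (UINT64_C(1) << 1)

/* Build the firmware load flags from the two 0/1 device arguments. */
uint64_t
fw_flags_build(int enable_dpe_warnings, int report_dpe_warnings)
{
	uint64_t flags = 0;

	/* Both arguments are expected to be validated to 0 or 1 already. */
	assert(enable_dpe_warnings == 0 || enable_dpe_warnings == 1);
	assert(report_dpe_warnings == 0 || report_dpe_warnings == 1);

	if (enable_dpe_warnings)
		flags |= FW_ENABLE_DPE_WARNING_BITMASK;
	if (report_dpe_warnings)
		flags |= FW_REPORT_DPE_WARNING_BITMASK;

	return flags;
}
```

With enable set to 1 and report set to 0 the result is 0x1, the same
value as the fixed FW_LOAD_FLAGS constant that this patch removes.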

Default values:
	enable_dpe_warnings = 1
	report_dpe_warnings = 0
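
The patch parses these arguments with atoi() and range-checks them
afterwards. As an alternative, and not what the patch does, the sketch
below uses strtol() so that non-numeric input is rejected as well, and
falls back to a default when the argument is absent. The helper name
devarg_flag_parse is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a device argument that must be exactly "0" or "1".
 * value == NULL means the argument was not given, so the default
 * is returned. Returns 0 or 1 on success, -EINVAL otherwise.
 */
int
devarg_flag_parse(const char *value, int def)
{
	char *end = NULL;
	long v;

	assert(def == 0 || def == 1);

	if (value == NULL)
		return def;

	errno = 0;
	v = strtol(value, &end, 10);

	/* Reject empty strings, trailing characters and out-of-range values. */
	if (errno != 0 || end == value || *end != '\0' || v < 0 || v > 1)
		return -EINVAL;

	return (int)v;
}
```

Calling devarg_flag_parse(NULL, 1) yields the default of 1, while an
input such as "2" or "abc" yields -EINVAL.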

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 94 +++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_dev.h |  6 +++
 2 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 76ed853a3c..ac6592891b 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -17,9 +17,13 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-#define CN10K_ML_FW_PATH "fw_path"
+#define CN10K_ML_FW_PATH		"fw_path"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 
-#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -28,9 +32,13 @@
 #define FW_EXCEPTION_BUFFER_SIZE 0x400
 #define FW_LINKER_OFFSET	 0x80000
 #define FW_WAIT_CYCLES		 100
-#define FW_LOAD_FLAGS		 0x1
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+/* Firmware flags */
+#define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
+#define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -49,9 +57,25 @@ parse_string_arg(const char *key __rte_unused, const char *value, void *extra_ar
 	return 0;
 }
 
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be non-negative.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int
 cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
 {
+	bool enable_dpe_warnings_set = false;
+	bool report_dpe_warnings_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -76,6 +100,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		fw_path_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		enable_dpe_warnings_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_REPORT_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		report_dpe_warnings_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -83,6 +131,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		mldev->fw.path = fw_path;
 	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
 
+	if (!enable_dpe_warnings_set) {
+		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+				mldev->fw.enable_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+
+	if (!report_dpe_warnings_set) {
+		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+				mldev->fw.report_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -208,9 +280,15 @@ cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 uint64_t
 cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 {
-	PLT_SET_USED(fw);
+	uint64_t flags = 0x0;
+
+	if (fw->enable_dpe_warnings)
+		flags = flags | FW_ENABLE_DPE_WARNING_BITMASK;
+
+	if (fw->report_dpe_warnings)
+		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	return FW_LOAD_FLAGS;
+	return flags;
 }
 
 static int
@@ -614,4 +692,6 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index b7ff369ba8..9ba56ffba6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -349,6 +349,12 @@ struct cn10k_ml_fw {
 	/* Firmware file path */
 	const char *path;
 
+	/* Enable DPE warnings */
+	int enable_dpe_warnings;
+
+	/* Report DPE warnings */
+	int report_dpe_warnings;
+
 	/* Data buffer */
 	uint8_t *data;
 
-- 
2.17.1



* [PATCH v4 34/39] ml/cnxk: add support to enable model data caching
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add device argument 'cache_model_data' to enable model data
caching. When enabled, one inference request is executed with
dummy data in synchronous mode during the model start stage.
This run caches the model weights and bias in memory and
results in improved inference throughput.

cache_model_data = 1, enable (default)
cache_model_data = 0, disable
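
A minimal sketch of the warm-up run described above, written against
plain C rather than the driver API. The callback type infer_sync_fn,
the function model_warm_up and the stand-in stub_infer are all
hypothetical; the driver itself reserves a memzone and calls its own
synchronous inference routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical synchronous inference callback, returns 0 on success. */
typedef int (*infer_sync_fn)(const uint8_t *in, size_t isize, uint8_t *out, size_t osize);

/* Run one inference on zero-filled dummy data and discard the result. */
int
model_warm_up(infer_sync_fn run, size_t isize, size_t osize)
{
	uint8_t *buf;
	int ret;

	assert(run != NULL);

	/* One zero-filled allocation holds the input followed by the output. */
	buf = calloc(1, isize + osize);
	if (buf == NULL)
		return -ENOMEM;

	ret = run(buf, isize, buf + isize, osize);
	free(buf);

	return ret;
}

/* Stand-in for the device: succeeds only if the dummy input is all zeros. */
int
stub_infer(const uint8_t *in, size_t isize, uint8_t *out, size_t osize)
{
	size_t i;

	for (i = 0; i < isize; i++)
		if (in[i] != 0)
			return -EINVAL;
	for (i = 0; i < osize; i++)
		out[i] = 0xff;

	return 0;
}
```

The output of the warm-up run is thrown away; only its side effect of
touching the model data matters.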

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 33 ++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c | 50 ++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index ac6592891b..948708a420 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -20,10 +20,12 @@
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
+#define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -38,7 +40,8 @@
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -76,6 +79,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
+	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -124,6 +128,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		report_dpe_warnings_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -155,6 +171,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
 
+	if (!cache_model_data_set) {
+		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
+				mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -694,4 +722,5 @@ RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
 RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
 			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 9ba56ffba6..718edadde7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -381,6 +381,9 @@ struct cn10k_ml_dev {
 
 	/* xstats status */
 	bool xstats_enabled;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index eabb91d507..c12259f44c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -488,6 +488,49 @@ cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
 	}
 }
 
+static int
+cn10k_ml_cache_model_data(struct rte_ml_dev *dev, int16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct rte_ml_op op;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t isize = 0;
+	uint64_t osize = 0;
+	int ret = 0;
+
+	model = dev->data->models[model_id];
+
+	/* Create input and output buffers. */
+	rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL);
+	rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL);
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%d", "ml_dummy_io", model_id);
+	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+	memset(mz->addr, 0, isize + osize);
+
+	op.model_id = model_id;
+	op.nb_batches = model->batch_size;
+	op.mempool = NULL;
+
+	op.input.addr = mz->addr;
+	op.input.length = isize;
+	op.input.next = NULL;
+
+	op.output.addr = PLT_PTR_ADD(op.input.addr, isize);
+	op.output.length = osize;
+	op.output.next = NULL;
+
+	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_inference_sync(dev, &op);
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -1467,6 +1510,13 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, int16_t model_id)
 		}
 	}
 
+	if (ret < 0) { /* Call stop to update model and FW state, ignore error */
+		rte_ml_model_stop(dev->data->dev_id, model_id);
+	} else {
+		if (mldev->cache_model_data && roc_model_is_cn10ka())
+			ret = cn10k_ml_cache_model_data(dev, model_id);
+	}
+
 	return ret;
 }
 
-- 
2.17.1



* [PATCH v4 35/39] ml/cnxk: add support to select OCM allocation mode
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Add device argument "ocm_alloc_mode" to select the OCM allocation
method used during model start. The driver supports two modes,
with "lowest" as the default.

ocm_alloc_mode:
lowest:  Allocate from first available free slot / lowest
         tile ID in OCM (default)
largest: Allocate from a slot with maximum free memory
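
A simplified sketch of the difference between the two modes, assuming
a single tile whose pages are tracked in one 64-bit mask (bit set means
page in use). The functions slot_lowest and slot_largest are
hypothetical; the driver's own slot search works on multi-word masks
across several tiles and is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

static int
page_used(uint64_t used_mask, int page)
{
	return (int)((used_mask >> page) & 1ULL);
}

/* "lowest": start of the first free run with at least `need` pages. */
int
slot_lowest(uint64_t used_mask, int npages, int need)
{
	int run = 0;
	int page;

	assert(npages >= 0 && npages <= 64 && need >= 1);

	for (page = 0; page < npages; page++) {
		run = page_used(used_mask, page) ? 0 : run + 1;
		if (run == need)
			return page - need + 1;
	}

	return -1;
}

/* "largest": start of the longest free run, if it can hold `need` pages. */
int
slot_largest(uint64_t used_mask, int npages, int need)
{
	int best_start = -1;
	int best_len = 0;
	int run = 0;
	int page;

	assert(npages >= 0 && npages <= 64 && need >= 1);

	for (page = 0; page < npages; page++) {
		run = page_used(used_mask, page) ? 0 : run + 1;
		if (run > best_len) {
			best_len = run;
			best_start = page - run + 1;
		}
	}

	return (best_len >= need) ? best_start : -1;
}
```

With 10 pages and pages 0, 3 and 8 in use (mask 0x109), a request for
2 pages is placed at page 1 in "lowest" mode and at page 4 in "largest"
mode, because pages 4 to 7 form the longest free run.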

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 45 +++++++++++++++++++++++++++++-----
 drivers/ml/cnxk/cn10k_ml_ocm.c |  6 ++---
 drivers/ml/cnxk/cn10k_ml_ocm.h |  3 +++
 3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 948708a420..5c02d67c8e 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -21,11 +21,13 @@
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
+#define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
+#define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -39,9 +41,12 @@
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+static const char *const valid_args[] = {CN10K_ML_FW_PATH,
+					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
+					 CN10K_ML_DEV_CACHE_MODEL_DATA,
+					 CN10K_ML_OCM_ALLOC_MODE,
+					 NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -81,6 +86,8 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool report_dpe_warnings_set = false;
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
+	bool ocm_alloc_mode_set = false;
+	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
 	int ret = 0;
@@ -140,6 +147,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		cache_model_data_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_ALLOC_MODE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_ALLOC_MODE, &parse_string_arg,
+					 &ocm_alloc_mode);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_ALLOC_MODE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_alloc_mode_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -183,6 +201,20 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
 
+	if (!ocm_alloc_mode_set) {
+		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+	} else {
+		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
+		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_OCM_ALLOC_MODE,
+				ocm_alloc_mode);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->ocm.alloc_mode = ocm_alloc_mode;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -720,7 +752,8 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2083d99f81..26e356c107 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -230,7 +230,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
-	int ocm_alloc_mode;
 	int wb_page_start;
 	uint16_t tile_id;
 	uint16_t word_id;
@@ -255,7 +254,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	max_slot_sz_curr = 0;
 	max_slot_sz = 0;
 	tile_idx = 0;
-	ocm_alloc_mode = 2;
 
 	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
 		plt_err("Invalid start_tile, %d", start_tile);
@@ -303,13 +301,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		}
 	}
 
-	if (ocm_alloc_mode == 1) {
+	if (strcmp(ocm->alloc_mode, "lowest") == 0) {
 		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
 		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
 			tile_idx = tile_start;
 			goto found;
 		}
-	} else if (ocm_alloc_mode == 2) {
+	} else if (strcmp(ocm->alloc_mode, "largest") == 0) {
 		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
 							&max_slot_sz_curr);
 		if (max_slot_sz_curr > max_slot_sz) {
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 4415bbfb45..6bf71c8da6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -58,6 +58,9 @@ struct cn10k_ml_ocm {
 	/* OCM spinlock, used to update OCM state */
 	rte_spinlock_t lock;
 
+	/* OCM allocation mode */
+	const char *alloc_mode;
+
 	/* Number of OCM tiles */
 	uint8_t num_tiles;
 
-- 
2.17.1



* [PATCH v4 36/39] ml/cnxk: add support to use lock during jcmd enq
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (34 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "hw_queue_lock" to select the JCMDQ enqueue
ROC function to be used in fast path.

hw_queue_lock:

0: Disable. Use the lock-free version of the JCMDQ enqueue ROC
	function for job queuing. To avoid a race condition when
	queuing requests to hardware, disabling hw_queue_lock
	restricts the number of queue-pairs supported by the cnxk
	driver to 1.

1: Enable (default). Use the spinlock version of the JCMDQ
	enqueue ROC function for job queuing. Enabling the spinlock
	version lifts the single queue-pair restriction, so that
	up to 16 queue-pairs can be created.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 31 ++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_dev.h | 13 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.c | 20 +++++++++++++++++---
 3 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 5c02d67c8e..aa503b2691 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -22,12 +22,14 @@
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -46,6 +48,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
+					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -87,6 +90,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
+	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -158,6 +162,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		ocm_alloc_mode_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
+					 &mldev->hw_queue_lock);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_HW_QUEUE_LOCK);
+			ret = -EINVAL;
+			goto exit;
+		}
+		hw_queue_lock_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -215,6 +231,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
 
+	if (!hw_queue_lock_set) {
+		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+	} else {
+		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
+				mldev->hw_queue_lock);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -756,4 +784,5 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 718edadde7..49676ac9e7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -21,8 +21,11 @@
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
 
-/* Maximum number of queue-pairs per device */
-#define ML_CN10K_MAX_QP_PER_DEVICE 1
+/* Maximum number of queue-pairs per device, spinlock version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
+
+/* Maximum number of queue-pairs per device, lock-free version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_LF 1
 
 /* Maximum number of descriptors per queue-pair */
 #define ML_CN10K_MAX_DESC_PER_QP 1024
@@ -384,6 +387,12 @@ struct cn10k_ml_dev {
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
+
+	/* Use spinlock version of ROC enqueue */
+	int hw_queue_lock;
+
+	/* JCMD enqueue function handler */
+	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c12259f44c..3c96db4514 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -534,13 +534,21 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, int16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
+	struct cn10k_ml_dev *mldev;
+
 	if (dev_info == NULL)
 		return -EINVAL;
 
+	mldev = dev->data->dev_private;
+
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	if (mldev->hw_queue_lock)
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
+	else
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
+
 	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
@@ -703,6 +711,12 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->xstats_enabled = false;
 
+	/* Set JCMDQ enqueue function */
+	if (mldev->hw_queue_lock == 1)
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	else
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -1993,7 +2007,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
-	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2114,7 +2128,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
-- 
2.17.1



* [PATCH v4 37/39] ml/cnxk: add support to select poll memory region
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (35 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 39/39] ml/cnxk: enable support for configurable ocm page Srikanth Yalavarthi
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added device argument "poll_mem" to select the memory
region to be used for polling in fast-path requests.

Implemented support to use scratch registers for polling.
The available pool of scratch registers is mapped one-to-one
with the internal request queue.

poll_mem:
ddr:      Use DDR memory location for polling (default)
register: Use scratch registers for polling
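
The register mapping described above can be sketched as follows. This is
a minimal illustration, not the driver code: the register range and the
block arithmetic follow this patch, while struct poll_block and the two
helper names are hypothetical stand-ins for the fields added to
struct cn10k_ml_qp.

```c
#include <assert.h>
#include <stdint.h>

/* Scratch register range for poll mode requests (values from this patch) */
#define ML_POLL_REGISTER_START 1024
#define ML_POLL_REGISTER_END   2047

/* Hypothetical stand-in for block_start / block_size in struct cn10k_ml_qp */
struct poll_block {
	uint32_t block_start;
	uint32_t block_size;
};

/* Split the scratch register range evenly across the configured queue-pairs */
static struct poll_block
poll_block_init(uint16_t qp_id, uint16_t nb_queue_pairs)
{
	struct poll_block blk;

	assert(nb_queue_pairs != 0);
	blk.block_size = (ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / nb_queue_pairs;
	blk.block_start = ML_POLL_REGISTER_START + qp_id * blk.block_size;
	return blk;
}

/* Scratch register index polled for the request at queue position idx */
static uint32_t
poll_reg_index(const struct poll_block *blk, uint64_t idx)
{
	return blk->block_start + (uint32_t)(idx % blk->block_size);
}
```

With 16 queue-pairs each block holds 64 registers, so queue-pair 3 starts
at register 1216 and the request at position 70 polls register 1222.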

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  47 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  24 +++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 124 +++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |   9 +++
 4 files changed, 192 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index aa503b2691..a746a66849 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,6 +23,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
+#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -30,6 +31,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
+#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -42,6 +44,7 @@
 /* Firmware flags */
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+#define FW_USE_DDR_POLL_ADDR_FP	      BIT(2)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
@@ -49,6 +52,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
+					 CN10K_ML_FW_POLL_MEM,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -92,7 +96,9 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
+	bool poll_mem_set = false;
 	bool fw_path_set = false;
+	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 
@@ -174,6 +180,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
+					 &poll_mem);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
+			ret = -EINVAL;
+			goto exit;
+		}
+		poll_mem_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -243,6 +260,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
+	if (!poll_mem_set) {
+		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
+	} else {
+		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->fw.poll_mem = poll_mem;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -376,6 +405,11 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
+	if (strcmp(fw->poll_mem, "ddr") == 0)
+		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
+	else if (strcmp(fw->poll_mem, "register") == 0)
+		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+
 	return flags;
 }
 
@@ -780,9 +814,10 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
-			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 49676ac9e7..966d92e027 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,18 @@
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
 
+/* Memory barrier macros */
+#if defined(RTE_ARCH_ARM)
+#define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
+#define dsb_st ({ asm volatile("dsb st" : : : "memory"); })
+#else
+#define dmb_st
+#define dsb_st
+#endif
+
+struct cn10k_ml_req;
+struct cn10k_ml_qp;
+
 /* Job types */
 enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
@@ -358,6 +370,9 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
+	/* Memory to be used for polling in fast-path requests */
+	const char *poll_mem;
+
 	/* Data buffer */
 	uint8_t *data;
 
@@ -393,6 +408,15 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+	/* Poll handling function pointers */
+	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
+	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+
+	/* Memory barrier function pointers to handle synchronization */
+	void (*set_enq_barrier)(void);
+	void (*set_deq_barrier)(void);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3c96db4514..947f6a6490 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,11 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Scratch register range for poll mode requests */
+#define ML_POLL_REGISTER_SYNC  1023
+#define ML_POLL_REGISTER_START 1024
+#define ML_POLL_REGISTER_END   2047
+
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -76,6 +81,80 @@ print_line(FILE *fp, int len)
 	fprintf(fp, "\n");
 }
 
+static inline void
+cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	PLT_SET_USED(qp);
+	PLT_SET_USED(idx);
+
+	req->compl_W1 = PLT_U64_CAST(&req->status);
+}
+
+static inline void
+cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	return plt_read64(req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	return roc_ml_reg_read64(roc_ml, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
+{
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		req->compl_W1 = PLT_U64_CAST(&req->status);
+	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
+}
+
+static inline void
+cn10k_ml_enq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_deq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_enq_barrier_register(void)
+{
+	dmb_st;
+}
+
+static inline void
+cn10k_ml_deq_barrier_register(void)
+{
+	dsb_st;
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -163,6 +242,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
+	qp->block_size =
+		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
+	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -341,7 +423,7 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	mldev = dev->data->dev_private;
 
 	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
@@ -549,7 +631,11 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
+	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
 
@@ -717,6 +803,26 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
+	/* Set polling function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
+	}
+
+	/* Set barrier function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
+	}
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -2000,13 +2106,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
+	mldev->set_poll_addr(qp, req, head);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
+	mldev->set_enq_barrier();
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2032,6 +2140,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		       uint16_t nb_ops)
 {
 	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2039,6 +2148,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
+	mldev = dev->data->dev_private;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2051,7 +2161,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = plt_read64(&req->status);
+	status = mldev->get_poll_ptr(&mldev->roc, req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2059,6 +2169,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
+	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2116,13 +2227,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
+	cn10k_ml_set_sync_addr(mldev, req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2142,7 +2254,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fb82af414a..995ed27e4e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -26,6 +26,9 @@ struct cn10k_ml_req {
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
 
+	/* Job completion W1 */
+	uint64_t compl_W1;
+
 	/* Timeout cycle */
 	uint64_t timeout;
 
@@ -61,6 +64,12 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
+
+	/* Register block start for polling */
+	uint32_t block_start;
+
+	/* Register block size for polling */
+	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.17.1



* [PATCH v4 38/39] ml/cnxk: add user guide for marvell cnxk ml driver
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (36 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  2023-02-01  9:23   ` [PATCH v4 39/39] ml/cnxk: enable support for configurable ocm page Srikanth Yalavarthi
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Added a user guide for the Marvell cnxk ML driver for the Marvell
Octeon cnxk SoC family. Added details about device initialization,
debug options and runtime device args supported by the driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 MAINTAINERS                 |   1 +
 doc/guides/index.rst        |   1 +
 doc/guides/mldevs/cnxk.rst  | 238 ++++++++++++++++++++++++++++++++++++
 doc/guides/mldevs/index.rst |  14 +++
 4 files changed, 254 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 147f2bd8ec..4a4990fdf1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1442,6 +1442,7 @@ M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
 F: drivers/ml/cnxk/
+F: doc/guides/mldevs/cnxk.rst
 
 
 Packet processing
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 5eb5bd9c9a..0bd729530a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -26,6 +26,7 @@ DPDK documentation
    eventdevs/index
    rawdevs/index
    mempool/index
+   mldevs/index
    platform/index
    contributing/index
    rel_notes/index
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
new file mode 100644
index 0000000000..da40336299
--- /dev/null
+++ b/doc/guides/mldevs/cnxk.rst
@@ -0,0 +1,238 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Marvell cnxk Machine Learning Poll Mode Driver
+==============================================
+
+The cnxk ML poll mode driver provides support for offloading Machine
+Learning inference operations to Machine Learning accelerator units
+on the **Marvell OCTEON cnxk** SoC family.
+
+The cnxk ML PMD code is organized into multiple files with all file names
+starting with cn10k, providing support for CN106XX and CN106XXS.
+
+More information about OCTEON cnxk SoCs may be obtained from `<https://www.marvell.com>`_
+
+Supported OCTEON cnxk SoCs
+--------------------------
+
+- CN106XX
+- CN106XXS
+
+Features
+--------
+
+The OCTEON cnxk ML PMD provides support for the following set of operations:
+
+Slow-path device and ML model handling:
+
+* ``Device probing, configuration and close``
+* ``Device start / stop``
+* ``Model loading and unloading``
+* ``Model start / stop``
+* ``Data quantization and dequantization``
+
+Fast-path Inference:
+
+* ``Inference execution``
+* ``Error handling``
+
+
+Installation
+------------
+
+The OCTEON cnxk ML PMD may be compiled natively on an OCTEON cnxk platform
+or cross-compiled on an x86 platform.
+
+Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
+application.
+
+
+Initialization
+--------------
+
+``CN10K Initialization``
+
+List the ML PF devices available on cn10k platform:
+
+.. code-block:: console
+
+    lspci -d:a092
+
+``a092`` is the ML device PF id. You should see output similar to:
+
+.. code-block:: console
+
+    0000:00:10.0 System peripheral: Cavium, Inc. Device a092
+
+Bind the ML PF device to the vfio-pci driver:
+
+.. code-block:: console
+
+    cd <dpdk directory>
+    ./usertools/dpdk-devbind.py -u 0000:00:10.0
+    ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
+
+Runtime Config Options
+----------------------
+
+- ``Firmware file path`` (default ``/lib/firmware/mlip-fw.bin``)
+
+   Path to the firmware binary to be loaded during device configuration.
+   The ``fw_path`` ``devargs`` parameter can be used by the user to load
+   ML firmware from a custom path.
+
+   For example::
+
+      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
+
+   With the above configuration, driver loads the firmware from the path
+   "/home/user/ml_fw.bin".
+
+- ``Enable DPE warnings`` (default ``1``)
+
+   ML firmware can be configured during load to handle the DPE errors reported
+   by ML inference engine. When enabled, firmware would mask the DPE non-fatal
+   hardware errors as warnings. The parameter ``enable_dpe_warnings`` ``devargs``
+   is used for this configuration.
+
+   For example::
+
+      -a 0000:00:10.0,enable_dpe_warnings=0
+
+   With the above configuration, DPE non-fatal errors reported by HW are
+   considered as errors.
+
+
+- ``Model data caching`` (default ``1``)
+
+   Enable caching model data on ML ACC cores. Enabling this option executes a
+   dummy inference request in synchronous mode during model start stage. Caching
+   of model data improves the inferencing throughput / latency for the model.
+   The parameter ``cache_model_data`` ``devargs`` is used to enable data caching.
+
+   For example::
+
+      -a 0000:00:10.0,cache_model_data=0
+
+   With the above configuration, model data caching is disabled.
+
+
+- ``OCM allocation mode`` (default ``lowest``)
+
+   Option to specify the method to be used while allocating OCM memory for a
+   model during model start. Two modes are supported by the driver. The
+   parameter ``ocm_alloc_mode`` ``devargs`` is used to select the OCM
+   allocation mode.
+
+   ``lowest`` - Allocate OCM for the model from first available free slot. Search
+   for the free slot is done starting from the lowest tile ID and lowest page ID.
+   ``largest`` - Allocate OCM for the model from the slot with largest amount of
+   free space.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_alloc_mode=lowest
+
+   With the above configuration, OCM allocation for the model would be done from
+   the first available free slot / from the lowest possible tile ID.
+
+
+- ``Enable hardware queue lock`` (default ``1``)
+
+   Option to select the job request enqueue function to be used to queue the requests
+   to the hardware queue. The parameter ``hw_queue_lock`` ``devargs`` is used to select
+   the enqueue function.
+
+   ``0`` - Disable, use the lock-free version of the hardware enqueue function
+   for job queuing in enqueue burst operation. To avoid race conditions in request
+   queuing to hardware, disabling hw_queue_lock restricts the number of queue-pairs
+   supported by the cnxk driver to 1.
+   ``1`` - Enable (default), use the spinlock version of the hardware enqueue function
+   for job queuing. Enabling the spinlock version raises the limit on the number of
+   queue-pairs supported by the driver to 16.
+
+   For example::
+
+      -a 0000:00:10.0,hw_queue_lock=1
+
+   With the above configuration, spinlock version of hardware enqueue function is used
+   in the fast path enqueue burst operation.
+
+
+- ``Polling memory location`` (default ``ddr``)
+
+   ML cnxk driver provides the option to select the memory location to be used
+   for polling to check the inference request completion. Driver supports using
+   either the DDR address space (``ddr``) or ML registers (``register``) as
+   polling locations. The parameter ``poll_mem`` ``devargs`` is used to specify
+   the poll location.
+
+   For example::
+
+      -a 0000:00:10.0,poll_mem="register"
+
+   With the above configuration, ML cnxk driver is configured to use ML registers
+   for polling in fastpath requests.
+
+
+Debugging Options
+-----------------
+
+.. _table_octeon_cnxk_ml_debug_options:
+
+.. table:: OCTEON cnxk ML PMD debug options
+
+    +---+------------+-------------------------------------------------------+
+    | # | Component  | EAL log command                                       |
+    +===+============+=======================================================+
+    | 1 | ML         | --log-level='pmd\.ml\.cnxk,8'                         |
+    +---+------------+-------------------------------------------------------+
+
+
+Extended stats
+--------------
+
+Marvell cnxk ML PMD supports reporting the inference latencies through extended
+stats. The PMD supports the following 6 extended stats types for each model.
+The total number of extended stats is equal to 6 x the number of models loaded.
+
+.. _table_octeon_cnxk_ml_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD xstats names
+
+    +---+---------------------+----------------------------------------------+
+    | # | Type                | Description                                  |
+    +===+=====================+==============================================+
+    | 1 | Avg-HW-Latency      | Average hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 2 | Min-HW-Latency      | Minimum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 3 | Max-HW-Latency      | Maximum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 4 | Avg-FW-Latency      | Average firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 5 | Min-FW-Latency      | Minimum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 6 | Max-FW-Latency      | Maximum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+
+Latency values reported by the PMD through xstats are in units of either
+cycles or nanoseconds. The unit of latency is determined during DPDK
+initialization and depends on the availability of SCLK. Latencies are
+reported in nanoseconds when the SCLK is available and in cycles otherwise.
+Application needs to initialize at least one RVU for the clock to be available.
+
+xstats names are dynamically generated by the PMD and would have the format
+"Model-<model_id>-Type-<units>".
+
+For example::
+   Model-1-Avg-FW-Latency-ns
+
+The above xstat name would report average firmware latency in nanoseconds for
+model with model ID 1.
+
+The number of xstats made available by the PMD changes dynamically. The number would
+increase with loading a model and would decrease with unloading a model.
+Application needs to update the xstats map after a model is either loaded or
+unloaded.
diff --git a/doc/guides/mldevs/index.rst b/doc/guides/mldevs/index.rst
new file mode 100644
index 0000000000..f201e54175
--- /dev/null
+++ b/doc/guides/mldevs/index.rst
@@ -0,0 +1,14 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Machine Learning Device Driver
+==============================
+
+The following is a list of ML device PMDs, which can be used from an
+application through the ML device API.
+
+.. toctree::
+    :maxdepth: 2
+    :numbered:
+
+    cnxk
-- 
2.17.1



* [PATCH v4 39/39] ml/cnxk: enable support for configurable ocm page
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
                     ` (37 preceding siblings ...)
  2023-02-01  9:23   ` [PATCH v4 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
@ 2023-02-01  9:23   ` Srikanth Yalavarthi
  38 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-01  9:23 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu

Enabled support for configurable OCM page size. A new device
argument "ocm_page_size" is added to specify the page size
for OCM management. Supported page sizes are 1KB, 2KB, 4KB,
8KB and 16KB. Default page size is 16KB.
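
The page size check described above can be sketched as follows. This is
a minimal illustration under stated assumptions: the table of supported
sizes is the one added by this patch, while the helper name
ocm_page_size_is_valid is hypothetical and the driver performs the same
lookup inline in its devargs parser.

```c
#include <stdbool.h>
#include <stddef.h>

/* Supported OCM page sizes in bytes: 1KB, 2KB, 4KB, 8KB and 16KB */
static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};

/* Hypothetical helper: true if page_size is one of the supported sizes */
static bool
ocm_page_size_is_valid(int page_size)
{
	size_t i;

	for (i = 0; i < sizeof(valid_ocm_page_size) / sizeof(valid_ocm_page_size[0]); i++) {
		if (page_size == valid_ocm_page_size[i])
			return true;
	}

	/* Negative, zero and non-listed values all fall through to here */
	return false;
}
```

A value such as ocm_page_size=8192 passes this check, while 3000 or any
negative value is rejected, matching the -EINVAL path in the parser.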

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       | 16 +++++++++
 drivers/ml/cnxk/cn10k_ml_dev.c   | 61 ++++++++++++++++++++++++++++----
 drivers/ml/cnxk/cn10k_ml_dev.h   |  3 ++
 drivers/ml/cnxk/cn10k_ml_model.c |  6 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.c   | 18 +++++++---
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 14 +++-----
 drivers/ml/cnxk/cn10k_ml_ops.c   | 17 ++++++---
 7 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index da40336299..f7f61e8bfa 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -175,6 +175,22 @@ Runtime Config Options
    With the above configuration, ML cnxk driver is configured to use ML registers
    for polling in fastpath requests.
 
+- ``OCM page size`` (default ``16384``)
+
+   Option to specify the page size in bytes to be used for OCM management. Available
+   OCM is split into multiple pages of specified sizes and the pages are allocated to
+   the models. The parameter ``ocm_page_size`` ``devargs`` is used to specify the page
+   size to be used.
+
+   Page sizes supported by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB. Default
+   page size is 16 KB.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_page_size=8192
+
+   With the above configuration, page size of OCM is set to 8192 bytes / 8 KB.
+
 
 Debugging Options
 -----------------
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index a746a66849..6f9a1015a6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -24,6 +24,7 @@
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 #define CN10K_ML_FW_POLL_MEM		"poll_mem"
+#define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -32,6 +33,7 @@
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 #define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
+#define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -53,8 +55,12 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 CN10K_ML_FW_POLL_MEM,
+					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
+/* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
+static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
@@ -95,12 +101,15 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
+	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool poll_mem_set = false;
 	bool fw_path_set = false;
 	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
+	bool found;
+	uint8_t i;
 
 	if (devargs == NULL)
 		goto check_args;
@@ -191,6 +200,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		poll_mem_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
+					 &mldev->ocm_page_size);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_page_size_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -272,6 +292,32 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
 
+	if (!ocm_page_size_set) {
+		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+	} else {
+		if (mldev->ocm_page_size < 0) {
+			plt_err("Invalid argument, %s = %d", CN10K_ML_OCM_PAGE_SIZE,
+				mldev->ocm_page_size);
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		found = false;
+		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
+			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+				found = true;
+				break;
+			}
+		}
+
+		if (!found) {
+			plt_err("Unsupported ocm_page_size = %d", mldev->ocm_page_size);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -814,10 +860,11 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
+			      "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 966d92e027..b4e46899c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -406,6 +406,9 @@ struct cn10k_ml_dev {
 	/* Use spinlock version of ROC enqueue */
 	int hw_queue_lock;
 
+	/* OCM page size */
+	int ocm_page_size;
+
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 295b6f0a01..44f0087bf7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -339,11 +339,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uin
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			ML_CN10K_OCM_NUMPAGES);
+			mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -352,7 +352,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, int16_t model_id, uin
 	 */
 	if (!metadata->model.ocm_relocatable)
 		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 26e356c107..4d9e01c47b 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -220,13 +220,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
-	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
 	uint16_t used_scratch_pages_max;
 	uint16_t scratch_page_start;
 	int used_last_wb_page_max;
 	uint16_t scratch_page_end;
 	uint8_t search_start_tile;
 	uint8_t search_end_tile;
+	uint8_t *local_ocm_mask;
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
@@ -268,6 +268,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		search_end_tile = start_tile;
 	}
 
+	/* Scratch copy of the OCM mask, OR-ed across the candidate tiles */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+
 	tile_start = search_start_tile;
 start_search:
 	used_scratch_pages_max = 0;
@@ -279,7 +282,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -332,6 +335,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	if (wb_page_start != -1)
 		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
 
+	rte_free(local_ocm_mask);
+
 	return wb_page_start;
 }
 
@@ -480,7 +485,7 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char str[ML_CN10K_OCM_NUMPAGES / 4 + 2]; /* nibbles + prefix '0x' */
+	char *str;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
@@ -490,12 +495,15 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 	mldev = dev->data->dev_private;
 	ocm = &mldev->ocm;
 
+	/* nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+
 	fprintf(fp, "OCM State:\n");
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
 			wb_pages +=
 				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
@@ -506,4 +514,6 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
 			ocm->tile_ocm_info[tile_id].last_wb_page, str);
 	}
+
+	rte_free(str);
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 6bf71c8da6..0ed5db98db 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,25 +8,16 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
-/* Page size in bytes. */
-#define ML_CN10K_OCM_PAGESIZE 0x4000
-
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
 /* OCM in bytes, per tile. */
 #define ML_CN10K_OCM_TILESIZE 0x100000
 
-/* OCM pages, per tile. */
-#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
-
-/* Maximum OCM mask words, per tile, 8 bit words. */
-#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
-
 /* OCM and Tile information structure */
 struct cn10k_ml_ocm_tile_info {
 	/* Mask of used / allotted pages on tile's OCM */
-	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+	uint8_t *ocm_mask;
 
 	/* Last pages in the tile's OCM used for weights and bias, default = -1 */
 	int last_wb_page;
@@ -78,6 +69,9 @@ struct cn10k_ml_ocm {
 
 	/* OCM memory info and status*/
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+
+	/* Memory for ocm_mask */
+	uint8_t *ocm_mask;
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 947f6a6490..4126ab4991 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -311,8 +311,8 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, int16_t model_id, FILE *fp)
 	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
-		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -781,12 +781,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	ocm = &mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->page_size = mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
-	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+	/* Allocate memory for ocm_mask */
+	ocm->ocm_mask =
+		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		ocm->tile_ocm_info[tile_id].ocm_mask = ocm->ocm_mask + tile_id * ocm->mask_words;
 		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+	}
 
 	rte_spinlock_init(&ocm->lock);
 
@@ -856,6 +862,9 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Release ocm_mask memory */
+	rte_free(mldev->ocm.ocm_mask);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
-- 
2.17.1

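For readers following the arithmetic in patch 39/39: the page size chosen
through the ocm_page_size device argument determines the number of pages
per tile and the number of 8-bit mask words per tile. The following is a
minimal standalone sketch of that relationship, not driver code. The names
ocm_geometry, ocm_page_size_is_valid and ocm_geometry_init are hypothetical;
only the tile size and the list of supported page sizes are taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-tile OCM size in bytes (ML_CN10K_OCM_TILESIZE in the patch). */
#define OCM_TILE_SIZE 0x100000u

/* Page sizes accepted by the patch: 1KB, 2KB, 4KB, 8KB and 16KB. */
static const uint32_t ocm_valid_page_sizes[] = {1024, 2048, 4096, 8192, 16384};

/* Hypothetical helper structure, not part of the driver. */
struct ocm_geometry {
	uint32_t page_size;  /* bytes per page */
	uint32_t num_pages;  /* pages per tile */
	uint32_t mask_words; /* 8-bit mask words per tile */
};

/* Return true if page_size is one of the supported values. */
static bool
ocm_page_size_is_valid(uint32_t page_size)
{
	size_t n = sizeof(ocm_valid_page_sizes) / sizeof(ocm_valid_page_sizes[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (ocm_valid_page_sizes[i] == page_size)
			return true;
	}

	return false;
}

/* Derive the per-tile geometry; returns 0 on success, -1 if unsupported. */
static int
ocm_geometry_init(struct ocm_geometry *g, uint32_t page_size)
{
	assert(g != NULL);

	if (!ocm_page_size_is_valid(page_size))
		return -1;

	g->page_size = page_size;
	g->num_pages = OCM_TILE_SIZE / page_size; /* pages per tile */
	g->mask_words = g->num_pages / 8;         /* one bit per page */

	return 0;
}
```

Under these assumptions, the default 16384-byte page gives 64 pages and
8 mask words per tile, while a 1024-byte page gives 1024 pages and 128 mask
words per tile. This is why the patch replaces the fixed-size mask arrays
with buffers sized at configure time.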


* [PATCH v5 00/39] Implementation of ML CNXK driver
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (38 preceding siblings ...)
  2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
@ 2023-02-07 16:06 ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
                     ` (39 more replies)
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
  40 siblings, 40 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla,
	Srikanth Yalavarthi

Marvell ML CNXK Driver
----------------------

This patch series implements the common Machine Learning (ML) ROC code
and the driver for the Marvell Octeon 10 (cnxk) platform. ML inferencing
is supported on the cnxk platform through an integrated ML inferencing
processor. The current driver supports programming the ML hardware
engine through offload mode.

All APIs proposed in the DPDK ML device specification are supported on
the cnxk platform.

v5:
* Updated model_id to uint16_t
* Updated release notes for 23.03

v4:
* Updated function names of ML common code
* Added support for configurable OCM page size
* Minor typo fixes

v3:
* Skipped installation of internal headers
* Updated internal comments and cleaned up code

v2:
* Typo and formatting fixes

Srikanth Yalavarthi (39):
  common/cnxk: add ML headers and ROC code for cnxk
  ml/cnxk: add skeleton for ML cnxk driver
  ml/cnxk: enable probe and remove of ML device
  ml/cnxk: add driver support to get device info
  ml/cnxk: add support for configure and close
  ml/cnxk: parse ML firmware path from device args
  ml/cnxk: enable firmware load and device reset
  ml/cnxk: enable support for simulator environment
  ml/cnxk: enable support for device start and stop
  ml/cnxk: add support to create device queue-pairs
  ml/cnxk: add functions to load and unload models
  ml/cnxk: enable validity checks for model metadata
  ml/cnxk: add internal structures for derived info
  ml/cnxk: add internal structures for tiles and OCM
  ml/cnxk: add structures for slow and fast path JDs
  ml/cnxk: find OCM mask and page slots for a model
  ml/cnxk: add support to reserve and free OCM pages
  ml/cnxk: enable support to start an ML model
  ml/cnxk: enable support to stop an ML models
  ml/cnxk: enable support to get model information
  ml/cnxk: enable support to update model params
  ml/cnxk: add support to get IO buffer sizes
  ml/cnxk: enable quantization and dequantization
  ml/cnxk: enable support to dump device debug info
  ml/cnxk: add driver support for device selftest
  ml/cnxk: enqueue a burst of inference requests
  ml/cnxk: dequeue a burst of inference requests
  ml/cnxk: add internal function for sync mode run
  ml/cnxk: enable support for firmware error codes
  ml/cnxk: add support to get and reset device stats
  ml/cnxk: add support to handle extended dev stats
  ml/cnxk: enable support to get xstats in cycles
  ml/cnxk: add support to report DPE FW warnings
  ml/cnxk: add support to enable model data caching
  ml/cnxk: add support to select OCM allocation mode
  ml/cnxk: add support to use lock during jcmd enq
  ml/cnxk: add support to select poll memory region
  ml/cnxk: add user guide for marvell cnxk ml driver
  ml/cnxk: enable support for configurable ocm page

 MAINTAINERS                            |   11 +
 doc/guides/index.rst                   |    1 +
 doc/guides/mldevs/cnxk.rst             |  254 +++
 doc/guides/mldevs/index.rst            |   14 +
 doc/guides/rel_notes/release_23_03.rst |    7 +
 drivers/common/cnxk/hw/ml.h            |  170 ++
 drivers/common/cnxk/meson.build        |    1 +
 drivers/common/cnxk/roc_api.h          |    4 +
 drivers/common/cnxk/roc_constants.h    |    2 +
 drivers/common/cnxk/roc_dev_priv.h     |    1 +
 drivers/common/cnxk/roc_ml.c           |  626 +++++++
 drivers/common/cnxk/roc_ml.h           |  152 ++
 drivers/common/cnxk/roc_ml_priv.h      |   24 +
 drivers/common/cnxk/roc_platform.c     |    1 +
 drivers/common/cnxk/roc_platform.h     |    2 +
 drivers/common/cnxk/roc_priv.h         |    3 +
 drivers/common/cnxk/version.map        |   29 +
 drivers/meson.build                    |    1 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  870 +++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h         |  429 +++++
 drivers/ml/cnxk/cn10k_ml_model.c       |  413 +++++
 drivers/ml/cnxk/cn10k_ml_model.h       |  508 ++++++
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  519 ++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   85 +
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2316 ++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h         |   94 +
 drivers/ml/cnxk/meson.build            |   32 +
 drivers/ml/meson.build                 |    8 +
 28 files changed, 6577 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst
 create mode 100644 drivers/common/cnxk/hw/ml.h
 create mode 100644 drivers/common/cnxk/roc_ml.c
 create mode 100644 drivers/common/cnxk/roc_ml.h
 create mode 100644 drivers/common/cnxk/roc_ml_priv.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

--
2.17.1


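The configurable OCM page size listed in the v4 changelog means the
per-tile page masks can no longer be fixed-size arrays. As a rough
illustration of the approach taken in the last patch of the series (one
contiguous buffer, carved into one slice per tile), here is a standalone
sketch. It is an assumption-laden simplification: it uses calloc/free
instead of rte_zmalloc/rte_free, a portable bit-count loop instead of
__builtin_popcount, and the names ocm_state, tile_info, ocm_masks_alloc,
ocm_tile_used_pages and ocm_masks_free are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_TILES 8 /* ML_CN10K_OCM_NUMTILES in the series */

/* Hypothetical, simplified stand-ins for the driver structures. */
struct tile_info {
	uint8_t *ocm_mask; /* slice of the shared buffer for this tile */
};

struct ocm_state {
	uint32_t mask_words; /* 8-bit mask words per tile */
	uint8_t *ocm_mask;   /* backing buffer for all tiles */
	struct tile_info tiles[NUM_TILES];
};

/* Allocate one zeroed buffer and point each tile at its own slice. */
static int
ocm_masks_alloc(struct ocm_state *ocm, uint32_t mask_words)
{
	uint32_t tile_id;

	ocm->ocm_mask = calloc((size_t)mask_words * NUM_TILES, sizeof(uint8_t));
	if (ocm->ocm_mask == NULL)
		return -1;

	ocm->mask_words = mask_words;
	for (tile_id = 0; tile_id < NUM_TILES; tile_id++)
		ocm->tiles[tile_id].ocm_mask = ocm->ocm_mask + (size_t)tile_id * mask_words;

	return 0;
}

/* Count the pages marked as used (set bits) in one tile's mask. */
static uint32_t
ocm_tile_used_pages(const struct ocm_state *ocm, uint32_t tile_id)
{
	uint32_t used = 0;
	uint32_t word_id;

	for (word_id = 0; word_id < ocm->mask_words; word_id++) {
		uint8_t word = ocm->tiles[tile_id].ocm_mask[word_id];

		while (word != 0) {
			used += word & 1u;
			word >>= 1;
		}
	}

	return used;
}

/* Release the shared buffer and clear the per-tile pointers. */
static void
ocm_masks_free(struct ocm_state *ocm)
{
	uint32_t tile_id;

	free(ocm->ocm_mask);
	ocm->ocm_mask = NULL;
	for (tile_id = 0; tile_id < NUM_TILES; tile_id++)
		ocm->tiles[tile_id].ocm_mask = NULL;
}
```

A single allocation keeps the masks contiguous and needs only one free at
device close. The sketch also checks the allocation result before carving
the buffer, which seems a sensible precaution whenever the mask size is
decided at run time.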

* [PATCH v5 01/39] common/cnxk: add ML headers and ROC code for cnxk
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
                     ` (38 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added ML cnxk headers for register and structure definitions and for
the ROC layer. Implemented the ROC functions, registered a logtype for
the ML module with the name pmd.ml.cnxk, and defined the ML hardware ID.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: series-26859 ("Implementation of ML common code")

 MAINTAINERS                         |   9 +
 drivers/common/cnxk/hw/ml.h         | 170 ++++++++
 drivers/common/cnxk/meson.build     |   1 +
 drivers/common/cnxk/roc_api.h       |   4 +
 drivers/common/cnxk/roc_constants.h |   2 +
 drivers/common/cnxk/roc_dev_priv.h  |   1 +
 drivers/common/cnxk/roc_ml.c        | 626 ++++++++++++++++++++++++++++
 drivers/common/cnxk/roc_ml.h        | 152 +++++++
 drivers/common/cnxk/roc_ml_priv.h   |  24 ++
 drivers/common/cnxk/roc_platform.c  |   1 +
 drivers/common/cnxk/roc_platform.h  |   2 +
 drivers/common/cnxk/roc_priv.h      |   3 +
 drivers/common/cnxk/version.map     |  29 ++
 13 files changed, 1024 insertions(+)
 create mode 100644 drivers/common/cnxk/hw/ml.h
 create mode 100644 drivers/common/cnxk/roc_ml.c
 create mode 100644 drivers/common/cnxk/roc_ml.h
 create mode 100644 drivers/common/cnxk/roc_ml_priv.h

diff --git a/MAINTAINERS b/MAINTAINERS
index f1b1915053..97ce3042b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1434,6 +1434,15 @@ F: drivers/raw/dpaa2_cmdif/
 F: doc/guides/rawdevs/dpaa2_cmdif.rst


+ML Device Drivers
+-----------------
+
+Marvell ML CNXK
+M: Srikanth Yalavarthi <syalavarthi@marvell.com>
+F: drivers/common/cnxk/hw/ml.h
+F: drivers/common/cnxk/roc_ml*
+
+
 Packet processing
 -----------------

diff --git a/drivers/common/cnxk/hw/ml.h b/drivers/common/cnxk/hw/ml.h
new file mode 100644
index 0000000000..3ead42b807
--- /dev/null
+++ b/drivers/common/cnxk/hw/ml.h
@@ -0,0 +1,170 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef __ML_HW_H__
+#define __ML_HW_H__
+
+#include <stdint.h>
+
+/* Constants */
+#define ML_ANBX_NR 0x3
+
+/* Base offsets */
+#define ML_MLAB_BLK_OFFSET 0x20000000 /* CNF10KB */
+#define ML_AXI_START_ADDR  0x800000000
+
+/* MLW register offsets / ML_PF_BAR0 */
+#define ML_CFG			 0x10000
+#define ML_MLR_BASE		 0x10008
+#define ML_AXI_BRIDGE_CTRL(a)	 (0x10020 | (uint64_t)(a) << 3)
+#define ML_JOB_MGR_CTRL		 0x10060
+#define ML_CORE_INT_LO		 0x10140
+#define ML_CORE_INT_HI		 0x10160
+#define ML_JCMDQ_IN(a)		 (0x11000 | (uint64_t)(a) << 3) /* CN10KA */
+#define ML_JCMDQ_STATUS		 0x11010			/* CN10KA */
+#define ML_STGX_STATUS(a)	 (0x11020 | (uint64_t)(a) << 3) /* CNF10KB */
+#define ML_STG_CONTROL		 0x11100			/* CNF10KB */
+#define ML_PNB_CMD_TYPE		 0x113a0			/* CNF10KB */
+#define ML_SCRATCH(a)		 (0x14000 | (uint64_t)(a) << 3)
+#define ML_ANBX_BACKP_DISABLE(a) (0x18000 | (uint64_t)(a) << 12) /* CN10KA */
+#define ML_ANBX_NCBI_P_OVR(a)	 (0x18010 | (uint64_t)(a) << 12) /* CN10KA */
+#define ML_ANBX_NCBI_NP_OVR(a)	 (0x18020 | (uint64_t)(a) << 12) /* CN10KA */
+
+/* MLIP configuration register offsets / ML_PF_BAR0 */
+#define ML_SW_RST_CTRL		      0x12084000
+#define ML_A35_0_RST_VECTOR_BASE_W(a) (0x12084014 + (a) * (0x04))
+#define ML_A35_1_RST_VECTOR_BASE_W(a) (0x1208401c + (a) * (0x04))
+
+/* MLW scratch register offsets */
+#define ML_SCRATCH_WORK_PTR	      (ML_SCRATCH(0))
+#define ML_SCRATCH_FW_CTRL	      (ML_SCRATCH(1))
+#define ML_SCRATCH_DBG_BUFFER_HEAD_C0 (ML_SCRATCH(2))
+#define ML_SCRATCH_DBG_BUFFER_TAIL_C0 (ML_SCRATCH(3))
+#define ML_SCRATCH_DBG_BUFFER_HEAD_C1 (ML_SCRATCH(4))
+#define ML_SCRATCH_DBG_BUFFER_TAIL_C1 (ML_SCRATCH(5))
+#define ML_SCRATCH_EXCEPTION_SP_C0    (ML_SCRATCH(6))
+#define ML_SCRATCH_EXCEPTION_SP_C1    (ML_SCRATCH(7))
+
+/* ML job completion structure */
+struct ml_jce_s {
+	/* WORD 0 */
+	union ml_jce_w0 {
+		struct {
+			uint64_t rsvd_0_3 : 4;
+
+			/* Reserved for future architecture */
+			uint64_t ggrp_h : 2;
+
+			/* Tag type */
+			uint64_t ttype : 2;
+
+			/* Physical function number */
+			uint64_t pf_func : 16;
+
+			/* Unused [7] + Guest Group [6:0] */
+			uint64_t ggrp : 8;
+
+			/* Tag */
+			uint64_t tag : 32;
+		} s;
+		uint64_t u64;
+	} w0;
+
+	/* WORD 1 */
+	union ml_jce_w1 {
+		struct {
+			/* Work queue pointer */
+			uint64_t wqp : 53;
+			uint64_t rsvd_53_63 : 11;
+
+		} s;
+		uint64_t u64;
+	} w1;
+};
+
+/* ML job command structure */
+struct ml_job_cmd_s {
+	/* WORD 0 */
+	union ml_job_cmd_w0 {
+		struct {
+			uint64_t rsvd_0_63;
+		} s;
+		uint64_t u64;
+	} w0;
+
+	/* WORD 1 */
+	union ml_job_cmd_w1 {
+		struct {
+			/* Job pointer */
+			uint64_t jobptr : 53;
+			uint64_t rsvd_53_63 : 11;
+		} s;
+		uint64_t u64;
+	} w1;
+};
+
+/* ML A35 0 RST vector base structure */
+union ml_a35_0_rst_vector_base_s {
+	struct {
+		/* Base address */
+		uint64_t addr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+
+	struct {
+		/* WORD 0 */
+		uint32_t w0;
+
+		/* WORD 1 */
+		uint32_t w1;
+	} w;
+
+	uint64_t u64;
+};
+
+/* ML A35 1 RST vector base structure */
+union ml_a35_1_rst_vector_base_s {
+	struct {
+		/* Base address */
+		uint64_t addr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+
+	struct {
+		/* WORD 0 */
+		uint32_t w0;
+
+		/* WORD 1 */
+		uint32_t w1;
+	} w;
+
+	uint64_t u64;
+};
+
+/* Work pointer scratch register */
+union ml_scratch_work_ptr_s {
+	struct {
+		/* Work pointer */
+		uint64_t work_ptr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+	uint64_t u64;
+};
+
+/* Firmware control scratch register */
+union ml_scratch_fw_ctrl_s {
+	struct {
+		uint64_t rsvd_0_15 : 16;
+
+		/* Valid job bit */
+		uint64_t valid : 1;
+
+		/* Done status bit */
+		uint64_t done : 1;
+		uint64_t rsvd_18_63 : 46;
+	} s;
+	uint64_t u64;
+};
+
+#endif /* __ML_HW_H__ */
diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 849735921c..b4aa0a050c 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -26,6 +26,7 @@ sources = files(
         'roc_irq.c',
         'roc_ie_ot.c',
         'roc_mbox.c',
+        'roc_ml.c',
         'roc_model.c',
         'roc_nix.c',
         'roc_nix_bpf.c',
diff --git a/drivers/common/cnxk/roc_api.h b/drivers/common/cnxk/roc_api.h
index 14a11321e0..06accf247d 100644
--- a/drivers/common/cnxk/roc_api.h
+++ b/drivers/common/cnxk/roc_api.h
@@ -34,6 +34,7 @@
 /* HW structure definition */
 #include "hw/cpt.h"
 #include "hw/dpi.h"
+#include "hw/ml.h"
 #include "hw/nix.h"
 #include "hw/npa.h"
 #include "hw/npc.h"
@@ -107,4 +108,7 @@
 /* NIX Inline dev */
 #include "roc_nix_inl.h"

+/* ML */
+#include "roc_ml.h"
+
 #endif /* _ROC_API_H_ */
diff --git a/drivers/common/cnxk/roc_constants.h b/drivers/common/cnxk/roc_constants.h
index 0495965daa..ddaef133b8 100644
--- a/drivers/common/cnxk/roc_constants.h
+++ b/drivers/common/cnxk/roc_constants.h
@@ -50,6 +50,8 @@
 #define PCI_DEVID_CN10K_RVU_CPT_PF 0xA0F2
 #define PCI_DEVID_CN10K_RVU_CPT_VF 0xA0F3

+#define PCI_DEVID_CN10K_ML_PF 0xA092
+
 #define PCI_SUBSYSTEM_DEVID_CN10KA  0xB900
 #define PCI_SUBSYSTEM_DEVID_CN10KAS 0xB900
 #define PCI_SUBSYSTEM_DEVID_CNF10KA 0xBA00
diff --git a/drivers/common/cnxk/roc_dev_priv.h b/drivers/common/cnxk/roc_dev_priv.h
index 4217ec4af8..40af5e0f0b 100644
--- a/drivers/common/cnxk/roc_dev_priv.h
+++ b/drivers/common/cnxk/roc_dev_priv.h
@@ -90,6 +90,7 @@ struct dev {
 	void *roc_nix;
 	void *roc_cpt;
 	void *roc_tim;
+	void *roc_ml;
 	bool disable_shared_lmt; /* false(default): shared lmt mode enabled */
 	const struct plt_memzone *lmt_mz;
 } __plt_cache_aligned;
diff --git a/drivers/common/cnxk/roc_ml.c b/drivers/common/cnxk/roc_ml.c
new file mode 100644
index 0000000000..7390697b1d
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml.c
@@ -0,0 +1,626 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "roc_api.h"
+#include "roc_priv.h"
+
+#define TIME_SEC_IN_MS 1000
+
+static int
+roc_ml_reg_wait_to_clear(struct roc_ml *roc_ml, uint64_t offset, uint64_t mask)
+{
+	uint64_t start_cycle;
+	uint64_t wait_cycles;
+	uint64_t reg_val;
+
+	wait_cycles = (ROC_ML_TIMEOUT_MS * plt_tsc_hz()) / TIME_SEC_IN_MS;
+	start_cycle = plt_tsc_cycles();
+	do {
+		reg_val = roc_ml_reg_read64(roc_ml, offset);
+
+		if (!(reg_val & mask))
+			return 0;
+	} while (plt_tsc_cycles() - start_cycle < wait_cycles);
+
+	return -ETIME;
+}
+
+uint64_t
+roc_ml_reg_read64(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	return plt_read64(PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_write64(struct roc_ml *roc_ml, uint64_t val, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	plt_write64(val, PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+uint32_t
+roc_ml_reg_read32(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	return plt_read32(PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_write32(struct roc_ml *roc_ml, uint32_t val, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	plt_write32(val, PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_save(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (offset == ML_MLR_BASE) {
+		ml->ml_mlr_base =
+			FIELD_GET(ROC_ML_MLR_BASE_BASE, roc_ml_reg_read64(roc_ml, offset));
+		ml->ml_mlr_base_saved = true;
+	}
+}
+
+void *
+roc_ml_addr_ap2mlip(struct roc_ml *roc_ml, void *addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+	uint64_t ml_mlr_base;
+
+	ml_mlr_base = (ml->ml_mlr_base_saved) ? ml->ml_mlr_base :
+						FIELD_GET(ROC_ML_MLR_BASE_BASE,
+							  roc_ml_reg_read64(roc_ml, ML_MLR_BASE));
+	return PLT_PTR_ADD(addr, ML_AXI_START_ADDR - ml_mlr_base);
+}
+
+void *
+roc_ml_addr_mlip2ap(struct roc_ml *roc_ml, void *addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+	uint64_t ml_mlr_base;
+
+	ml_mlr_base = (ml->ml_mlr_base_saved) ? ml->ml_mlr_base :
+						FIELD_GET(ROC_ML_MLR_BASE_BASE,
+							  roc_ml_reg_read64(roc_ml, ML_MLR_BASE));
+	return PLT_PTR_ADD(addr, ml_mlr_base - ML_AXI_START_ADDR);
+}
+
+uint64_t
+roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (roc_model_is_cn10ka())
+		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr;
+	else
+		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr - ML_MLAB_BLK_OFFSET;
+}
+
+uint64_t
+roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (roc_model_is_cn10ka())
+		return ml->pci_dev->mem_resource[0].phys_addr + offset;
+	else
+		return ml->pci_dev->mem_resource[0].phys_addr + ML_MLAB_BLK_OFFSET + offset;
+}
+
+void
+roc_ml_scratch_write_job(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_work_ptr.u64 = 0;
+	reg_work_ptr.s.work_ptr = PLT_U64_CAST(roc_ml_addr_ap2mlip(roc_ml, work_ptr));
+
+	reg_fw_ctrl.u64 = 0;
+	reg_fw_ctrl.s.valid = 1;
+
+	roc_ml_reg_write64(roc_ml, reg_work_ptr.u64, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(roc_ml, reg_fw_ctrl.u64, ML_SCRATCH_FW_CTRL);
+}
+
+bool
+roc_ml_scratch_is_valid_bit_set(struct roc_ml *roc_ml)
+{
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_fw_ctrl.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_FW_CTRL);
+
+	if (reg_fw_ctrl.s.valid == 1)
+		return true;
+
+	return false;
+}
+
+bool
+roc_ml_scratch_is_done_bit_set(struct roc_ml *roc_ml)
+{
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_fw_ctrl.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_FW_CTRL);
+
+	if (reg_fw_ctrl.s.done == 1)
+		return true;
+
+	return false;
+}
+
+bool
+roc_ml_scratch_enqueue(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+	bool ret = false;
+
+	reg_work_ptr.u64 = 0;
+	reg_work_ptr.s.work_ptr = PLT_U64_CAST(roc_ml_addr_ap2mlip(roc_ml, work_ptr));
+
+	reg_fw_ctrl.u64 = 0;
+	reg_fw_ctrl.s.valid = 1;
+
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		bool valid = roc_ml_scratch_is_valid_bit_set(roc_ml);
+		bool done = roc_ml_scratch_is_done_bit_set(roc_ml);
+
+		if (valid == done) {
+			roc_ml_clk_force_on(roc_ml);
+			roc_ml_dma_stall_off(roc_ml);
+
+			roc_ml_reg_write64(roc_ml, reg_work_ptr.u64, ML_SCRATCH_WORK_PTR);
+			roc_ml_reg_write64(roc_ml, reg_fw_ctrl.u64, ML_SCRATCH_FW_CTRL);
+
+			ret = true;
+		}
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+
+	return ret;
+}
+
+bool
+roc_ml_scratch_dequeue(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	bool ret = false;
+
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		bool valid = roc_ml_scratch_is_valid_bit_set(roc_ml);
+		bool done = roc_ml_scratch_is_done_bit_set(roc_ml);
+
+		if (valid && done) {
+			reg_work_ptr.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_WORK_PTR);
+			if (work_ptr ==
+			    roc_ml_addr_mlip2ap(roc_ml, PLT_PTR_CAST(reg_work_ptr.u64))) {
+				roc_ml_dma_stall_on(roc_ml);
+				roc_ml_clk_force_off(roc_ml);
+
+				roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+				roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_FW_CTRL);
+				ret = true;
+			}
+		}
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+
+	return ret;
+}
+
+void
+roc_ml_scratch_queue_reset(struct roc_ml *roc_ml)
+{
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		roc_ml_dma_stall_on(roc_ml);
+		roc_ml_clk_force_off(roc_ml);
+		roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_FW_CTRL);
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+}
+
+bool
+roc_ml_jcmdq_enqueue_lf(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd)
+{
+	bool ret = false;
+
+	if (FIELD_GET(ROC_ML_JCMDQ_STATUS_AVAIL_COUNT,
+		      roc_ml_reg_read64(roc_ml, ML_JCMDQ_STATUS)) != 0) {
+		roc_ml_reg_write64(roc_ml, job_cmd->w0.u64, ML_JCMDQ_IN(0));
+		roc_ml_reg_write64(roc_ml, job_cmd->w1.u64, ML_JCMDQ_IN(1));
+		ret = true;
+	}
+
+	return ret;
+}
+
+bool
+roc_ml_jcmdq_enqueue_sl(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd)
+{
+	bool ret = false;
+
+	if (plt_spinlock_trylock(&roc_ml->fp_spinlock) != 0) {
+		if (FIELD_GET(ROC_ML_JCMDQ_STATUS_AVAIL_COUNT,
+			      roc_ml_reg_read64(roc_ml, ML_JCMDQ_STATUS)) != 0) {
+			roc_ml_reg_write64(roc_ml, job_cmd->w0.u64, ML_JCMDQ_IN(0));
+			roc_ml_reg_write64(roc_ml, job_cmd->w1.u64, ML_JCMDQ_IN(1));
+			ret = true;
+		}
+		plt_spinlock_unlock(&roc_ml->fp_spinlock);
+	}
+
+	return ret;
+}
+
+void
+roc_ml_clk_force_on(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+	reg_val |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+}
+
+void
+roc_ml_clk_force_off(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+}
+
+void
+roc_ml_dma_stall_on(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_JOB_MGR_CTRL);
+	reg_val |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_JOB_MGR_CTRL);
+}
+
+void
+roc_ml_dma_stall_off(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_JOB_MGR_CTRL);
+	reg_val &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_JOB_MGR_CTRL);
+}
+
+bool
+roc_ml_mlip_is_enabled(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+
+	if ((reg_val & ROC_ML_CFG_MLIP_ENA) != 0)
+		return true;
+
+	return false;
+}
+
+int
+roc_ml_mlip_reset(struct roc_ml *roc_ml, bool force)
+{
+	uint64_t reg_val;
+
+	/* Force reset */
+	if (force) {
+		/* Set ML(0)_CFG[ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* Set ML(0)_CFG[MLIP_ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* Clear ML_MLR_BASE */
+		roc_ml_reg_write64(roc_ml, 0, ML_MLR_BASE);
+	}
+
+	if (roc_model_is_cn10ka()) {
+		/* Wait for all active jobs to finish.
+		 * ML_CFG[ENA] : When set, MLW will accept job commands. This
+		 * bit can be cleared at any time. If [BUSY] is set, software
+		 * must wait until [BUSY] == 0 before setting this bit.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_CFG, ROC_ML_CFG_BUSY);
+
+		/* (1) Set ML(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] = 1 to instruct
+		 * the AXI bridge not to accept any new transactions from MLIP.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		/* (2) Wait until ML(0)_AXI_BRIDGE_CTRL(0..1)[BUSY] = 0 which
+		 * indicates that there are no outstanding transactions on
+		 * AXI-NCB paths.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(0),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(1),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		/* (3) Wait until ML(0)_JOB_MGR_CTRL[BUSY] = 0 which indicates
+		 * that there are no pending jobs in the MLW's job manager.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_JOB_MGR_CTRL, ROC_ML_JOB_MGR_CTRL_BUSY);
+
+		/* (4) Set ML(0)_CFG[ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (5) Set ML(0)_CFG[MLIP_ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (6) Set ML(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+	}
+
+	if (roc_model_is_cnf10kb()) {
+		/* (1) Clear MLAB(0)_CFG[ENA]. Any new jobs will bypass the job
+		 * execution stages and their completions will be returned to
+		 * PSM.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (2) Quiesce the ACC and DMA AXI interfaces: For each of the
+		 * two MLAB(0)_AXI_BRIDGE_CTRL(0..1) registers:
+		 *
+		 * (a) Set MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] to block new AXI
+		 * commands from MLIP.
+		 *
+		 * (b) Poll MLAB(0)_AXI_BRIDGE_CTRL(0..1)[BUSY] == 0.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(0),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(1),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		/* (3) Clear MLAB(0)_CFG[MLIP_ENA] to reset MLIP.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+cnf10kb_mlip_reset_stage_4a:
+		/* (4) Flush any outstanding jobs in MLAB's job execution
+		 * stages:
+		 *
+		 * (a) Wait for completion stage to clear:
+		 *   - Poll MLAB(0)_STG(0..2)_STATUS[VALID] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(0), ROC_ML_STG_STATUS_VALID);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(1), ROC_ML_STG_STATUS_VALID);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(2), ROC_ML_STG_STATUS_VALID);
+
+cnf10kb_mlip_reset_stage_4b:
+		/* (4b) Clear job run stage: Poll
+		 * MLAB(0)_STG_CONTROL[RUN_TO_COMP] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL, ROC_ML_STG_CONTROL_RUN_TO_COMP);
+
+		/* (4b) Clear job run stage: If MLAB(0)_STG(1)_STATUS[VALID] ==
+		 * 1:
+		 *     - Set MLAB(0)_STG_CONTROL[RUN_TO_COMP].
+		 *     - Poll MLAB(0)_STG_CONTROL[RUN_TO_COMP] == 0.
+		 *     - Repeat step (a) to clear job completion stage.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(1));
+		if (reg_val & ROC_ML_STG_STATUS_VALID) {
+			reg_val = roc_ml_reg_read64(roc_ml, ML_STG_CONTROL);
+			reg_val |= ROC_ML_STG_CONTROL_RUN_TO_COMP;
+			roc_ml_reg_write64(roc_ml, reg_val, ML_STG_CONTROL);
+
+			roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL,
+						 ROC_ML_STG_CONTROL_RUN_TO_COMP);
+
+			goto cnf10kb_mlip_reset_stage_4a;
+		}
+
+		/* (4c) Clear job fetch stage: Poll
+		 * MLAB(0)_STG_CONTROL[FETCH_TO_RUN] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL, ROC_ML_STG_CONTROL_FETCH_TO_RUN);
+
+		/* (4c) Clear job fetch stage: If
+		 * MLAB(0)_STG(0..2)_STATUS[VALID] == 1:
+		 *     - Set MLAB(0)_STG_CONTROL[FETCH_TO_RUN].
+		 *     - Poll MLAB(0)_STG_CONTROL[FETCH_TO_RUN] == 0.
+		 *     - Repeat step (b) to clear job run and completion stages.
+		 */
+		reg_val = (roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(0)) |
+			   roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(1)) |
+			   roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(2)));
+
+		if (reg_val & ROC_ML_STG_STATUS_VALID) {
+			reg_val = roc_ml_reg_read64(roc_ml, ML_STG_CONTROL);
+			reg_val |= ROC_ML_STG_CONTROL_FETCH_TO_RUN;
+			roc_ml_reg_write64(roc_ml, reg_val, ML_STG_CONTROL);
+
+			roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL,
+						 ROC_ML_STG_CONTROL_FETCH_TO_RUN);
+
+			goto cnf10kb_mlip_reset_stage_4b;
+		}
+
+		/* (5) Reset the ACC and DMA AXI interfaces: For each of the two
+		 * MLAB(0)_AXI_BRIDGE_CTRL(0..1) registers:
+		 *
+		 * (5a) Set and then clear
+		 * MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FLUSH_WRITE_DATA].
+		 *
+		 * (5b) Clear MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FENCE].
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+	}
+
+	return 0;
+}
+
+int
+roc_ml_dev_init(struct roc_ml *roc_ml)
+{
+	struct plt_pci_device *pci_dev;
+	struct dev *dev;
+	struct ml *ml;
+
+	if (roc_ml == NULL || roc_ml->pci_dev == NULL)
+		return -EINVAL;
+
+	PLT_STATIC_ASSERT(sizeof(struct ml) <= ROC_ML_MEM_SZ);
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+	memset(ml, 0, sizeof(*ml));
+	pci_dev = roc_ml->pci_dev;
+	dev = &ml->dev;
+
+	ml->pci_dev = pci_dev;
+	dev->roc_ml = roc_ml;
+
+	ml->ml_reg_addr = ml->pci_dev->mem_resource[0].addr;
+	ml->ml_mlr_base = 0;
+	ml->ml_mlr_base_saved = false;
+
+	plt_ml_dbg("ML: PCI Physical Address : 0x%016lx", ml->pci_dev->mem_resource[0].phys_addr);
+	plt_ml_dbg("ML: PCI Virtual Address : 0x%016lx",
+		   PLT_U64_CAST(ml->pci_dev->mem_resource[0].addr));
+
+	plt_spinlock_init(&roc_ml->sp_spinlock);
+	plt_spinlock_init(&roc_ml->fp_spinlock);
+
+	return 0;
+}
+
+int
+roc_ml_dev_fini(struct roc_ml *roc_ml)
+{
+	/* The private area is embedded in roc_ml, so validate the handle itself. */
+	if (roc_ml == NULL)
+		return -EINVAL;
+
+	return 0;
+}
+
+int
+roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
+{
+	struct dev *dev;
+	struct ml *ml;
+
+	if ((roc_ml == NULL) || (roc_bphy == NULL))
+		return -EINVAL;
+
+	PLT_STATIC_ASSERT(sizeof(struct ml) <= ROC_ML_MEM_SZ);
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+	memset(ml, 0, sizeof(*ml));
+
+	dev = &ml->dev;
+
+	ml->pci_dev = roc_bphy->pci_dev;
+	dev->roc_ml = roc_ml;
+
+	plt_ml_dbg(
+		"MLAB: Physical Address : 0x%016lx",
+		PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].phys_addr, ML_MLAB_BLK_OFFSET));
+	plt_ml_dbg("MLAB: Virtual Address : 0x%016lx",
+		   PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET));
+
+	ml->ml_reg_addr = PLT_PTR_ADD(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET);
+	ml->ml_mlr_base = 0;
+	ml->ml_mlr_base_saved = false;
+
+	plt_spinlock_init(&roc_ml->sp_spinlock);
+	plt_spinlock_init(&roc_ml->fp_spinlock);
+
+	return 0;
+}
+
+int
+roc_ml_blk_fini(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
+{
+	struct ml *ml;
+
+	if ((roc_ml == NULL) || (roc_bphy == NULL))
+		return -EINVAL;
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (ml == NULL)
+		return -EINVAL;
+
+	return 0;
+}
+
+uint16_t
+roc_ml_sso_pf_func_get(void)
+{
+	return idev_sso_pffunc_get();
+}
diff --git a/drivers/common/cnxk/roc_ml.h b/drivers/common/cnxk/roc_ml.h
new file mode 100644
index 0000000000..3cd82be6a6
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _ROC_ML_H_
+#define _ROC_ML_H_
+
+#include "roc_api.h"
+
+#define ROC_ML_MEM_SZ	  (6 * 1024)
+#define ROC_ML_TIMEOUT_MS 10000
+
+/* ML_CFG */
+#define ROC_ML_CFG_JD_SIZE	  GENMASK_ULL(1, 0)
+#define ROC_ML_CFG_MLIP_ENA	  BIT_ULL(2)
+#define ROC_ML_CFG_BUSY		  BIT_ULL(3)
+#define ROC_ML_CFG_WRAP_CLK_FORCE BIT_ULL(4)
+#define ROC_ML_CFG_MLIP_CLK_FORCE BIT_ULL(5)
+#define ROC_ML_CFG_ENA		  BIT_ULL(6)
+
+/* ML_MLR_BASE */
+#define ROC_ML_MLR_BASE_BASE GENMASK_ULL(51, 0)
+
+/* ML_STG_STATUS */
+#define ROC_ML_STG_STATUS_VALID		BIT_ULL(0)
+#define ROC_ML_STG_STATUS_ADDR_ERR	BIT_ULL(1)
+#define ROC_ML_STG_STATUS_DMA_ERR	BIT_ULL(2)
+#define ROC_ML_STG_STATUS_TIMEOUT	BIT_ULL(3)
+#define ROC_ML_STG_STATUS_NFAT_ERR	BIT_ULL(4)
+#define ROC_ML_STG_STATUS_JOB_ERR	BIT_ULL(5)
+#define ROC_ML_STG_STATUS_ELAPSED_TICKS GENMASK_ULL(47, 6)
+
+/* ML_STG_CONTROL */
+#define ROC_ML_STG_CONTROL_FETCH_TO_RUN BIT_ULL(0)
+#define ROC_ML_STG_CONTROL_RUN_TO_COMP	BIT_ULL(1)
+
+/* ML_AXI_BRIDGE */
+#define ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL	      BIT_ULL(0)
+#define ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE	      BIT_ULL(1)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_AXI_ID	      GENMASK_ULL(11, 2)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_WR_BLK	      BIT_ULL(13)
+#define ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK	      BIT_ULL(14)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_RD_BLK	      BIT_ULL(15)
+#define ROC_ML_AXI_BRIDGE_CTRL_NCB_RD_BLK	      BIT_ULL(16)
+#define ROC_ML_AXI_BRIDGE_CTRL_FENCE		      BIT_ULL(17)
+#define ROC_ML_AXI_BRIDGE_CTRL_BUSY		      BIT_ULL(18)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK	      BIT_ULL(19)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK	      BIT_ULL(20)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_FORCE_CMPLT	      BIT_ULL(21)
+#define ROC_ML_AXI_BRIDGE_CTRL_WR_CNT_GEAR	      GENMASK_ULL(25, 22)
+#define ROC_ML_AXI_BRIDGE_CTRL_RD_GEAR		      GENMASK_ULL(28, 26)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_CUTTHROUGH_MODE    BIT_ULL(29)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_WRITE_CREDITS      GENMASK_ULL(33, 30)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_READ_CREDITS	      GENMASK_ULL(37, 34)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_LOAD_WRITE_CREDITS BIT_ULL(38)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_LOAD_READ_CREDITS  BIT_ULL(39)
+#define ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA	      BIT_ULL(40)
+
+/* ML_JOB_MGR_CTRL */
+#define ROC_ML_JOB_MGR_CTRL_STALL_ON_ERR     BIT_ULL(0)
+#define ROC_ML_JOB_MGR_CTRL_PF_OVERRIDE	     BIT_ULL(1)
+#define ROC_ML_JOB_MGR_CTRL_PF_FUNC_OVERRIDE GENMASK_ULL(19, 4)
+#define ROC_ML_JOB_MGR_CTRL_BUSY	     BIT_ULL(20)
+#define ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE    BIT_ULL(21)
+
+/* ML_JCMDQ_STATUS */
+#define ROC_ML_JCMDQ_STATUS_AVAIL_COUNT GENMASK_ULL(4, 0)
+
+/* ML_ANBX_BACKP_DISABLE */
+#define ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE BIT_ULL(0)
+#define ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE BIT_ULL(1)
+
+/* ML_ANBX_NCBI_P_OVR */
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MSH_DST_OVR_VLD	 BIT_ULL(0)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MSH_DST_OVR	 GENMASK_ULL(11, 1)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD	 BIT_ULL(12)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR		 BIT_ULL(13)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_PADDR_OVR_VLD	 BIT_ULL(14)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_PADDR_OVR		 BIT_ULL(15)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD	 BIT_ULL(16)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR		 BIT_ULL(17)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPADID_VAL_OVR_VLD BIT_ULL(18)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPADID_VAL_OVR	 BIT_ULL(19)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPAMDID_OVR_VLD	 BIT_ULL(20)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPAMDID_OVR	 BIT_ULL(21)
+
+/* ML_ANBX_NCBI_NP_OVR */
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MSH_DST_OVR_VLD	   BIT_ULL(0)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MSH_DST_OVR	   GENMASK_ULL(11, 1)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD	   BIT_ULL(12)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR		   BIT_ULL(13)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_PADDR_OVR_VLD	   BIT_ULL(14)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_PADDR_OVR	   BIT_ULL(15)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_RO_OVR_VLD	   BIT_ULL(16)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_RO_OVR		   BIT_ULL(17)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPADID_VAL_OVR_VLD BIT_ULL(18)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPADID_VAL_OVR	   BIT_ULL(19)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPAMDID_OVR_VLD	   BIT_ULL(20)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPAMDID_OVR	   BIT_ULL(21)
+
+/* ML_SW_RST_CTRL */
+#define ROC_ML_SW_RST_CTRL_ACC_RST  BIT_ULL(0)
+#define ROC_ML_SW_RST_CTRL_CMPC_RST BIT_ULL(1)
+
+struct roc_ml {
+	struct plt_pci_device *pci_dev;
+	plt_spinlock_t sp_spinlock;
+	plt_spinlock_t fp_spinlock;
+	uint8_t reserved[ROC_ML_MEM_SZ] __plt_cache_aligned;
+} __plt_cache_aligned;
+
+/* Register read and write functions */
+uint64_t __roc_api roc_ml_reg_read64(struct roc_ml *roc_ml, uint64_t offset);
+void __roc_api roc_ml_reg_write64(struct roc_ml *roc_ml, uint64_t val, uint64_t offset);
+uint32_t __roc_api roc_ml_reg_read32(struct roc_ml *roc_ml, uint64_t offset);
+void __roc_api roc_ml_reg_write32(struct roc_ml *roc_ml, uint32_t val, uint64_t offset);
+void __roc_api roc_ml_reg_save(struct roc_ml *roc_ml, uint64_t offset);
+
+/* Address translation functions */
+uint64_t __roc_api roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr);
+uint64_t __roc_api roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset);
+void *__roc_api roc_ml_addr_ap2mlip(struct roc_ml *roc_ml, void *addr);
+void *__roc_api roc_ml_addr_mlip2ap(struct roc_ml *roc_ml, void *addr);
+
+/* Scratch and JCMDQ functions */
+void __roc_api roc_ml_scratch_write_job(struct roc_ml *roc_ml, void *jd);
+bool __roc_api roc_ml_scratch_is_valid_bit_set(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_scratch_is_done_bit_set(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_scratch_enqueue(struct roc_ml *roc_ml, void *work_ptr);
+bool __roc_api roc_ml_scratch_dequeue(struct roc_ml *roc_ml, void *work_ptr);
+void __roc_api roc_ml_scratch_queue_reset(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_jcmdq_enqueue_lf(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+bool __roc_api roc_ml_jcmdq_enqueue_sl(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+/* Device management functions */
+void __roc_api roc_ml_clk_force_on(struct roc_ml *roc_ml);
+void __roc_api roc_ml_clk_force_off(struct roc_ml *roc_ml);
+void __roc_api roc_ml_dma_stall_on(struct roc_ml *roc_ml);
+void __roc_api roc_ml_dma_stall_off(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_mlip_is_enabled(struct roc_ml *roc_ml);
+int __roc_api roc_ml_mlip_reset(struct roc_ml *roc_ml, bool force);
+
+/* Device / block functions */
+int __roc_api roc_ml_dev_init(struct roc_ml *roc_ml);
+int __roc_api roc_ml_dev_fini(struct roc_ml *roc_ml);
+int __roc_api roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml);
+int __roc_api roc_ml_blk_fini(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml);
+
+/* Utility functions */
+uint16_t __roc_api roc_ml_sso_pf_func_get(void);
+
+#endif /* _ROC_ML_H_ */
diff --git a/drivers/common/cnxk/roc_ml_priv.h b/drivers/common/cnxk/roc_ml_priv.h
new file mode 100644
index 0000000000..ad5fe90bab
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml_priv.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _ROC_ML_PRIV_H_
+#define _ROC_ML_PRIV_H_
+
+#include "roc_api.h"
+
+struct ml {
+	struct plt_pci_device *pci_dev;
+	struct dev dev;
+	uint8_t *ml_reg_addr;
+	uint64_t ml_mlr_base;
+	bool ml_mlr_base_saved;
+} __plt_cache_aligned;
+
+static inline struct ml *
+roc_ml_to_ml_priv(struct roc_ml *roc_ml)
+{
+	return (struct ml *)&roc_ml->reserved[0];
+}
+
+#endif /* _ROC_ML_PRIV_H_ */
diff --git a/drivers/common/cnxk/roc_platform.c b/drivers/common/cnxk/roc_platform.c
index ce0f9b870c..f91b95ceab 100644
--- a/drivers/common/cnxk/roc_platform.c
+++ b/drivers/common/cnxk/roc_platform.c
@@ -63,6 +63,7 @@ roc_plt_init(void)
 RTE_LOG_REGISTER(cnxk_logtype_base, pmd.cnxk.base, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_mbox, pmd.cnxk.mbox, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_cpt, pmd.crypto.cnxk, NOTICE);
+RTE_LOG_REGISTER(cnxk_logtype_ml, pmd.ml.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_npa, pmd.mempool.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_nix, pmd.net.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_npc, pmd.net.cnxk.flow, NOTICE);
diff --git a/drivers/common/cnxk/roc_platform.h b/drivers/common/cnxk/roc_platform.h
index a730b0ff26..f1786e633d 100644
--- a/drivers/common/cnxk/roc_platform.h
+++ b/drivers/common/cnxk/roc_platform.h
@@ -234,6 +234,7 @@
 extern int cnxk_logtype_base;
 extern int cnxk_logtype_mbox;
 extern int cnxk_logtype_cpt;
+extern int cnxk_logtype_ml;
 extern int cnxk_logtype_npa;
 extern int cnxk_logtype_nix;
 extern int cnxk_logtype_npc;
@@ -261,6 +262,7 @@ extern int cnxk_logtype_ree;
 #define plt_base_dbg(fmt, ...)	plt_dbg(base, fmt, ##__VA_ARGS__)
 #define plt_cpt_dbg(fmt, ...)	plt_dbg(cpt, fmt, ##__VA_ARGS__)
 #define plt_mbox_dbg(fmt, ...)	plt_dbg(mbox, fmt, ##__VA_ARGS__)
+#define plt_ml_dbg(fmt, ...)	plt_dbg(ml, fmt, ##__VA_ARGS__)
 #define plt_npa_dbg(fmt, ...)	plt_dbg(npa, fmt, ##__VA_ARGS__)
 #define plt_nix_dbg(fmt, ...)	plt_dbg(nix, fmt, ##__VA_ARGS__)
 #define plt_npc_dbg(fmt, ...)	plt_dbg(npc, fmt, ##__VA_ARGS__)
diff --git a/drivers/common/cnxk/roc_priv.h b/drivers/common/cnxk/roc_priv.h
index 122d411fe7..14fe2e452a 100644
--- a/drivers/common/cnxk/roc_priv.h
+++ b/drivers/common/cnxk/roc_priv.h
@@ -47,4 +47,7 @@
 /* REE */
 #include "roc_ree_priv.h"

+/* ML */
+#include "roc_ml_priv.h"
+
 #endif /* _ROC_PRIV_H_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 5677f63bee..a94bd1f420 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -8,6 +8,7 @@ INTERNAL {
 	cnxk_logtype_base;
 	cnxk_logtype_cpt;
 	cnxk_logtype_mbox;
+	cnxk_logtype_ml;
 	cnxk_logtype_nix;
 	cnxk_logtype_npa;
 	cnxk_logtype_npc;
@@ -98,6 +99,34 @@ INTERNAL {
 	roc_idev_npa_nix_get;
 	roc_idev_num_lmtlines_get;
 	roc_idev_nix_inl_meta_aura_get;
+	roc_ml_reg_read64;
+	roc_ml_reg_write64;
+	roc_ml_reg_read32;
+	roc_ml_reg_write32;
+	roc_ml_reg_save;
+	roc_ml_addr_ap2mlip;
+	roc_ml_addr_mlip2ap;
+	roc_ml_addr_pa_to_offset;
+	roc_ml_addr_offset_to_pa;
+	roc_ml_scratch_write_job;
+	roc_ml_scratch_is_valid_bit_set;
+	roc_ml_scratch_is_done_bit_set;
+	roc_ml_scratch_enqueue;
+	roc_ml_scratch_dequeue;
+	roc_ml_scratch_queue_reset;
+	roc_ml_jcmdq_enqueue_lf;
+	roc_ml_jcmdq_enqueue_sl;
+	roc_ml_clk_force_on;
+	roc_ml_clk_force_off;
+	roc_ml_dma_stall_on;
+	roc_ml_dma_stall_off;
+	roc_ml_mlip_is_enabled;
+	roc_ml_mlip_reset;
+	roc_ml_dev_init;
+	roc_ml_dev_fini;
+	roc_ml_blk_init;
+	roc_ml_blk_fini;
+	roc_ml_sso_pf_func_get;
 	roc_model;
 	roc_se_auth_key_set;
 	roc_se_ciph_key_set;
--
2.17.1
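
For readers following the JCMDQ helpers added in this patch, here is a
minimal, self-contained sketch of the two enqueue variants: the lock-free
one checks the available-slot count and writes the two command words, and
the spinlock one wraps the same steps in a try-lock and gives up if the lock
is busy. Everything prefixed demo_/DEMO_ is hypothetical. Plain memory
stands in for the MMIO registers, a C11 atomic_flag stands in for
plt_spinlock_t, and the slot decrement only imitates what hardware would do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the GENMASK_ULL()/FIELD_GET() helpers used by the patch. */
#define DEMO_GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define DEMO_FIELD_GET(mask, val) (((val) & (mask)) / ((mask) & ~((mask) - 1)))

/* Same bit layout as ROC_ML_JCMDQ_STATUS_AVAIL_COUNT: bits [4:0]. */
#define DEMO_JCMDQ_STATUS_AVAIL_COUNT DEMO_GENMASK_ULL(4, 0)

/* Mock device: ordinary memory instead of memory-mapped registers. */
struct demo_ml {
	uint64_t jcmdq_status; /* bits [4:0]: free command slots */
	uint64_t jcmdq_in[2];  /* two-word job command input */
	atomic_flag fp_lock;   /* fast-path try-lock (assumption) */
};

struct demo_job_cmd {
	uint64_t w0;
	uint64_t w1;
};

static void
demo_ml_init(struct demo_ml *ml, uint64_t avail)
{
	ml->jcmdq_status = avail & DEMO_JCMDQ_STATUS_AVAIL_COUNT;
	ml->jcmdq_in[0] = 0;
	ml->jcmdq_in[1] = 0;
	atomic_flag_clear(&ml->fp_lock);
}

/* Lock-free variant: succeeds only if the queue reports a free slot. */
static bool
demo_jcmdq_enqueue_lf(struct demo_ml *ml, const struct demo_job_cmd *cmd)
{
	assert(ml != NULL && cmd != NULL);

	if (DEMO_FIELD_GET(DEMO_JCMDQ_STATUS_AVAIL_COUNT, ml->jcmdq_status) == 0)
		return false;

	ml->jcmdq_in[0] = cmd->w0;
	ml->jcmdq_in[1] = cmd->w1;
	ml->jcmdq_status--; /* mock only: real hardware consumes the slot */

	return true;
}

/* Try-lock variant: returns false if the lock is busy or the queue is full. */
static bool
demo_jcmdq_enqueue_sl(struct demo_ml *ml, const struct demo_job_cmd *cmd)
{
	bool ret;

	if (atomic_flag_test_and_set_explicit(&ml->fp_lock, memory_order_acquire))
		return false; /* another caller holds the lock */

	ret = demo_jcmdq_enqueue_lf(ml, cmd);
	atomic_flag_clear_explicit(&ml->fp_lock, memory_order_release);

	return ret;
}
```

Both variants report failure with a plain false, so a caller cannot tell a
full queue from a busy lock and would normally just retry the enqueue later.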

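The quiesce part of the MLIP reset sequence in this patch (fence the AXI
bridges, wait for BUSY to drop, clear the enable bits, lift the fence) can
be sketched against mock registers as below. This is an illustration under
assumptions, not the ROC code: the demo_ names, the bounded poll count and
the -1 return on timeout are all hypothetical, and the mock simply drops
BUSY after a configurable number of reads once the bridge is fenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT_ULL(n)	  (1ULL << (n))
#define DEMO_CFG_MLIP_ENA DEMO_BIT_ULL(2)  /* same position as ROC_ML_CFG_MLIP_ENA */
#define DEMO_CFG_ENA	  DEMO_BIT_ULL(6)  /* same position as ROC_ML_CFG_ENA */
#define DEMO_AXI_FENCE	  DEMO_BIT_ULL(17) /* ..._AXI_BRIDGE_CTRL_FENCE */
#define DEMO_AXI_BUSY	  DEMO_BIT_ULL(18) /* ..._AXI_BRIDGE_CTRL_BUSY */
#define DEMO_MAX_POLLS	  1000		   /* hypothetical poll budget */

struct demo_regs {
	uint64_t cfg;
	uint64_t axi_bridge_ctrl[2];
	unsigned int busy_polls[2]; /* mock: reads before BUSY clears */
};

/* Mock register read: a fenced bridge drains after busy_polls[idx] reads. */
static uint64_t
demo_read_axi(struct demo_regs *r, int idx)
{
	uint64_t val = r->axi_bridge_ctrl[idx];

	if ((val & DEMO_AXI_FENCE) && (val & DEMO_AXI_BUSY)) {
		if (r->busy_polls[idx] > 0)
			r->busy_polls[idx]--;
		else
			r->axi_bridge_ctrl[idx] &= ~DEMO_AXI_BUSY;
	}

	return r->axi_bridge_ctrl[idx];
}

/* Poll until all bits in mask read back as zero, or the budget runs out. */
static bool
demo_wait_to_clear(struct demo_regs *r, int idx, uint64_t mask)
{
	unsigned int i;

	for (i = 0; i < DEMO_MAX_POLLS; i++) {
		if ((demo_read_axi(r, idx) & mask) == 0)
			return true;
	}

	return false;
}

static int
demo_mlip_reset(struct demo_regs *r)
{
	int i;

	/* Fence both bridges so no new transactions are accepted. */
	for (i = 0; i < 2; i++)
		r->axi_bridge_ctrl[i] |= DEMO_AXI_FENCE;

	/* Wait for outstanding transactions to drain. */
	for (i = 0; i < 2; i++) {
		if (!demo_wait_to_clear(r, i, DEMO_AXI_BUSY))
			return -1; /* fence stays set; caller decides what to do */
	}

	/* Stop accepting jobs, then hold the MLIP in reset. */
	r->cfg &= ~DEMO_CFG_ENA;
	r->cfg &= ~DEMO_CFG_MLIP_ENA;

	/* Lift the fence with a read-modify-write per register. */
	for (i = 0; i < 2; i++)
		r->axi_bridge_ctrl[i] &= ~DEMO_AXI_FENCE;

	return 0;
}
```

Each register is updated with its own read-modify-write so that unrelated
control bits in either bridge register are left untouched.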


* [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-03-09 22:06     ` Thomas Monjalon
  2023-02-07 16:06   ` [PATCH v5 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
                     ` (37 subsequent siblings)
  39 siblings, 1 reply; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added initial source files and build files for ML cnxk driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 MAINTAINERS                            |  1 +
 doc/guides/rel_notes/release_23_03.rst |  7 +++++++
 drivers/meson.build                    |  1 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  8 ++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h         |  8 ++++++++
 drivers/ml/cnxk/meson.build            | 26 ++++++++++++++++++++++++++
 drivers/ml/meson.build                 |  8 ++++++++
 7 files changed, 59 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 97ce3042b4..8e9d6dc946 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1441,6 +1441,7 @@ Marvell ML CNXK
 M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
+F: drivers/ml/cnxk/
 
 
 Packet processing
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 425323241e..09b0932cd9 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -100,6 +100,13 @@ New Features
   * Added functions to translate IO type and format to string.
   * Added functions to quantize and dequantize inference IO data.
 
+* **Added implementation of Marvell CNXK machine learning driver.**
+
+  * Added ml/cnxk driver which provides support for machine learning inference
+    operations on Marvell's CN10K series of SoCs.
+  * Added ML ROC code for ml/cnxk driver to common/cnxk.
+  * Added implementation with support for all rte_ml APIs.
+
 
 Removed Items
 -------------
diff --git a/drivers/meson.build b/drivers/meson.build
index c6d619200f..546a5f409d 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -14,6 +14,7 @@ subdirs = [
         'mempool',        # depends on common and bus.
         'dma',            # depends on common and bus.
         'net',            # depends on common, bus, mempool
+        'ml',             # depends on common, bus, mempool
         'raw',            # depends on common, bus, dma and net.
         'crypto',         # depends on common, bus and mempool (net in future).
         'compress',       # depends on common, bus, mempool.
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
new file mode 100644
index 0000000000..cc96a7bdb3
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
new file mode 100644
index 0000000000..049ac13fcd
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_DEV_H_
+#define _CN10K_ML_DEV_H_
+
+#endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
new file mode 100644
index 0000000000..2ec6a88e3f
--- /dev/null
+++ b/drivers/ml/cnxk/meson.build
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
+    build = false
+    reason = 'only supported on 64-bit Linux'
+    subdir_done()
+endif
+
+driver_sdk_headers = files(
+        'cn10k_ml_dev.h',
+)
+
+sources = files(
+        'cn10k_ml_dev.c',
+)
+
+deps += ['mldev', 'common_cnxk']
+
+if get_option('buildtype').contains('debug')
+        cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
+else
+        cflags += [ '-UCNXK_ML_DEV_DEBUG' ]
+endif
+
+pmd_supports_disable_iova_as_pa = true
diff --git a/drivers/ml/meson.build b/drivers/ml/meson.build
new file mode 100644
index 0000000000..54bc394c47
--- /dev/null
+++ b/drivers/ml/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+drivers = [
+        'cnxk',
+]
+
+std_deps = ['mldev']
-- 
2.17.1



* [PATCH v5 03/39] ml/cnxk: enable probe and remove of ML device
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
                     ` (36 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Anatoly Burakov
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

The ML inference engine on the cn10k platform is a PCI-based device. Added
driver support to probe and remove the device in the cn10k poll mode
driver. The PMD is registered under the name "ml_cn10k".

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 114 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  11 ++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  10 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  11 ++++
 drivers/ml/cnxk/meson.build    |   2 +
 5 files changed, 148 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index cc96a7bdb3..c2e93c9a1a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -2,7 +2,121 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_common.h>
+#include <rte_dev.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
+#include <rte_pci.h>
+
+#include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ops.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+static int
+cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	PLT_SET_USED(pci_drv);
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+
+	ret = roc_plt_init();
+	if (ret < 0) {
+		plt_err("Failed to initialize platform model");
+		return ret;
+	}
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+	dev = rte_ml_dev_pmd_create(name, &pci_dev->device, &init_params);
+	if (dev == NULL) {
+		ret = -ENODEV;
+		goto error_exit;
+	}
+
+	/* Get private data space allocated */
+	mldev = dev->data->dev_private;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev->roc.pci_dev = pci_dev;
+
+		ret = roc_ml_dev_init(&mldev->roc);
+		if (ret) {
+			plt_err("Failed to initialize ML ROC, ret = %d", ret);
+			goto pmd_destroy;
+		}
+
+		dev->dev_ops = &cn10k_ml_ops;
+	} else {
+		plt_err("CN10K ML Ops are not supported on secondary process");
+		dev->dev_ops = &ml_dev_dummy_ops;
+	}
+
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	return 0;
+
+pmd_destroy:
+	rte_ml_dev_pmd_destroy(dev);
+
+error_exit:
+	plt_err("Could not create device (vendor_id: 0x%x device_id: 0x%x)", pci_dev->id.vendor_id,
+		pci_dev->id.device_id);
+
+	return ret;
+}
+
+static int
+cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&mldev->roc);
+		if (ret)
+			return ret;
+	}
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_pci_id pci_id_ml_table[] = {
+	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
+	/* sentinel */
+	{},
+};
+
+static struct rte_pci_driver cn10k_mldev_pmd = {
+	.id_table = pci_id_ml_table,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA,
+	.probe = cn10k_ml_pci_probe,
+	.remove = cn10k_ml_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
+RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 049ac13fcd..833a09791a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -5,4 +5,15 @@
 #ifndef _CN10K_ML_DEV_H_
 #define _CN10K_ML_DEV_H_
 
+#include <roc_api.h>
+
+/* Marvell OCTEON CN10K ML PMD device name */
+#define MLDEV_NAME_CN10K_PMD ml_cn10k
+
+/* Device private data */
+struct cn10k_ml_dev {
+	/* Device ROC */
+	struct roc_ml roc;
+};
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
new file mode 100644
index 0000000000..39843e3ee5
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_ops.h"
+
+struct rte_ml_dev_ops cn10k_ml_ops = {0};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
new file mode 100644
index 0000000000..b14221d02c
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OPS_H_
+#define _CN10K_ML_OPS_H_
+
+/* Device ops */
+extern struct rte_ml_dev_ops cn10k_ml_ops;
+
+#endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 2ec6a88e3f..caed62a9f3 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,10 +9,12 @@ endif
 
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
+        'cn10k_ml_ops.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
+        'cn10k_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk']
-- 
2.17.1


* [PATCH v5 04/39] ml/cnxk: add driver support to get device info
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
                     ` (35 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to get the cn10k ML device information. This is the
driver implementation behind the RTE function rte_ml_dev_info_get().
The ML device on cn10k supports one queue-pair in lock-free mode and
does not support segmented input/output data.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
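Note: a minimal standalone sketch of the limits this callback reports.
The struct and the sketch_* names are hypothetical stand-ins, not the
mldev API; the values are copied from the ML_CN10K_* macros added in
this patch. The align-up helper assumes min_align_size is meant as the
minimum buffer alignment, which this patch does not spell out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Limits copied from the ML_CN10K_* macros in this patch. */
#define SKETCH_MAX_MODELS 16
#define SKETCH_MAX_QP_PER_DEVICE 1
#define SKETCH_MAX_DESC_PER_QP 1024
#define SKETCH_MAX_SEGMENTS 1
#define SKETCH_ALIGN_SIZE 128

/* Hypothetical stand-in for struct rte_ml_dev_info (illustration only). */
struct sketch_dev_info {
	const char *driver_name;
	uint16_t max_models;
	uint16_t max_queue_pairs;
	uint16_t max_desc;
	uint16_t max_segments;
	uint16_t min_align_size;
};

/* Fill the info structure; reject a NULL output pointer like the driver. */
static int
sketch_dev_info_get(const char *driver_name, struct sketch_dev_info *info)
{
	if (info == NULL)
		return -EINVAL;

	memset(info, 0, sizeof(*info));
	info->driver_name = driver_name;
	info->max_models = SKETCH_MAX_MODELS;
	info->max_queue_pairs = SKETCH_MAX_QP_PER_DEVICE;
	info->max_desc = SKETCH_MAX_DESC_PER_QP;
	info->max_segments = SKETCH_MAX_SEGMENTS;
	info->min_align_size = SKETCH_ALIGN_SIZE;

	return 0;
}

/* Round size up to a power-of-two alignment (assumed use of the field). */
static uint64_t
sketch_align_up(uint64_t size, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

An application would typically read these limits once and size its
queue-pair and model counts within them before configuring the device.
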
 drivers/ml/cnxk/cn10k_ml_dev.h | 15 +++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 23 ++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 833a09791a..13d26373e4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,21 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Device alignment size */
+#define ML_CN10K_ALIGN_SIZE 128
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Maximum number of queue-pairs per device */
+#define ML_CN10K_MAX_QP_PER_DEVICE 1
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_CN10K_MAX_DESC_PER_QP 1024
+
+/* Maximum number of segments for IO data */
+#define ML_CN10K_MAX_SEGMENTS 1
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 39843e3ee5..bad5ad4713 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,27 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-struct rte_ml_dev_ops cn10k_ml_ops = {0};
+static int
+cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	if (dev_info == NULL)
+		return -EINVAL;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
+	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
+
+	return 0;
+}
+
+struct rte_ml_dev_ops cn10k_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+};
-- 
2.17.1


* [PATCH v5 05/39] ml/cnxk: add support for configure and close
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
                     ` (34 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented driver functions to configure and close ML devices.
Added skeleton code to support reconfiguring an ML device. The
PCI device is removed on device close.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 ++
 drivers/ml/cnxk/cn10k_ml_dev.h | 21 ++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 60 ++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index c2e93c9a1a..fd45226add 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -65,6 +65,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+
 	return 0;
 
 pmd_destroy:
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 13d26373e4..e7fb5fc2e2 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -25,10 +25,31 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
+/* ML command timeout in seconds */
+#define ML_CN10K_CMD_TIMEOUT 5
+
+/* Device configuration state enum */
+enum cn10k_ml_dev_state {
+	/* Probed and not configured */
+	ML_CN10K_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CN10K_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CN10K_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CN10K_DEV_STATE_CLOSED
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
+
+	/* Configuration state */
+	enum cn10k_ml_dev_state state;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index bad5ad4713..3a78d8c816 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -25,7 +25,67 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL || conf == NULL)
+		return -EINVAL;
+
+	/* Get CN10K device handle */
+	mldev = dev->data->dev_private;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	mldev = dev->data->dev_private;
+
+	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
 };
-- 
2.17.1


* [PATCH v5 06/39] ml/cnxk: parse ML firmware path from device args
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled parsing of the ML firmware path from device args for cn10k.
The path defaults to "/lib/firmware/mlip-fw.bin" when the fw_path
argument is not provided. Added internal structures for ML firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
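Note: the fallback behaviour described above, sketched without the
kvargs library. This is a simplified standalone illustration; the
sketch_* names are hypothetical, and the hand-rolled "key=value,..."
scan only approximates what rte_kvargs_parse() and
rte_kvargs_process() do in the driver.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_FW_PATH_KEY "fw_path"
#define SKETCH_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"

/*
 * Copy the value of "fw_path" from a comma-separated key=value string
 * into out. Falls back to the default path when args is NULL or the key
 * is absent. Returns 0 on success, -1 if out is too small.
 */
static int
sketch_fw_path_get(const char *args, char *out, size_t out_len)
{
	const size_t key_len = strlen(SKETCH_FW_PATH_KEY);
	const char *p = args;

	assert(out != NULL && out_len > 0);

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

		if (len > key_len && strncmp(p, SKETCH_FW_PATH_KEY, key_len) == 0 &&
		    p[key_len] == '=') {
			size_t val_len = len - key_len - 1;

			if (val_len >= out_len)
				return -1;
			memcpy(out, p + key_len + 1, val_len);
			out[val_len] = '\0';
			return 0;
		}
		p = (end != NULL) ? end + 1 : NULL;
	}

	/* Key not found: use the default path. */
	if (strlen(SKETCH_FW_PATH_DEFAULT) >= out_len)
		return -1;
	strcpy(out, SKETCH_FW_PATH_DEFAULT);
	return 0;
}
```

On the EAL command line the argument would be appended to the device
allow-list entry, for example "-a <PCI BDF>,fw_path=/path/to/fw.bin",
where both the address and the path are placeholders.
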
 drivers/ml/cnxk/cn10k_ml_dev.c | 71 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 12 ++++++
 drivers/ml/cnxk/meson.build    |  2 +-
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fd45226add..117cac43aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -4,6 +4,8 @@
 
 #include <rte_common.h>
 #include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
@@ -13,9 +15,70 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+#define CN10K_ML_FW_PATH "fw_path"
+
+#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
+static int
+parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*(char **)extra_args = strdup(value);
+
+	if (!*(char **)extra_args)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+{
+	struct rte_kvargs *kvlist = NULL;
+	bool fw_path_set = false;
+	char *fw_path = NULL;
+	int ret = 0;
+
+	if (devargs == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(devargs->args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing devargs\n");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_PATH) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_PATH, &parse_string_arg, &fw_path);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_PATH);
+			ret = -EINVAL;
+			goto exit;
+		}
+		fw_path_set = true;
+	}
+
+check_args:
+	if (!fw_path_set)
+		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+	else
+		mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
 static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
@@ -49,6 +112,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
 		mldev->roc.pci_dev = pci_dev;
 
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		if (ret) {
+			plt_err("Failed to parse devargs ret = %d", ret);
+			goto pmd_destroy;
+		}
+
 		ret = roc_ml_dev_init(&mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
@@ -122,3 +191,5 @@ static struct rte_pci_driver cn10k_mldev_pmd = {
 RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index e7fb5fc2e2..5333566cff 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,15 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML firmware structure */
+struct cn10k_ml_fw {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Firmware file path */
+	const char *path;
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -50,6 +59,9 @@ struct cn10k_ml_dev {
 
 	/* Configuration state */
 	enum cn10k_ml_dev_state state;
+
+	/* Firmware */
+	struct cn10k_ml_fw fw;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index caed62a9f3..7dc8a29a80 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,7 +17,7 @@ sources = files(
         'cn10k_ml_ops.c',
 )
 
-deps += ['mldev', 'common_cnxk']
+deps += ['mldev', 'common_cnxk', 'kvargs']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1


* [PATCH v5 07/39] ml/cnxk: enable firmware load and device reset
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to load ML firmware on the cn10ka ROC model. The
MLIP device is reset during the dev_close driver operation, so the
device can't be reconfigured after a call to close. Job execution
is disabled after firmware load and is enabled when the device is
started. Added internal request structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 327 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 156 ++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  21 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  14 ++
 4 files changed, 518 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 117cac43aa..90fca45ddd 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -12,6 +12,8 @@
 
 #include <roc_api.h>
 
+#include <eal_firmware.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
@@ -19,6 +21,15 @@
 
 #define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
 
+/* ML firmware macros */
+#define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
+#define FW_STACK_BUFFER_SIZE	 0x40000
+#define FW_DEBUG_BUFFER_SIZE	 (2 * 0x20000)
+#define FW_EXCEPTION_BUFFER_SIZE 0x400
+#define FW_LINKER_OFFSET	 0x80000
+#define FW_WAIT_CYCLES		 100
+#define FW_LOAD_FLAGS		 0x1
+
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
 
 /* Dummy operations for ML device */
@@ -175,6 +186,322 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 	return rte_ml_dev_pmd_destroy(dev);
 }
 
+static void
+cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
+{
+	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+	plt_ml_dbg("exception_state_size = %u bytes",
+		   fw->req->jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+}
+
+uint64_t
+cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
+{
+	PLT_SET_USED(fw);
+
+	return FW_LOAD_FLAGS;
+}
+
+static int
+cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
+{
+	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
+	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	uint32_t reg_val32;
+	uint64_t offset;
+	bool timeout;
+	int ret = 0;
+	uint8_t i;
+
+	mldev = fw->mldev;
+
+	/* Reset HEAD and TAIL debug pointer registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
+	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
+
+	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
+	 * bridge.
+	 */
+	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
+		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
+		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
+		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+
+	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
+	 * bridges.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
+			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+	}
+
+	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
+	 * signal all ML transactions as non-secure.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
+			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+
+		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
+			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+	}
+
+	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
+	 * when there is no job in the command queue.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
+	 * keeping the job manager disabled.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (9) Wait at least 70 coprocessor clock cycles. */
+	plt_delay_us(FW_WAIT_CYCLES);
+
+	/* (10) Write ML outbound addresses pointing to the firmware images written in step 1 to the
+	 * following registers: ML(0)_A35_0_RST_VECTOR_BASE_W(0..1) for core 0,
+	 * ML(0)_A35_1_RST_VECTOR_BASE_W(0..1) for core 1. The value written to each register is the
+	 * AXI outbound address divided by 4. Read after write.
+	 */
+	offset = PLT_PTR_ADD_U64_CAST(
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
+	 * MLIP components out of reset. The cores will execute firmware from the ML region as
+	 * written in step 1.
+	 */
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
+	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
+
+	/* (12) Wait for notification from firmware that ML is ready for job execution. */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
+	 * clock when there are no more jobs to process.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
+	 * activities.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
+			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+	}
+
+	return ret;
+}
+
+int
+cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_fw *fw;
+	void *fw_buffer = NULL;
+	uint64_t mz_size = 0;
+	uint64_t fw_size = 0;
+	int ret = 0;
+
+	fw = &mldev->fw;
+	fw->mldev = mldev;
+
+	/* Read firmware image to a buffer */
+	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+	if (ret < 0) {
+		plt_err("Can't read firmware data: %s\n", fw->path);
+		return ret;
+	}
+
+	/* Reserve memzone for firmware load completion and data */
+	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+		return -ENOMEM;
+	}
+	fw->req = mz->addr;
+
+	/* Reset firmware load completion structure */
+	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+
+	/* Reset device, if in active state */
+	if (roc_ml_mlip_is_enabled(&mldev->roc))
+		roc_ml_mlip_reset(&mldev->roc, true);
+
+	/* Load firmware */
+	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+	if (fw_buffer != NULL)
+		free(fw_buffer);
+	if (ret < 0)
+		cn10k_ml_fw_unload(mldev);
+
+	return ret;
+}
+
+void
+cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	uint64_t reg_val;
+
+	/* Disable and reset device */
+	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&mldev->roc, true);
+
+	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
+	if (mz != NULL)
+		plt_memzone_free(mz);
+}
+
 static struct rte_pci_id pci_id_ml_table[] = {
 	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
 	/* sentinel */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 5333566cff..00d23eb3ca 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,9 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
+
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -28,6 +31,19 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* Poll mode job state */
+#define ML_CN10K_POLL_JOB_START	 0
+#define ML_CN10K_POLL_JOB_FINISH 1
+
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
+
 /* Device configuration state enum */
 enum cn10k_ml_dev_state {
 	/* Probed and not configured */
@@ -43,6 +59,136 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* Firmware stats */
+struct cn10k_ml_fw_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
+
+	/* Firmware end cycle */
+	uint64_t fw_end;
+
+	/* Hardware start cycle */
+	uint64_t hw_start;
+
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Firmware stats */
+	struct cn10k_ml_fw_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
+
+		/* Batch execution */
+		uint64_t batch_run : 1;
+
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
+
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
+
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
+
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
+
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
+
+	/* Exception state dump size */
+	uint32_t exception_state_size;
+};
+
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
+
+			/* Flags to control error handling */
+			uint64_t flags;
+
+			uint8_t rsvd[8];
+		} fw_load;
+	};
+};
+
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -50,6 +196,12 @@ struct cn10k_ml_fw {
 
 	/* Firmware file path */
 	const char *path;
+
+	/* Data buffer */
+	uint8_t *data;
+
+	/* Firmware load / handshake request structure */
+	struct cn10k_ml_req *req;
 };
 
 /* Device private data */
@@ -64,4 +216,8 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_fw fw;
 };
 
+uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
+int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
+void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3a78d8c816..3df1254dca 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -30,6 +30,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	int ret;
 
 	if (dev == NULL || conf == NULL)
 		return -EINVAL;
@@ -51,6 +52,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(mldev);
+		if (ret != 0)
+			return ret;
 	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -77,6 +83,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload firmware */
+	cn10k_ml_fw_unload(mldev);
+
+	/* Clear scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+
+	/* Reset ML_MLR_BASE */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+
 	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index b14221d02c..fe18730aca 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,6 +5,20 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include "cn10k_ml_dev.h"
+
+/* ML request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job result */
+	struct cn10k_ml_result result;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+} __rte_aligned(ROC_ALIGN);
+
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1



* [PATCH v5 08/39] ml/cnxk: enable support for simulator environment
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled device initialization and firmware load on the simulator
platform. On the simulator, the firmware load stage only launches
a firmware handshake request; no firmware image is read.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
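Note (illustration only, not part of the commit): the handshake wait in
cn10k_ml_fw_load_asim() is a poll-until-deadline loop. A minimal sketch
of that pattern follows, assuming a fake cycle counter in place of
plt_tsc_cycles() and made-up status values; JOB_START, JOB_FINISH,
cycles_now() and poll_until_done() are hypothetical names, and the real
loop also checks the scratch done bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values; the driver's ML_CN10K_POLL_JOB_* values are not shown here. */
#define JOB_START  0x1ULL
#define JOB_FINISH 0x2ULL

/* Hypothetical cycle counter: every call advances a fake clock by one tick. */
static uint64_t fake_cycles;

static uint64_t
cycles_now(void)
{
	return fake_cycles++;
}

/*
 * Poll *status until it reads JOB_FINISH or the deadline passes.
 * Returns true on completion and false on timeout.
 */
static bool
poll_until_done(const volatile uint64_t *status, uint64_t timeout_cycles)
{
	uint64_t deadline;

	assert(status != NULL);

	deadline = cycles_now() + timeout_cycles;
	do {
		if (*status == JOB_FINISH)
			return true;
	} while (cycles_now() < deadline);

	return false;
}
```

The do/while form checks the status at least once, so a job that has
already finished is reported as done even with a zero timeout.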
 drivers/ml/cnxk/cn10k_ml_dev.c | 119 +++++++++++++++++++++++++++++----
 1 file changed, 107 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 90fca45ddd..837f006bf0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -213,6 +213,89 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	return FW_LOAD_FLAGS;
 }
 
+static int
+cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	bool timeout;
+	int ret = 0;
+
+	mldev = fw->mldev;
+
+	/* Reset HEAD and TAIL debug pointer registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = rte_eal_get_baseaddr();
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* Update FW load completion structure */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	return ret;
+}
+
 static int
 cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
@@ -447,16 +530,22 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	fw = &mldev->fw;
 	fw->mldev = mldev;
 
-	/* Read firmware image to a buffer */
-	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
-	if (ret < 0) {
-		plt_err("Can't read firmware data: %s\n", fw->path);
-		return ret;
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		/* Read firmware image to a buffer */
+		ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+		if (ret < 0) {
+			plt_err("Can't read firmware data: %s\n", fw->path);
+			return ret;
+		}
+
+		/* Reserve memzone for firmware load completion and data */
+		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	} else if (roc_env_is_asim()) {
+		/* Reserve memzone for firmware load completion */
+		mz_size = sizeof(struct cn10k_ml_req);
 	}
 
-	/* Reserve memzone for firmware load completion and data */
-	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
-		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
@@ -475,10 +564,16 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 		roc_ml_mlip_reset(&mldev->roc, true);
 
 	/* Load firmware */
-	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
-	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
-	if (fw_buffer != NULL)
-		free(fw_buffer);
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+	} else if (roc_env_is_asim()) {
+		fw->data = NULL;
+		ret = cn10k_ml_fw_load_asim(fw);
+	}
+
 	if (ret < 0)
 		cn10k_ml_fw_unload(mldev);
 
-- 
2.17.1



* [PATCH v5 09/39] ml/cnxk: enable support for device start and stop
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented ML driver functions to start and stop the ML device.
Start and stop set or clear the enable bit in ML_CFG, which controls
whether the device accepts inference requests.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
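Note (illustration only, not part of the commit): start and stop are
read-modify-write updates of a single enable bit. A minimal sketch,
assuming a plain struct field in place of the memory-mapped ML_CFG
register; CFG_ENA, struct fake_dev, dev_start() and dev_stop() are
hypothetical, and the real bit position of ROC_ML_CFG_ENA is not shown
in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position, chosen only for this sketch. */
#define CFG_ENA (1ULL << 0)

/* Stand-in for a device with one 64-bit configuration register. */
struct fake_dev {
	uint64_t cfg;
};

/* Set the enable bit and leave every other bit untouched. */
static void
dev_start(struct fake_dev *d)
{
	assert(d != NULL);
	d->cfg |= CFG_ENA;
}

/* Clear the enable bit and leave every other bit untouched. */
static void
dev_stop(struct fake_dev *d)
{
	assert(d != NULL);
	d->cfg &= ~CFG_ENA;
}
```

Reading the register before writing it back matters because the other
configuration bits must survive a start or stop unchanged.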
 drivers/ml/cnxk/cn10k_ml_ops.c | 42 +++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3df1254dca..a9f14fe4c5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -104,9 +104,45 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
+static int
+cn10k_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
+	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
 };
-- 
2.17.1



* [PATCH v5 10/39] ml/cnxk: add support to create device queue-pairs
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to create and destroy device queue-pairs. Updated the
configure stage to allocate the array that stores queue-pair handles.
Added internal structures for queue-pairs, queues and ML inference
requests.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
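Note (illustration only, not part of the commit): queue_pair_setup
allocates one descriptor more than requested because a head/tail ring
of size N can hold at most N - 1 entries. The sketch below assumes the
common "keep one slot empty" ring scheme; the enqueue and dequeue paths
are not part of this patch, so struct ring, ring_size_for(),
ring_push() and ring_pop() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring bookkeeping; mirrors only the head/tail/nb_desc idea. */
struct ring {
	uint64_t head;    /* next slot to enqueue into */
	uint64_t tail;    /* next slot to dequeue from */
	uint64_t nb_desc; /* total slots, one of which always stays empty */
};

/* Pick the ring size for a request, capped at the device maximum. */
static uint64_t
ring_size_for(uint64_t requested, uint64_t max_desc)
{
	return (requested == max_desc) ? max_desc : requested + 1;
}

/* The ring is full when advancing head would make it equal to tail. */
static bool
ring_is_full(const struct ring *r)
{
	return ((r->head + 1) % r->nb_desc) == r->tail;
}

/* Claim one slot; returns false when the ring is full. */
static bool
ring_push(struct ring *r)
{
	assert(r != NULL && r->nb_desc > 1);
	if (ring_is_full(r))
		return false;
	r->head = (r->head + 1) % r->nb_desc;
	return true;
}

/* Release one slot; returns false when the ring is empty. */
static bool
ring_pop(struct ring *r)
{
	assert(r != NULL && r->nb_desc > 1);
	if (r->head == r->tail)
		return false;
	r->tail = (r->tail + 1) % r->nb_desc;
	return true;
}
```

With a request for 4 descriptors the ring gets 5 slots, so exactly 4
pushes succeed before the ring reports full.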
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |  33 +++++-
 2 files changed, 237 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a9f14fe4c5..82670330d1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -8,6 +8,97 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cn10k_ml_qp_mem_%d:%d", dev_id, qp_id);
+}
+
+static int
+cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cn10k_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cn10k_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cn10k_ml_qp *
+cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cn10k_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -30,6 +121,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint32_t mz_size;
+	uint16_t qp_id;
 	int ret;
 
 	if (dev == NULL || conf == NULL)
@@ -68,21 +162,83 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -ENOTSUP;
 	}
 
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
+
+error:
+	if (dev->data->queue_pairs != NULL)
+		rte_free(dev->data->queue_pairs);
+
+	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint16_t qp_id;
 
 	if (dev == NULL)
 		return -EINVAL;
 
 	mldev = dev->data->dev_private;
 
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	if (dev->data->queue_pairs)
+		rte_free(dev->data->queue_pairs);
+
 	/* Unload firmware */
 	cn10k_ml_fw_unload(mldev);
 
@@ -140,9 +296,56 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* A queue of size N holds at most N - 1 usable descriptors, so allocate one more
+	 * descriptor than requested. When the request already equals the device maximum,
+	 * the queue is capped at max_desc and one fewer descriptor is usable.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fe18730aca..289c7c5587 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,9 +5,13 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 
-/* ML request */
+/* Request structure */
 struct cn10k_ml_req {
 	/* Job descriptor */
 	struct cn10k_ml_jd jd;
@@ -19,6 +23,33 @@ struct cn10k_ml_req {
 	volatile uint64_t status;
 } __rte_aligned(ROC_ALIGN);
 
+/* Request queue */
+struct cn10k_ml_queue {
+	/* Array of requests */
+	struct cn10k_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cn10k_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cn10k_ml_queue queue;
+};
+
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1



* [PATCH v5 11/39] ml/cnxk: add functions to load and unload models
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added cnxk driver implementations to load and unload ML models.
The configure stage now allocates the array of model handles.
Model load assigns a model ID and allocates per-model resources;
model unload releases them. Added internal structures to handle
ML models.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
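Note (illustration only, not part of the commit): model load picks the
first free entry in the model handle array and uses its index as the
model ID. A minimal sketch of that slot scheme follows; slot_alloc()
and slot_free() are hypothetical helpers, and the range check in
slot_free() is an addition of this sketch rather than something the
driver code in this patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store obj in the first empty slot and report its index through *id.
 * Returns 0 on success and -ENOMEM when every slot is taken.
 */
static int
slot_alloc(void **slots, uint16_t nb_slots, void *obj, uint16_t *id)
{
	uint16_t idx;

	assert(slots != NULL && obj != NULL && id != NULL);

	for (idx = 0; idx < nb_slots; idx++) {
		if (slots[idx] == NULL) {
			slots[idx] = obj;
			*id = idx;
			return 0;
		}
	}

	return -ENOMEM;
}

/*
 * Empty the slot at index id so a later load can reuse it.
 * Returns -EINVAL for an out-of-range or already empty slot.
 */
static int
slot_free(void **slots, uint16_t nb_slots, uint16_t id)
{
	assert(slots != NULL);

	if (id >= nb_slots || slots[id] == NULL)
		return -EINVAL;

	slots[id] = NULL;
	return 0;
}
```

Because the scan always starts at index 0, an ID released by an unload
is the first one handed out again by the next load.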
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.c |   5 +
 drivers/ml/cnxk/cn10k_ml_model.h |  40 ++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 154 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   5 +
 drivers/ml/cnxk/meson.build      |   2 +
 6 files changed, 209 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 00d23eb3ca..7cf6268115 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -214,6 +214,9 @@ struct cn10k_ml_dev {
 
 	/* Firmware */
 	struct cn10k_ml_fw fw;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
new file mode 100644
index 0000000000..39ed707396
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_model.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
new file mode 100644
index 0000000000..a9f7b169de
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_MODEL_H_
+#define _CN10K_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* Model state */
+enum cn10k_ml_model_state {
+	ML_CN10K_MODEL_STATE_LOADED,
+	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
+	ML_CN10K_MODEL_STATE_STARTED,
+	ML_CN10K_MODEL_STATE_UNKNOWN,
+};
+
+/* Model Object */
+struct cn10k_ml_model {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+
+	/* State */
+	enum cn10k_ml_model_state state;
+};
+
+#endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 82670330d1..0955fa0d76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -6,8 +6,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+/* ML model macros */
+#define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -120,9 +124,11 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	uint16_t model_id;
 	uint16_t qp_id;
 	int ret;
 
@@ -203,6 +209,48 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
 
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
@@ -211,14 +259,19 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (dev->data->queue_pairs != NULL)
 		rte_free(dev->data->queue_pairs);
 
+	if (dev->data->models != NULL)
+		rte_free(dev->data->models);
+
 	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	uint16_t model_id;
 	uint16_t qp_id;
 
 	if (dev == NULL)
@@ -226,6 +279,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	if (dev->data->models)
+		rte_free(dev->data->models);
+
 	/* Destroy all queue pairs */
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
@@ -337,6 +405,88 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+int
+cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t mz_size;
+	uint16_t idx;
+	bool found;
+
+	PLT_SET_USED(params);
+
+	mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (idx = 0; idx < dev->data->nb_models; idx++) {
+		if (dev->data->models[idx] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+
+	/* Allocate memzone for model object and model data */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->mldev = mldev;
+	model->model_id = idx;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	dev->data->models[idx] = model;
+	mldev->nb_models_loaded++;
+
+	*model_id = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	dev->data->models[model_id] = NULL;
+	mldev->nb_models_loaded--;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -348,4 +498,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 289c7c5587..d7842ecd73 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -53,4 +53,9 @@ struct cn10k_ml_qp {
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
+/* Slow-path ops */
+int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
+			uint16_t *model_id);
+int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 7dc8a29a80..bf7a9c0225 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -10,11 +10,13 @@ endif
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
+        'cn10k_ml_model.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
+        'cn10k_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs']
-- 
2.17.1



* [PATCH v5 12/39] ml/cnxk: enable validity checks for model metadata
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added model metadata structure and enabled metadata check
during model load. Remap cnxk IO types with RTE IO types.
Store and update model metadata in model structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 211 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 312 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  14 +-
 drivers/ml/cnxk/meson.build      |   2 +-
 4 files changed, 537 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 39ed707396..dfa814bbe0 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -2,4 +2,215 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_hash_crc.h>
+
+#include <mldev_utils.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+
+static enum rte_ml_io_type
+cn10k_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case 1:
+		return RTE_ML_IO_TYPE_INT8;
+	case 2:
+		return RTE_ML_IO_TYPE_UINT8;
+	case 3:
+		return RTE_ML_IO_TYPE_INT16;
+	case 4:
+		return RTE_ML_IO_TYPE_UINT16;
+	case 5:
+		return RTE_ML_IO_TYPE_INT32;
+	case 6:
+		return RTE_ML_IO_TYPE_UINT32;
+	case 7:
+		return RTE_ML_IO_TYPE_FP16;
+	case 8:
+		return RTE_ML_IO_TYPE_FP32;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+int
+cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+	uint8_t version[4];
+	uint8_t i;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+
+	/* Header CRC check */
+	if (metadata->metadata_header.header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			buffer, sizeof(metadata->metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata->metadata_header.header_crc32c) {
+			plt_err("Invalid model, Header CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata->metadata_header.payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->metadata_header),
+					      size - sizeof(metadata->metadata_header), 0);
+
+		if (payload_crc32c != metadata->metadata_header.payload_crc32c) {
+			plt_err("Invalid model, Payload CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Model magic string */
+	if (strncmp((char *)metadata->metadata_header.magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid model, magic = %s", metadata->metadata_header.magic);
+		return -EINVAL;
+	}
+
+	/* Target architecture */
+	if (metadata->metadata_header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
+		plt_err("Model target architecture (%u) not supported",
+			metadata->metadata_header.target_architecture);
+		return -ENOTSUP;
+	}
+
+	/* Header version */
+	rte_memcpy(version, metadata->metadata_header.version, 4 * sizeof(uint8_t));
+	if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
+		plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0],
+			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION / 1000) % 10,
+			(MRVL_ML_MODEL_VERSION / 100) % 10, (MRVL_ML_MODEL_VERSION / 10) % 10,
+			MRVL_ML_MODEL_VERSION % 10);
+		return -ENOTSUP;
+	}
+
+	/* Init section */
+	if (metadata->init_model.file_size == 0) {
+		plt_err("Invalid metadata, init_model.file_size = %u",
+			metadata->init_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Main section */
+	if (metadata->main_model.file_size == 0) {
+		plt_err("Invalid metadata, main_model.file_size = %u",
+			metadata->main_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Finish section */
+	if (metadata->finish_model.file_size == 0) {
+		plt_err("Invalid metadata, finish_model.file_size = %u",
+			metadata->finish_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Weights and Bias */
+	if (metadata->weights_bias.file_size == 0) {
+		plt_err("Invalid metadata, weights_bias.file_size = %u",
+			metadata->weights_bias.file_size);
+		return -EINVAL;
+	}
+
+	if (metadata->weights_bias.relocatable != 1) {
+		plt_err("Model not supported, non-relocatable weights and bias");
+		return -ENOTSUP;
+	}
+
+	/* Check input count */
+	if (metadata->model.num_input > MRVL_ML_INPUT_OUTPUT_SIZE) {
+		plt_err("Invalid metadata, num_input  = %u (> %u)", metadata->model.num_input,
+			MRVL_ML_INPUT_OUTPUT_SIZE);
+		return -EINVAL;
+	}
+
+	/* Check output count */
+	if (metadata->model.num_output > MRVL_ML_INPUT_OUTPUT_SIZE) {
+		plt_err("Invalid metadata, num_output  = %u (> %u)", metadata->model.num_output,
+			MRVL_ML_INPUT_OUTPUT_SIZE);
+		return -EINVAL;
+	}
+
+	/* Inputs */
+	for (i = 0; i < metadata->model.num_input; i++) {
+		if (rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input[i].input_type)) <=
+		    0) {
+			plt_err("Invalid metadata, input[%u] : input_type = %u", i,
+				metadata->input[i].input_type);
+			return -EINVAL;
+		}
+
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->input[i].model_input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : model_input_type = %u", i,
+				metadata->input[i].model_input_type);
+			return -EINVAL;
+		}
+
+		if (metadata->input[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable input: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	/* Outputs */
+	for (i = 0; i < metadata->model.num_output; i++) {
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : output_type = %u", i,
+				metadata->output[i].output_type);
+			return -EINVAL;
+		}
+
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].model_output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : model_output_type = %u", i,
+				metadata->output[i].model_output_type);
+			return -EINVAL;
+		}
+
+		if (metadata->output[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable output: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	return 0;
+}
+
+void
+cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
+{
+	uint8_t i;
+
+	for (i = 0; i < metadata->model.num_input; i++) {
+		metadata->input[i].input_type = cn10k_ml_io_type_map(metadata->input[i].input_type);
+		metadata->input[i].model_input_type =
+			cn10k_ml_io_type_map(metadata->input[i].model_input_type);
+
+		if (metadata->input[i].shape.w == 0)
+			metadata->input[i].shape.w = 1;
+
+		if (metadata->input[i].shape.x == 0)
+			metadata->input[i].shape.x = 1;
+
+		if (metadata->input[i].shape.y == 0)
+			metadata->input[i].shape.y = 1;
+
+		if (metadata->input[i].shape.z == 0)
+			metadata->input[i].shape.z = 1;
+	}
+
+	for (i = 0; i < metadata->model.num_output; i++) {
+		metadata->output[i].output_type =
+			cn10k_ml_io_type_map(metadata->output[i].output_type);
+		metadata->output[i].model_output_type =
+			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index a9f7b169de..dc30bc2aa7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -19,6 +19,309 @@ enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_UNKNOWN,
 };
 
+/* Model Metadata : v 2.1.0.2 */
+#define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
+#define MRVL_ML_MODEL_TARGET_ARCH  128
+#define MRVL_ML_MODEL_VERSION	   2100
+#define MRVL_ML_MODEL_NAME_LEN	   64
+#define MRVL_ML_INPUT_NAME_LEN	   16
+#define MRVL_ML_OUTPUT_NAME_LEN	   16
+#define MRVL_ML_INPUT_OUTPUT_SIZE  8
+
+/* Model file metadata structure */
+struct cn10k_ml_model_metadata {
+	/* Header (256-byte) */
+	struct {
+		/* Magic string ('M', 'R', 'V', 'L') */
+		uint8_t magic[4];
+
+		/* Metadata version */
+		uint8_t version[4];
+
+		/* Metadata size */
+		uint32_t metadata_size;
+
+		/* Unique ID */
+		uint8_t uuid[128];
+
+		/* Model target architecture
+		 * 0 = Undefined
+		 * 1 = M1K
+		 * 128 = MLIP
+		 * 256 = Experimental
+		 */
+		uint32_t target_architecture;
+		uint8_t reserved[104];
+
+		/* CRC of data after metadata_header (i.e. after first 256 bytes) */
+		uint32_t payload_crc32c;
+
+		/* CRC of first 252 bytes of metadata_header, after payload_crc calculation */
+		uint32_t header_crc32c;
+	} metadata_header;
+
+	/* Model information (256-byte) */
+	struct {
+		/* Model name string */
+		uint8_t name[MRVL_ML_MODEL_NAME_LEN];
+
+		/* Model version info (xx.xx.xx.xx) */
+		uint8_t version[4];
+
+		/* Model code size (Init + Main + Finish) */
+		uint32_t code_size;
+
+		/* Model data size (Weights and Bias) */
+		uint32_t data_size;
+
+		/* OCM start offset, set to ocm_wb_range_start */
+		uint32_t ocm_start;
+
+		/* OCM end offset, set to max OCM size */
+		uint32_t ocm_end;
+
+		/* Relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t ocm_relocatable;
+
+		/* Tile relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t tile_relocatable;
+
+		/* Start tile (Always 0) */
+		uint8_t tile_start;
+
+		/* End tile (num_tiles - 1) */
+		uint8_t tile_end;
+
+		/* Inference batch size */
+		uint8_t batch_size;
+
+		/* Number of input tensors (Max 8) */
+		uint8_t num_input;
+
+		/* Number of output tensors (Max 8) */
+		uint8_t num_output;
+		uint8_t reserved1;
+
+		/* Total input size in bytes */
+		uint32_t input_size;
+
+		/* Total output size in bytes */
+		uint32_t output_size;
+
+		/* Table size in bytes */
+		uint32_t table_size;
+
+		/* Number of layers in the network */
+		uint32_t num_layers;
+		uint32_t reserved2;
+
+		/* Floor of absolute OCM region */
+		uint64_t ocm_tmp_range_floor;
+
+		/* Relative OCM start address of WB data block */
+		uint64_t ocm_wb_range_start;
+
+		/* Relative OCM end address of WB data block */
+		uint64_t ocm_wb_range_end;
+
+		/* Relative DDR start address of WB data block */
+		uint64_t ddr_wb_range_start;
+
+		/* Relative DDR end address of WB data block */
+		uint64_t ddr_wb_range_end;
+
+		/* Relative DDR start address of all inputs */
+		uint64_t ddr_input_range_start;
+
+		/* Relative DDR end address of all inputs */
+		uint64_t ddr_input_range_end;
+
+		/* Relative DDR start address of all outputs */
+		uint64_t ddr_output_range_start;
+
+		/* Relative DDR end address of all outputs */
+		uint64_t ddr_output_range_end;
+
+		/* Compiler version */
+		uint8_t compiler_version[8];
+
+		/* CDK version */
+		uint8_t cdk_version[4];
+
+		/* Lower batch optimization support
+		 * 0 - No,
+		 * 1 - Yes
+		 */
+		uint8_t supports_lower_batch_size_optimization;
+		uint8_t reserved3[59];
+	} model;
+
+	/* Init section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} init_model;
+
+	/* Main section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} main_model;
+
+	/* Finish section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} finish_model;
+
+	uint8_t reserved1[512]; /* End of 2k bytes */
+
+	/* Weights and Bias (64-byte) */
+	struct {
+		/* Memory offset, set to ddr_wb_range_start */
+		uint64_t mem_offset;
+		uint32_t file_offset;
+		uint32_t file_size;
+
+		/* Relocatable flag for WB
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+		uint8_t reserved[47];
+	} weights_bias;
+
+	/* Input (512-byte, 64-byte per input) provisioned for 8 inputs */
+	struct {
+		/* DDR offset (in OCM absolute addresses for input) */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Input quantization
+		 * 1 = Requires quantization
+		 * 2 = Pre-quantized
+		 */
+		uint8_t quantize;
+
+		/* Type of incoming input
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t input_type;
+
+		/* Type of input required by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_input_type;
+
+		/* float_32 qscale value
+		 * quantized = non-quantized * qscale
+		 */
+		float qscale;
+
+		/* Input shape */
+		struct {
+			/* Input format
+			 * 1 = NCHW
+			 * 2 = NHWC
+			 */
+			uint8_t format;
+			uint8_t reserved[3];
+			uint32_t w;
+			uint32_t x;
+			uint32_t y;
+			uint32_t z;
+		} shape;
+		uint8_t reserved[4];
+
+		/* Name of input */
+		uint8_t input_name[MRVL_ML_INPUT_NAME_LEN];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output (512-byte, 64-byte per output) provisioned for 8 outputs */
+	struct {
+		/* DDR offset in OCM absolute addresses for output */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Output dequantization
+		 * 1 = De-quantization required
+		 * 2 = De-quantization not required
+		 */
+		uint8_t dequantize;
+
+		/* Type of outgoing output
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t output_type;
+
+		/* Type of output produced by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_output_type;
+
+		/* float_32 dscale value
+		 * dequantized = quantized * dscale
+		 */
+		float dscale;
+
+		/* Number of items in the output */
+		uint32_t size;
+		uint8_t reserved[20];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+		uint8_t output_name[MRVL_ML_OUTPUT_NAME_LEN];
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	uint8_t reserved2[1792];
+
+	/* Model data */
+	struct {
+		uint8_t reserved1[4068];
+
+		/* Beta: xx.xx.xx.xx,
+		 * Later: YYYYMM.xx.xx
+		 */
+		uint8_t compiler_version[8];
+
+		/* M1K CDK version (xx.xx.xx.xx) */
+		uint8_t m1k_cdk_version[4];
+	} data;
+
+	/* Hidden 16 bytes of magic code */
+	uint8_t reserved3[16];
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -30,6 +333,12 @@ struct cn10k_ml_model {
 	/* ID */
 	uint16_t model_id;
 
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Metadata */
+	struct cn10k_ml_model_metadata metadata;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -37,4 +346,7 @@ struct cn10k_ml_model {
 	enum cn10k_ml_model_state state;
 };
 
+int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
+void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0955fa0d76..2cde795903 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -416,8 +416,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int ret;
 
-	PLT_SET_USED(params);
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
 	mldev = dev->data->dev_private;
 
@@ -450,6 +453,15 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->mldev = mldev;
 	model->model_id = idx;
 
+	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->metadata);
+
+	/* A metadata batch_size of 0 selects a batch size of 256 */
+	if (model->metadata.model.batch_size == 0)
+		model->batch_size = 256;
+	else
+		model->batch_size = model->metadata.model.batch_size;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index bf7a9c0225..799e8f2470 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -19,7 +19,7 @@ sources = files(
         'cn10k_ml_model.c',
 )
 
-deps += ['mldev', 'common_cnxk', 'kvargs']
+deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1



* [PATCH v5 13/39] ml/cnxk: add internal structures for derived info
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Add internal structures to hold derived address fields and
compute the DMA addresses needed for model start. Update the
internal model fields during model load.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
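As a reading aid for the address computation in this patch, here is a
minimal sketch of how the section offsets follow from the section sizes.
The names demo_sections, demo_layout, demo_align_ceil and DEMO_ALIGN are
hypothetical and not part of the driver, and the alignment value of 128
is an assumption standing in for ML_CN10K_ALIGN_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment; stand-in for ML_CN10K_ALIGN_SIZE (value not shown here). */
#define DEMO_ALIGN 128

/* Hypothetical container for the four file_size fields read from metadata. */
struct demo_sections {
	uint32_t init_size;
	uint32_t main_size;
	uint32_t finish_size;
	uint32_t wb_size;
};

/* Byte offsets relative to the base DMA address of the load area. */
struct demo_layout {
	size_t init_off;
	size_t main_off;
	size_t finish_off;
	size_t wb_off;
	size_t run_off;  /* start of the run area */
	size_t reserved; /* bytes reserved for load + run areas */
};

/* Round v up to the next multiple of a. */
static size_t
demo_align_ceil(size_t v, size_t a)
{
	return (v + a - 1) / a * a;
}

static struct demo_layout
demo_layout_compute(const struct demo_sections *s)
{
	struct demo_layout l;
	size_t data_size;

	assert(s != NULL);
	data_size = (size_t)s->init_size + s->main_size + s->finish_size + s->wb_size;

	/* Sections are packed back to back in the load area. */
	l.init_off = 0;
	l.main_off = l.init_off + s->init_size;
	l.finish_off = l.main_off + s->main_size;
	l.wb_off = l.finish_off + s->finish_size;

	/* The run area starts right after the (unaligned) load data. */
	l.run_off = data_size;

	/* The memzone reserves two aligned copies of the model data. */
	l.reserved = 2 * demo_align_ceil(data_size, DEMO_ALIGN);
	return l;
}
```

The sketch only models offsets; the driver additionally copies each
section from the user buffer into the load area.
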
 drivers/ml/cnxk/cn10k_ml_model.c | 89 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 80 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 18 ++++++-
 3 files changed, 186 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index dfa814bbe0..2530beb80e 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -214,3 +214,92 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
 	}
 }
+
+void
+cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+	size_t model_data_size;
+	uint8_t *dma_addr_load;
+	uint8_t *dma_addr_run;
+	uint8_t i;
+	int fpos;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+
+	/* Base address */
+	addr->base_dma_addr_load = base_dma_addr;
+	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
+
+	/* Init section */
+	dma_addr_load = addr->base_dma_addr_load;
+	dma_addr_run = addr->base_dma_addr_run;
+	fpos = sizeof(struct cn10k_ml_model_metadata);
+	addr->init_load_addr = dma_addr_load;
+	addr->init_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
+
+	/* Main section */
+	dma_addr_load += metadata->init_model.file_size;
+	dma_addr_run += metadata->init_model.file_size;
+	fpos += metadata->init_model.file_size;
+	addr->main_load_addr = dma_addr_load;
+	addr->main_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
+
+	/* Finish section */
+	dma_addr_load += metadata->main_model.file_size;
+	dma_addr_run += metadata->main_model.file_size;
+	fpos += metadata->main_model.file_size;
+	addr->finish_load_addr = dma_addr_load;
+	addr->finish_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
+
+	/* Weights and Bias section */
+	dma_addr_load += metadata->finish_model.file_size;
+	fpos += metadata->finish_model.file_size;
+	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
+	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
+	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+
+	/* Inputs */
+	addr->total_input_sz_d = 0;
+	addr->total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		addr->input[i].nb_elements =
+			model->metadata.input[i].shape.w * model->metadata.input[i].shape.x *
+			model->metadata.input[i].shape.y * model->metadata.input[i].shape.z;
+		addr->input[i].sz_d = addr->input[i].nb_elements *
+				      rte_ml_io_type_size_get(metadata->input[i].input_type);
+		addr->input[i].sz_q = addr->input[i].nb_elements *
+				      rte_ml_io_type_size_get(metadata->input[i].model_input_type);
+		addr->total_input_sz_d += addr->input[i].sz_d;
+		addr->total_input_sz_q += addr->input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+			   model->model_id, i, metadata->input[i].shape.w,
+			   metadata->input[i].shape.x, metadata->input[i].shape.y,
+			   metadata->input[i].shape.z, addr->input[i].sz_d, addr->input[i].sz_q);
+	}
+
+	/* Outputs */
+	addr->total_output_sz_q = 0;
+	addr->total_output_sz_d = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		addr->output[i].nb_elements = metadata->output[i].size;
+		addr->output[i].sz_d = addr->output[i].nb_elements *
+				       rte_ml_io_type_size_get(metadata->output[i].output_type);
+		addr->output[i].sz_q =
+			addr->output[i].nb_elements *
+			rte_ml_io_type_size_get(metadata->output[i].model_output_type);
+		addr->total_output_sz_q += addr->output[i].sz_q;
+		addr->total_output_sz_d += addr->output[i].sz_d;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u", model->model_id, i,
+			   addr->output[i].sz_d, addr->output[i].sz_q);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index dc30bc2aa7..5345160a74 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -322,6 +322,81 @@ struct cn10k_ml_model_metadata {
 	uint8_t reserved3[16];
 };
 
+/* Model address structure */
+struct cn10k_ml_model_addr {
+	/* Base DMA address for load */
+	void *base_dma_addr_load;
+
+	/* Base DMA address for run */
+	void *base_dma_addr_run;
+
+	/* Init section load address */
+	void *init_load_addr;
+
+	/* Init section run address */
+	void *init_run_addr;
+
+	/* Main section load address */
+	void *main_load_addr;
+
+	/* Main section run address */
+	void *main_run_addr;
+
+	/* Finish section load address */
+	void *finish_load_addr;
+
+	/* Finish section run address */
+	void *finish_run_addr;
+
+	/* Weights and Bias base address */
+	void *wb_base_addr;
+
+	/* Weights and bias load address */
+	void *wb_load_addr;
+
+	/* Start tile */
+	uint8_t tile_start;
+
+	/* End tile */
+	uint8_t tile_end;
+
+	/* Input address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized input size */
+		uint32_t sz_d;
+
+		/* Quantized input size */
+		uint32_t sz_q;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized output size */
+		uint32_t sz_d;
+
+		/* Quantized output size */
+		uint32_t sz_q;
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -339,6 +414,9 @@ struct cn10k_ml_model {
 	/* Metadata */
 	struct cn10k_ml_model_metadata metadata;
 
+	/* Address structure */
+	struct cn10k_ml_model_addr addr;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -348,5 +426,7 @@ struct cn10k_ml_model {
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+				uint8_t *base_dma_addr);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2cde795903..b11228f2cb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -408,11 +408,14 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
+	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_data_size;
+	uint8_t *base_dma_addr;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -439,7 +442,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Compute memzone size */
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+	metadata = (struct cn10k_ml_model_metadata *)params->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+		  2 * model_data_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -462,6 +470,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	else
 		model->batch_size = model->metadata.model.batch_size;
 
+	/* Set DMA base address */
+	base_dma_addr = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
+
+	/* Copy data from load area to run area; the run address is used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-- 
2.17.1



* [PATCH v5 14/39] ml/cnxk: add internal structures for tiles and OCM
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Add internal structures to hold tile and OCM information and the
OCM to model memory mapping. Initialize the fields to platform
specific defaults and compute the OCM / tile requirements of a model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
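For reviewers, the OCM page accounting added in this patch can be
sketched as a pair of ceiling divisions followed by a capacity check.
This is a simplified illustration, not the driver code: the DEMO_*
macros and demo_* functions are hypothetical names, with the page and
tile sizes copied from cn10k_ml_ocm.h, and the relocatable-model case is
the only one modelled.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x4000u   /* same value as ML_CN10K_OCM_PAGESIZE */
#define DEMO_TILE_SIZE 0x100000u /* same value as ML_CN10K_OCM_TILESIZE */
#define DEMO_NUM_PAGES (DEMO_TILE_SIZE / DEMO_PAGE_SIZE)

/* Number of whole pages needed to hold size bytes (ceiling division). */
static uint16_t
demo_pages_for(uint64_t size)
{
	return (uint16_t)((size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE);
}

/* Compute weights/bias and scratch page counts for a relocatable model.
 * Returns 1 when both regions fit on one tile, 0 otherwise.
 */
static int
demo_ocm_fits(uint64_t wb_start, uint64_t wb_end, uint64_t tmp_floor, uint16_t *wb_pages,
	      uint16_t *scratch_pages)
{
	/* The sketch assumes well-formed ranges within a single tile. */
	assert(wb_end >= wb_start);
	assert(tmp_floor <= DEMO_TILE_SIZE);

	/* The weights/bias range is inclusive, hence the + 1. */
	*wb_pages = demo_pages_for(wb_end - wb_start + 1);

	/* Scratch memory spans from the floor to the end of the tile. */
	*scratch_pages = demo_pages_for(DEMO_TILE_SIZE - tmp_floor);

	return (*wb_pages + *scratch_pages) <= DEMO_NUM_PAGES;
}
```

With these sizes a tile holds 64 pages, so a model whose scratch region
starts near the bottom of the tile leaves no room for weights and bias.
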
 drivers/ml/cnxk/cn10k_ml_dev.h   |  5 ++
 drivers/ml/cnxk/cn10k_ml_model.c | 53 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  6 +++
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  5 ++
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 79 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 31 ++++++++++++-
 drivers/ml/cnxk/meson.build      |  2 +
 7 files changed, 180 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 7cf6268115..02a4496c97 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -7,6 +7,8 @@
 
 #include <roc_api.h>
 
+#include "cn10k_ml_ocm.h"
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -215,6 +217,9 @@ struct cn10k_ml_dev {
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
+	/* OCM info */
+	struct cn10k_ml_ocm ocm;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 2530beb80e..69d6306104 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -8,6 +8,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+#include "cn10k_ml_ocm.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -303,3 +304,55 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 			   addr->output[i].sz_d, addr->output[i].sz_q);
 	}
 }
+
+int
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+			       uint16_t *wb_pages, uint16_t *scratch_pages)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_ocm *ocm;
+	uint64_t scratch_size;
+	uint64_t wb_size;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	ocm = &mldev->ocm;
+
+	/* Assume wb_size is zero for non-relocatable models */
+	if (metadata->model.ocm_relocatable)
+		wb_size = metadata->model.ocm_wb_range_end - metadata->model.ocm_wb_range_start + 1;
+	else
+		wb_size = 0;
+
+	if (wb_size % ocm->page_size)
+		*wb_pages = wb_size / ocm->page_size + 1;
+	else
+		*wb_pages = wb_size / ocm->page_size;
+	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+		   *wb_pages);
+
+	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
+	if (metadata->model.ocm_tmp_range_floor % ocm->page_size)
+		*scratch_pages = scratch_size / ocm->page_size + 1;
+	else
+		*scratch_pages = scratch_size / ocm->page_size;
+	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+		   scratch_size, *scratch_pages);
+
+	/* Check if the model can be loaded on OCM */
+	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+		plt_err("Cannot create the model, OCM relocatable = %u",
+			metadata->model.ocm_relocatable);
+		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
+			ML_CN10K_OCM_NUMPAGES);
+		return -ENOMEM;
+	}
+
+	/* Update scratch_pages to block the full tile for OCM non-relocatable model. This would
+	 * prevent the library from allocating the remaining space on the tile to other models.
+	 */
+	if (!metadata->model.ocm_relocatable)
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5345160a74..7893635787 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -10,6 +10,7 @@
 #include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ocm.h"
 
 /* Model state */
 enum cn10k_ml_model_state {
@@ -417,6 +418,9 @@ struct cn10k_ml_model {
 	/* Address structure */
 	struct cn10k_ml_model_addr addr;
 
+	/* Tile and memory information object */
+	struct cn10k_ml_ocm_model_map model_mem_map;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -428,5 +432,7 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+				   uint16_t *wb_pages, uint16_t *scratch_pages);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
new file mode 100644
index 0000000000..b1c62f2963
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_ocm.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
new file mode 100644
index 0000000000..44390396f9
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OCM_H_
+#define _CN10K_ML_OCM_H_
+
+#include <rte_mldev.h>
+
+/* Page size in bytes. */
+#define ML_CN10K_OCM_PAGESIZE 0x4000
+
+/* Number of OCM tiles. */
+#define ML_CN10K_OCM_NUMTILES 0x8
+
+/* OCM in bytes, per tile. */
+#define ML_CN10K_OCM_TILESIZE 0x100000
+
+/* OCM pages, per tile. */
+#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
+
+/* Maximum OCM mask words, per tile, 8 bit words. */
+#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
+
+/* OCM and Tile information structure */
+struct cn10k_ml_ocm_tile_info {
+	/* Mask of used / allotted pages on tile's OCM */
+	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+
+	/* Last page in the tile's OCM used for weights and bias, default = -1 */
+	int last_wb_page;
+
+	/* Number of pages used for scratch memory on the tile's OCM */
+	uint16_t scratch_pages;
+};
+
+/* Model OCM map structure */
+struct cn10k_ml_ocm_model_map {
+	/* Status of OCM reservation */
+	bool ocm_reserved;
+
+	/* Mask of OCM tiles for the model */
+	uint64_t tilemask;
+
+	/* Start page for the model load, default = -1 */
+	int wb_page_start;
+
+	/* Number of pages required for weights and bias */
+	uint16_t wb_pages;
+
+	/* Number of pages required for scratch memory */
+	uint16_t scratch_pages;
+};
+
+/* OCM state structure */
+struct cn10k_ml_ocm {
+	/* OCM spinlock, used to update OCM state */
+	rte_spinlock_t lock;
+
+	/* Number of OCM tiles */
+	uint8_t num_tiles;
+
+	/* OCM size per tile */
+	uint64_t size_per_tile;
+
+	/* Size of OCM page */
+	uint64_t page_size;
+
+	/* Number of OCM pages */
+	uint16_t num_pages;
+
+	/* Words per OCM mask */
+	uint16_t mask_words;
+
+	/* OCM memory info and status */
+	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+};
+
+#endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b11228f2cb..302ce8a452 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -126,9 +126,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
-	uint32_t mz_size;
 	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t tile_id;
 	uint16_t qp_id;
 	int ret;
 
@@ -250,6 +252,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
+	ocm = &mldev->ocm;
+	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
+	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
+	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
+	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+
+	rte_spinlock_init(&ocm->lock);
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -416,6 +430,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	const struct plt_memzone *mz;
 	size_t model_data_size;
 	uint8_t *base_dma_addr;
+	uint16_t scratch_pages;
+	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -441,6 +457,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 		return -ENOMEM;
 	}
 
+	/* Get WB and scratch pages, check if model can be loaded. */
+	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	if (ret < 0)
+		return ret;
+
 	/* Compute memzone size */
 	metadata = (struct cn10k_ml_model_metadata *)params->addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
@@ -478,6 +499,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Copy data from load to run. run address to be used by MLIP */
 	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
 
+	/* Initialize model_mem_map */
+	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
+	model->model_mem_map.ocm_reserved = false;
+	model->model_mem_map.tilemask = 0;
+	model->model_mem_map.wb_page_start = -1;
+	model->model_mem_map.wb_pages = wb_pages;
+	model->model_mem_map.scratch_pages = scratch_pages;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 799e8f2470..393bc629b0 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -11,12 +11,14 @@ driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
+        'cn10k_ml_ocm.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
+        'cn10k_ml_ocm.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.17.1


* [PATCH v5 15/39] ml/cnxk: add structures for slow and fast path JDs
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added JD structures for load, unload and run jobs. Initialize
job command and allocate memory for request structures for slow
path jobs.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   | 99 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  4 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 19 +++++-
 drivers/ml/cnxk/cn10k_ml_ops.h   |  4 ++
 4 files changed, 125 insertions(+), 1 deletion(-)
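
Note: the memzone sizing in cn10k_ml_model_load() below can be read as a
small layout computation: an aligned model object, two copies of the
aligned model data (load and run), then an aligned slow-path request.
The following is only a sketch of that arithmetic. The names mz_layout,
align_ceil, mz_layout_compute and SKETCH_ALIGN_SIZE are hypothetical, and
the value 128 is an assumption, since ML_CN10K_ALIGN_SIZE is not shown in
this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment; the real ML_CN10K_ALIGN_SIZE value is not shown here. */
#define SKETCH_ALIGN_SIZE 128

/* Hypothetical description of where each region starts in the memzone. */
struct mz_layout {
	size_t data_off; /* start of the two model data copies */
	size_t req_off;	 /* start of the slow-path request */
	size_t total;	 /* total memzone size */
};

/* Round val up to the next multiple of align (align must be a power of two). */
static size_t
align_ceil(size_t val, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (val + align - 1) & ~(align - 1);
}

/* Mirror the sizing order used in the patch: model, 2 x data, request. */
static struct mz_layout
mz_layout_compute(size_t model_sz, size_t data_sz, size_t req_sz)
{
	struct mz_layout l;
	size_t data_aligned = align_ceil(data_sz, SKETCH_ALIGN_SIZE);

	l.data_off = align_ceil(model_sz, SKETCH_ALIGN_SIZE);
	l.req_off = l.data_off + 2 * data_aligned;
	l.total = l.req_off + align_ceil(req_sz, SKETCH_ALIGN_SIZE);
	return l;
}
```

With these assumptions, req_off corresponds to the offset passed to
PLT_PTR_ADD() when model->req is set, and total corresponds to mz_size.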

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 02a4496c97..68fcc957fa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -188,6 +188,105 @@ struct cn10k_ml_jd {
 
 			uint8_t rsvd[8];
 		} fw_load;
+
+		struct cn10k_ml_jd_section_model_start {
+			/* Source model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_src_ddr_addr;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
 	};
 };
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 7893635787..355915deeb 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+#include "cn10k_ml_ops.h"
 
 /* Model state */
 enum cn10k_ml_model_state {
@@ -426,6 +427,9 @@ struct cn10k_ml_model {
 
 	/* State */
 	enum cn10k_ml_model_state state;
+
+	/* Slow-path operations request pointer */
+	struct cn10k_ml_req *req;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 302ce8a452..56adce12ea 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -12,6 +12,10 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML Job descriptor flags */
+#define ML_FLAGS_POLL_COMPL BIT(0)
+#define ML_FLAGS_SSO_COMPL  BIT(1)
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -65,6 +69,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	struct cn10k_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
+	uint64_t i;
 
 	/* Allocate queue pair */
 	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
@@ -95,6 +100,12 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 
+	/* Initialize job command */
+	for (i = 0; i < qp->nb_desc; i++) {
+		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+	}
+
 	return qp;
 
 qp_free:
@@ -468,7 +479,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size;
+		  2 * model_data_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -507,6 +519,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set slow-path request address */
+	model->req = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+				  2 * model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d7842ecd73..c86ce66f19 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OPS_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include <roc_api.h>
 
@@ -21,6 +22,9 @@ struct cn10k_ml_req {
 
 	/* Status field for poll mode requests */
 	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
-- 
2.17.1


* [PATCH v5 16/39] ml/cnxk: find OCM mask and page slots for a model
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to compute OCM tilemask and page start for a
model. The computed tilemask and page start are used during
model start to copy model weights and bias to OCM. OCM slot
for a model is allocated from the tiles with maximum amount
of free memory.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 330 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   5 +
 2 files changed, 335 insertions(+)
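
Note: the slot search described above can be illustrated with a much
simpler bit-by-bit scan. This is not the sliding search_mask approach
used in the patch; it is a run-length counter that should give the same
answers for the examples in the slot_index_lowest() comment. The names
WORD_BITS, mask_bit_is_set and free_slot_lowest are hypothetical and
exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 8 /* bits per uint8_t mask word */

/* Return 1 if bit 'pos' of the multi-word mask is set, else 0. */
static int
mask_bit_is_set(const uint8_t *mask, int pos)
{
	return (mask[pos / WORD_BITS] >> (pos % WORD_BITS)) & 1;
}

/* Return the lowest index >= start_pos at which slot_sz contiguous clear
 * bits begin, or -1 when no such slot exists.
 */
static int
free_slot_lowest(const uint8_t *mask, int nwords, int slot_sz, int start_pos)
{
	int nbits = nwords * WORD_BITS;
	int run = 0; /* length of the current run of clear bits */
	int pos;

	assert(mask != NULL && nwords > 0 && slot_sz > 0 && start_pos >= 0);

	for (pos = start_pos; pos < nbits; pos++) {
		run = mask_bit_is_set(mask, pos) ? 0 : run + 1;
		if (run == slot_sz)
			return pos - slot_sz + 1;
	}

	return -1;
}
```

For the mask [10111000] [01001001] (word 1, word 0), i.e. the bytes
{0x49, 0xB8}, a slot of size 4 searched from position 0 starts at bit 7,
which matches the example in the code comment.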

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index b1c62f2963..df2fa4c514 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -2,4 +2,334 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+
+#include "roc_api.h"
+
+/* OCM macros */
+#define BYTE_LEN	  8
+#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
+#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+
+/* Left shift multi-word mask by 1 bit.
+ *
+ * For example, given a mask of two uint8_t words
+ * Input:  [00110101] [00110111]
+ * Output: [01101010] [01101110]
+ */
+static void
+lshift_mask(uint8_t *mask, int nwords)
+{
+	int i;
+	int word_sz;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	for (i = nwords - 1; i >= 0; i--) {
+		mask[i] = mask[i] << 1;
+		if (i != 0)
+			mask[i] = mask[i] | (mask[i - 1] >> (word_sz - 1));
+	}
+}
+
+/* Get the index of the first unused slot in a multi-word mask (base_mask). Only slots starting at
+ * or after start_pos are considered. An unused slot is slot_sz contiguous unset bits in the mask.
+ *
+ * The function creates a search_mask with slot_sz bits set and uses a sliding-window approach to
+ * scan the mask for the first available slot. search_mask slides left from start_pos to the end.
+ *
+ * For example, given the multi-word mask
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When start = 0,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 3 is 7.
+ * Index of the first unused slot of size 2 is 1.
+ * Index of the first unused slot of size 1 is 1.
+ *
+ * When start = 2,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 2 is 4.
+ * Index of the first unused slot of size 1 is 2.
+ *
+ * When unable to find a valid slot, return -1
+ * When slot_sz is zero, return the total number of bits in the mask (nwords * word_sz)
+ */
+static int
+slot_index_lowest(uint8_t *base_mask, int nwords, int slot_sz, int start_pos)
+{
+	uint8_t *search_mask;
+	int word_sz;
+	int end_pos;
+	int min_idx;
+	int max_idx;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	min_idx = 0;
+	max_idx = word_sz * nwords;
+	idx = min_idx - 1;
+
+	if (slot_sz == 0)
+		return max_idx;
+
+	/* Create a mask with slot_sz bits set */
+	search_mask = plt_zmalloc(nwords * sizeof(uint8_t), 0);
+	if (search_mask == NULL)
+		goto error;
+
+	for (i = 0; i < nwords; i++) {
+		if (i < slot_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > slot_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (slot_sz % word_sz)) - 1;
+	}
+
+	/* Shift search mask by start_pos bits */
+	for (i = 0; i < start_pos; i++)
+		lshift_mask(search_mask, nwords);
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - slot_sz + 1;
+	for (j = start_pos; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+
+		lshift_mask(search_mask, nwords);
+	}
+
+found:
+	plt_free(search_mask);
+
+error:
+	return idx;
+}
+
+/* Find the largest possible unused slot, with a minimum size of search_sz in a multi-word mask. The
+ * function returns the start index of the slot and the size of the identified slot (slot_sz).
+ *
+ * For example, in multi-word mask
+ *
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When search_sz > 4, return value = -1, slot_sz = 0
+ * When search_sz <=4, return value = 7, slot_sz = 4
+ */
+static int
+slot_index_largest(uint8_t *base_mask, int nwords, int search_sz, int *slot_sz)
+{
+	uint8_t *search_mask;
+	int mask_sz;
+	int word_sz;
+	int end_pos;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	mask_sz = nwords * word_sz;
+	idx = -1;
+
+	/* Create a mask with mask_sz bits set */
+	search_mask = plt_zmalloc(mask_sz, 0);
+	if (search_mask == NULL)
+		goto error;
+
+start:
+	for (i = 0; i < nwords; i++) {
+		if (i < mask_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > mask_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (mask_sz % word_sz)) - 1;
+	}
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - mask_sz + 1;
+	for (j = 0; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+		lshift_mask(search_mask, nwords);
+	}
+
+	mask_sz--;
+	if (mask_sz >= search_sz)
+		goto start;
+	else
+		mask_sz = 0;
+
+found:
+	plt_free(search_mask);
+	if (search_sz == 0)
+		idx = word_sz * nwords;
+
+error:
+	if (slot_sz)
+		*slot_sz = mask_sz;
+
+	return idx;
+}
+
+/* Count number of bits in a tilemask. Assumes that all set bits are contiguous. */
+int
+cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
+{
+	uint8_t count;
+
+	PLT_ASSERT(tilemask != 0);
+
+	*start = __builtin_ctzl(tilemask);
+	*end = 64 - __builtin_clzl(tilemask) - 1;
+	count = *end - *start + 1;
+
+	PLT_ASSERT(count == __builtin_popcountl(tilemask));
+	return count;
+}
+
+/* Find the tiles and wb_page_start to load the model on given 'num_tiles' tiles with the specified
+ * scratch & WB pages and OCM allocation mode.
+ */
+int
+cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			   uint16_t scratch_pages, uint64_t *tilemask)
+{
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
+	uint16_t used_scratch_pages_max;
+	uint16_t scratch_page_start;
+	int used_last_wb_page_max;
+	uint16_t scratch_page_end;
+	uint8_t search_start_tile;
+	uint8_t search_end_tile;
+	int wb_page_start_curr;
+	int max_slot_sz_curr;
+	uint8_t tile_start;
+	int ocm_alloc_mode;
+	int wb_page_start;
+	uint16_t tile_id;
+	uint16_t word_id;
+	uint8_t tile_idx;
+	int max_slot_sz;
+	int start_tile;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
+		plt_err("Invalid num_tiles = %u (> ML_CN10K_OCM_NUMTILES)", num_tiles);
+		return -1;
+	}
+
+	memset(tilemask, 0, sizeof(uint64_t));
+	wb_page_start = -1;
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	start_tile = -1;
+	max_slot_sz_curr = 0;
+	max_slot_sz = 0;
+	tile_idx = 0;
+	ocm_alloc_mode = 2;
+
+	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
+		plt_err("Invalid start_tile, %d", start_tile);
+		return -1;
+	}
+
+	if (start_tile < 0) {
+		search_start_tile = 0;
+		search_end_tile = ocm->num_tiles - num_tiles;
+	} else {
+		search_start_tile = start_tile;
+		search_end_tile = start_tile;
+	}
+
+	tile_start = search_start_tile;
+start_search:
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		used_scratch_pages_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, used_scratch_pages_max);
+		used_last_wb_page_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
+	}
+
+	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
+	}
+
+	if (used_scratch_pages_max < scratch_pages) { /* Check for extra scratch pages */
+		if (ocm->num_pages - used_last_wb_page_max - 1 >=
+		    scratch_pages) { /* Pages available */
+			scratch_page_start = ocm->num_pages - scratch_pages;
+			scratch_page_end = ocm->num_pages - 1;
+			for (page_id = scratch_page_start; page_id <= scratch_page_end;
+			     page_id++) { /* Mark the extra scratch pages as used */
+				local_ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					SET_BIT(local_ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						page_id % OCM_MAP_WORD_SIZE);
+			}
+		} else { /* Pages not available, check for next set of tiles */
+			goto next_search;
+		}
+	}
+
+	if (ocm_alloc_mode == 1) {
+		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
+		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
+			tile_idx = tile_start;
+			goto found;
+		}
+	} else if (ocm_alloc_mode == 2) {
+		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
+							&max_slot_sz_curr);
+		if (max_slot_sz_curr > max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			max_slot_sz = max_slot_sz_curr;
+			tile_idx = tile_start;
+		} else if (max_slot_sz_curr == max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			if (wb_page_start == ocm->num_pages) {
+				tile_idx = tile_start;
+				goto found;
+			}
+		}
+	}
+
+next_search:
+	tile_start = tile_start + num_tiles;
+	if (tile_start <= search_end_tile)
+		goto start_search;
+
+found:
+	if (wb_page_start != -1)
+		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
+
+	return wb_page_start;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 44390396f9..2e26271a7a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OCM_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 /* Page size in bytes. */
 #define ML_CN10K_OCM_PAGESIZE 0x4000
@@ -76,4 +77,8 @@ struct cn10k_ml_ocm {
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
 };
 
+int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
+int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			       uint16_t scratch_pages, uint64_t *tilemask);
+
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1


* [PATCH v5 17/39] ml/cnxk: add support to reserve and free OCM pages
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to reserve and free OCM pages for a model. OCM
pages are reserved upon completion of model start and are
released after model stop.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 131 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ocm.h |   3 +
 2 files changed, 131 insertions(+), 3 deletions(-)
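
Note: the reserve and free steps amount to setting and clearing page bits
in a per-tile mask, with WB pages placed from wb_page_start upwards and
scratch pages taken from the top of the tile. A minimal single-tile
sketch follows. NUM_PAGES is 64 because ML_CN10K_OCM_TILESIZE /
ML_CN10K_OCM_PAGESIZE = 0x100000 / 0x4000. The helper names pages_mark,
pages_used and tile_reserve are hypothetical, and the sketch omits the
per-model scratch resizing done in cn10k_ml_ocm_free_pages().

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PAGES  64 /* pages per tile: 0x100000 / 0x4000 */
#define WORD_BITS  8  /* bits per uint8_t mask word */
#define MASK_WORDS (NUM_PAGES / WORD_BITS)

/* Mark pages first..last (inclusive) as used (used != 0) or free (used == 0).
 * An empty range (last < first) is a no-op.
 */
static void
pages_mark(uint8_t *mask, int first, int last, int used)
{
	int page;

	assert(first >= 0 && last < NUM_PAGES);

	for (page = first; page <= last; page++) {
		uint8_t bit = (uint8_t)(1u << (page % WORD_BITS));

		if (used)
			mask[page / WORD_BITS] |= bit;
		else
			mask[page / WORD_BITS] &= (uint8_t)~bit;
	}
}

/* Count the pages currently marked as used. */
static int
pages_used(const uint8_t *mask)
{
	int used = 0;
	int page;

	for (page = 0; page < NUM_PAGES; page++)
		used += (mask[page / WORD_BITS] >> (page % WORD_BITS)) & 1;

	return used;
}

/* Reserve WB pages from wb_page_start and scratch pages at the top of the tile. */
static void
tile_reserve(uint8_t *mask, int wb_page_start, int wb_pages, int scratch_pages)
{
	pages_mark(mask, wb_page_start, wb_page_start + wb_pages - 1, 1);
	pages_mark(mask, NUM_PAGES - scratch_pages, NUM_PAGES - 1, 1);
}
```

Reserving 10 WB pages at page 0 and 4 scratch pages on an empty tile marks
pages 0..9 and 60..63; clearing 0..9 afterwards leaves only the scratch
pages in use.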

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index df2fa4c514..c3e4de3e9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -5,14 +5,17 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "roc_api.h"
 
 /* OCM macros */
-#define BYTE_LEN	  8
-#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
-#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+#define BYTE_LEN	   8
+#define OCM_MAP_WORD_SIZE  (sizeof(uint8_t) * BYTE_LEN)
+#define IS_BIT_SET(num, n) ((num) & (1 << (n)))
+#define SET_BIT(num, n)	   ((num) | (1 << (n)))
+#define CLEAR_BIT(num, n)  ((num) &= ~((1) << (n)))
 
 /* Left shift multi-word mask by 1 bit.
  *
@@ -333,3 +336,125 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 
 	return wb_page_start;
 }
+
+void
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
+			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_page_start;
+	int scratch_page_end;
+	int wb_page_end;
+	int tile_start;
+	int tile_end;
+	int tile_id;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Get first set bit, tile_start */
+	tile_start = 0;
+	tile_end = 0;
+	cn10k_ml_ocm_tilecount(tilemask, &tile_start, &tile_end);
+	wb_page_end = wb_page_start + wb_pages - 1;
+	scratch_page_start = ocm->num_pages - scratch_pages;
+	scratch_page_end = ocm->num_pages - 1;
+
+	/* Update tile_ocm_info */
+	for (tile_id = tile_start; tile_id <= tile_end; tile_id++) {
+		/* Scratch pages */
+		for (page_id = scratch_page_start; page_id <= scratch_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		ocm->tile_ocm_info[tile_id].scratch_pages =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, scratch_pages);
+
+		/* WB pages */
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		if (wb_pages != 0)
+			ocm->tile_ocm_info[tile_id].last_wb_page =
+				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
+	}
+
+	model->addr.tile_start = tile_start;
+	model->addr.tile_end = tile_end;
+
+	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
+	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
+		   wb_page_end);
+	plt_ml_dbg("model_id = %u, scratch_page_start = %d, scratch_page_end = %d", model_id,
+		   scratch_page_start, scratch_page_end);
+}
+
+void
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_resize_pages;
+	int wb_page_start;
+	int wb_page_end;
+	int prev_start;
+	int curr_start;
+	int tile_id;
+	int page_id;
+	uint16_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Update OCM info for WB memory */
+	wb_page_start = model->model_mem_map.wb_page_start;
+	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
+	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+				CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+						  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+					  page_id % OCM_MAP_WORD_SIZE);
+		}
+
+		/* Update last_wb_page size */
+		if (wb_page_end == ocm->tile_ocm_info[tile_id].last_wb_page)
+			ocm->tile_ocm_info[tile_id].last_wb_page = wb_page_start - 1;
+
+		/* Update scratch page size and clear extra bits */
+		scratch_resize_pages = 0;
+		/* Get max scratch pages required, excluding the current model */
+		for (i = 0; i < dev->data->nb_models; i++) {
+			struct cn10k_ml_model *model = dev->data->models[i];
+
+			if ((i != model_id) && (model != NULL)) {
+				if (IS_BIT_SET(model->model_mem_map.tilemask, tile_id))
+					scratch_resize_pages =
+						PLT_MAX((int)model->model_mem_map.scratch_pages,
+							scratch_resize_pages);
+			}
+		}
+
+		/* Clear extra scratch pages */
+		if (scratch_resize_pages < ocm->tile_ocm_info[tile_id].scratch_pages) {
+			prev_start = ocm->num_pages - ocm->tile_ocm_info[tile_id].scratch_pages;
+			curr_start = ocm->num_pages - scratch_resize_pages;
+			for (page_id = prev_start; page_id < curr_start; page_id++) {
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+							  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						  page_id % OCM_MAP_WORD_SIZE);
+			}
+			ocm->tile_ocm_info[tile_id].scratch_pages = scratch_resize_pages;
+		}
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 2e26271a7a..32c9b17afc 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -80,5 +80,8 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
+				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1


* [PATCH v5 18/39] ml/cnxk: enable support to start an ML model
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:06   ` [PATCH v5 19/39] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented model start driver function. A model start job is
enqueued through scratch registers and checked for completion
in synchronous mode. Tilemask and OCM slot are calculated, and
OCM pages are reserved, before the job is enqueued. The pages
are released again if the model fails to start.
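
As an illustration of the tilemask and OCM base address update done
before the job is enqueued, here is a minimal standalone sketch. The
helper names tile_genmask() and ocm_wb_base() are hypothetical and
are not part of the driver; the driver itself uses GENMASK_ULL() and
ocm->page_size. Any page size passed in is an assumption made for
the example only.

```c
#include <assert.h>
#include <stdint.h>

/* Contiguous 64-bit mask with bits lo..hi (inclusive) set.
 * Hypothetical stand-in for GENMASK_ULL(hi, lo).
 */
static uint64_t
tile_genmask(unsigned int hi, unsigned int lo)
{
	assert(lo <= hi && hi < 64);
	return (~UINT64_C(0) >> (63 - hi)) & (~UINT64_C(0) << lo);
}

/* Byte offset of the first weights-and-bias page in OCM, given the
 * index of the first reserved page and the page size in bytes.
 */
static uint64_t
ocm_wb_base(unsigned int wb_page_start, uint64_t page_size)
{
	return (uint64_t)wb_page_start * page_size;
}
```

With a model placed on tiles 2 to 5, tile_genmask(5, 2) gives 0x3C,
which is the value that would go into the job descriptor tilemask.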

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |   4 +
 3 files changed, 214 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 68fcc957fa..8f6bc24370 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -33,6 +33,9 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* ML slow-path job flags */
+#define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
+
 /* Poll mode job state */
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 56adce12ea..e8ce65b182 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -114,6 +114,64 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = model->model_id;
+	req->jd.hdr.job_type = job_type;
+	req->jd.hdr.fp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+
+	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
+		if (!model->metadata.model.ocm_relocatable)
+			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+		else
+			req->jd.hdr.sp_flags = 0x0;
+		req->jd.model_start.model_src_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_load_addr));
+		req->jd.model_start.model_dst_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+		req->jd.model_start.model_init_offset = 0x0;
+		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->jd.model_start.model_finish_offset =
+			metadata->init_model.file_size + metadata->main_model.file_size;
+		req->jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
+						      metadata->main_model.file_size +
+						      metadata->finish_model.file_size;
+		req->jd.model_start.num_layers = metadata->model.num_layers;
+		req->jd.model_start.num_gather_entries = 0;
+		req->jd.model_start.num_scatter_entries = 0;
+		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->jd.model_start.batch_size = model->batch_size;
+		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
+		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
+		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
+			&mldev->roc,
+			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
+		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
+		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
+		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
+		req->jd.model_start.output.s.ddr_range_start =
+			metadata->model.ddr_output_range_start;
+		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -561,6 +619,154 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+int
+cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	uint8_t num_tiles;
+	uint64_t tilemask;
+	int wb_page_start;
+	int tile_start;
+	int tile_end;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				plt_ml_dbg("Model already started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (!model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			wb_page_start = cn10k_ml_ocm_tilemask_find(
+				dev, num_tiles, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages, &tilemask);
+
+			if (wb_page_start == -1) {
+				plt_err("Free pages not available on OCM tiles");
+				plt_err("Failed to start model = 0x%016lx, name = %s",
+					PLT_U64_CAST(model), model->metadata.model.name);
+
+				plt_spinlock_unlock(&ocm->lock);
+				return -ENOMEM;
+			}
+
+			model->model_mem_map.tilemask = tilemask;
+			model->model_mem_map.wb_page_start = wb_page_start;
+
+			cn10k_ml_ocm_reserve_pages(
+				dev, model->model_id, model->model_mem_map.tilemask,
+				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages);
+			model->model_mem_map.ocm_reserved = true;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	/* Update JD */
+	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->jd.model_start.ocm_wb_base_address =
+		model->model_mem_map.wb_page_start * ocm->page_size;
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else { /* Reset scratch registers */
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (ret == 0)
+				model->state = ML_CN10K_MODEL_STATE_STARTED;
+			else
+				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
+		while (model->model_mem_map.ocm_reserved) {
+			if (plt_spinlock_trylock(&ocm->lock) != 0) {
+				cn10k_ml_ocm_free_pages(dev, model->model_id);
+				model->model_mem_map.ocm_reserved = false;
+				model->model_mem_map.tilemask = 0x0;
+				plt_spinlock_unlock(&ocm->lock);
+			}
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -576,4 +782,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index c86ce66f19..989af978c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -25,6 +25,9 @@ struct cn10k_ml_req {
 
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
+
+	/* Timeout cycle */
+	uint64_t timeout;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
@@ -61,5 +64,6 @@ extern struct rte_ml_dev_ops cn10k_ml_ops;
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			uint16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v5 19/39] ml/cnxk: enable support to stop an ML models
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
@ 2023-02-07 16:06   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented model stop driver function. A model stop job is
enqueued through scratch registers and is checked for
completion by polling in synchronous mode. OCM pages are
released before the stop job is enqueued.
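
The synchronous enqueue-and-poll pattern used for slow-path jobs can
be sketched as below. Everything here is a hypothetical stand-in:
struct sp_queue fakes both the scratch-register queue and the cycle
counter so the loop can run without hardware. As a simplification,
the deadline is computed once up front rather than when the enqueue
is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fake queue: enqueue and dequeue succeed once "now" reaches a threshold. */
struct sp_queue {
	uint64_t now;         /* fake cycle counter, advanced on every read */
	uint64_t enqueue_at;  /* first cycle at which enqueue succeeds */
	uint64_t complete_at; /* first cycle at which dequeue succeeds */
};

static uint64_t sp_cycles(struct sp_queue *q) { return q->now++; }
static bool sp_enqueue(struct sp_queue *q) { return q->now >= q->enqueue_at; }
static bool sp_dequeue(struct sp_queue *q) { return q->now >= q->complete_at; }

/* Returns 0 if the job completed before the deadline, -1 on timeout. */
static int
sp_job_run_sync(struct sp_queue *q, uint64_t timeout_cycles)
{
	bool enqueued = false;
	bool dequeued = false;
	uint64_t deadline;

	assert(q != NULL);
	deadline = sp_cycles(q) + timeout_cycles;

	do {
		if (!enqueued)
			enqueued = sp_enqueue(q);

		if (enqueued && !dequeued)
			dequeued = sp_dequeue(q);

		if (dequeued)
			break;
	} while (sp_cycles(q) < deadline);

	return dequeued ? 0 : -1;
}
```

On timeout a real caller would also reset the queue before reporting
the error, as the driver does with roc_ml_scratch_queue_reset().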

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 115 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |   1 +
 2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e8ce65b182..77d3728d8d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -295,10 +295,14 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		/* Re-configure */
 		void **models;
 
-		/* Unload all models */
+		/* Stop and unload all models */
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
 				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
@@ -362,10 +366,14 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
-	/* Unload all models */
+	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
 			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
@@ -767,6 +775,108 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				plt_ml_dbg("Model not started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			cn10k_ml_ocm_free_pages(dev, model->model_id);
+			model->model_mem_map.ocm_reserved = false;
+			model->model_mem_map.tilemask = 0x0;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0x0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else {
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -783,4 +893,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 989af978c4..22576b93c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -65,5 +65,6 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 			uint16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v5 20/39] ml/cnxk: enable support to get model information
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-02-07 16:06   ` [PATCH v5 19/39] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added driver functions to get model information. Added
internal functions to set and get model info.
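
The model info is kept in one contiguous buffer: the info header,
then the input descriptors, then the output descriptors. A minimal
sketch of that layout follows. The struct and function names are
hypothetical, simplified stand-ins for rte_ml_model_info and
rte_ml_io_info, and the field set is reduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical I/O descriptor and model info header. */
struct io_info {
	char name[32];
	uint32_t nb_elements;
};

struct model_info {
	uint32_t nb_inputs;
	uint32_t nb_outputs;
	struct io_info *input_info;
	struct io_info *output_info;
};

/* Bytes needed for the header plus both descriptor arrays. */
static size_t
model_info_size(uint32_t nb_inputs, uint32_t nb_outputs)
{
	return sizeof(struct model_info) +
	       (size_t)nb_inputs * sizeof(struct io_info) +
	       (size_t)nb_outputs * sizeof(struct io_info);
}

/* Carve the header and the two arrays out of a single buffer.
 * The buffer must be suitably aligned for struct model_info.
 */
static struct model_info *
model_info_init(void *buf, uint32_t nb_inputs, uint32_t nb_outputs)
{
	struct model_info *info = buf;

	assert(buf != NULL);
	info->nb_inputs = nb_inputs;
	info->nb_outputs = nb_outputs;
	info->input_info = (struct io_info *)(info + 1);
	info->output_info = info->input_info + nb_inputs;

	return info;
}
```

Since the pointers refer back into the same buffer, a copy of the
header alone still points at the original descriptor arrays.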

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  9 ++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 37 ++++++++++++++++++---
 3 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 69d6306104..0ded355d81 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -356,3 +356,58 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 
 	return 0;
 }
+
+void
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+{
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output =
+		PLT_PTR_ADD(input, model->metadata.model.num_input * sizeof(struct rte_ml_io_info));
+
+	/* Set model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+	rte_memcpy(info->name, model->metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", model->metadata.model.version[0],
+		 model->metadata.model.version[1], model->metadata.model.version[2],
+		 model->metadata.model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = dev->data->dev_id;
+	info->batch_size = model->batch_size;
+	info->nb_inputs = model->metadata.model.num_input;
+	info->input_info = input;
+	info->nb_outputs = model->metadata.model.num_output;
+	info->output_info = output;
+	info->wb_size = model->metadata.weights_bias.file_size;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, model->metadata.input[i].input_name,
+			   MRVL_ML_INPUT_NAME_LEN);
+		input[i].dtype = model->metadata.input[i].input_type;
+		input[i].qtype = model->metadata.input[i].model_input_type;
+		input[i].shape.format = model->metadata.input[i].shape.format;
+		input[i].shape.w = model->metadata.input[i].shape.w;
+		input[i].shape.x = model->metadata.input[i].shape.x;
+		input[i].shape.y = model->metadata.input[i].shape.y;
+		input[i].shape.z = model->metadata.input[i].shape.z;
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, model->metadata.output[i].output_name,
+			   MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].dtype = model->metadata.output[i].output_type;
+		output[i].qtype = model->metadata.output[i].model_output_type;
+		output[i].shape.format = RTE_ML_IO_FORMAT_1D;
+		output[i].shape.w = model->metadata.output[i].size;
+		output[i].shape.x = 1;
+		output[i].shape.y = 1;
+		output[i].shape.z = 1;
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 355915deeb..75990fe1e4 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -422,6 +422,14 @@ struct cn10k_ml_model {
 	/* Tile and memory information object */
 	struct cn10k_ml_ocm_model_map model_mem_map;
 
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -438,5 +446,6 @@ void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
 				   uint16_t *wb_pages, uint16_t *scratch_pages);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 77d3728d8d..ad9b3dfd21 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -506,6 +506,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_data_size;
+	size_t model_info_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
 	uint16_t wb_pages;
@@ -544,8 +545,13 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
+			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size +
+		  2 * model_data_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
@@ -585,10 +591,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set model info */
+	model->info = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+	cn10k_ml_model_info_set(dev, model);
+
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-				  2 * model_data_size);
+	model->req = PLT_PTR_ADD(model->info, model_info_size);
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
@@ -877,6 +885,26 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+static int
+cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			struct rte_ml_model_info *model_info)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
+	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -894,4 +922,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
 };
-- 
2.17.1



* [PATCH v5 21/39] ml/cnxk: enable support to update model params
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added cnxk driver functions to update model params (weights
and bias) after a model is loaded. Updating model params does
not require reloading the model. The update is accepted only
while the model is in the loaded state.
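
A minimal sketch of the update flow, assuming the weights and bias
section directly follows the code sections in the model image. The
struct model_image type and its fields are hypothetical and only
mirror the idea of a load copy and a run copy of the model data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model image: code sections followed by weights and bias. */
struct model_image {
	uint8_t *load;    /* staging copy of the model data */
	uint8_t *run;     /* copy the device executes from */
	size_t code_size; /* init + main + finish sections */
	size_t wb_size;   /* weights and bias section */
	int started;      /* non-zero while the model is running */
};

static int
model_params_update(struct model_image *m, const void *wb)
{
	assert(m != NULL && wb != NULL);

	/* Refuse to touch the parameters of a running model. */
	if (m->started)
		return -EBUSY;

	/* Overwrite the weights and bias in the load copy. */
	memcpy(m->load + m->code_size, wb, m->wb_size);

	/* Refresh the run copy from the load copy. */
	memcpy(m->run, m->load, m->code_size + m->wb_size);

	return 0;
}
```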

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ad9b3dfd21..92bf1a0854 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -905,6 +905,36 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
+static int
+cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cn10k_ml_model *model;
+	size_t size;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+		return -1;
+	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+		return -EBUSY;
+
+	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
+	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+
+	/* Update model weights & bias */
+	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -923,4 +953,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
 };
-- 
2.17.1



* [PATCH v5 22/39] ml/cnxk: add support to get IO buffer sizes
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added driver functions to get input and output buffer sizes
for a given number of batches. These functions compute the
buffer sizes based on specific requirements of the device.
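
The size computation scales the per-run buffer size by the number
of runs needed to cover the requested batches, rounding up. A small
sketch, with io_buffer_size() as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Buffer size for nb_batches, when one run of the model consumes
 * batch_size batches and needs size_per_run bytes.
 */
static uint64_t
io_buffer_size(uint64_t size_per_run, uint32_t nb_batches, uint32_t batch_size)
{
	uint64_t nb_runs;

	assert(batch_size > 0);

	/* Ceiling division, the same result as PLT_DIV_CEIL. */
	nb_runs = ((uint64_t)nb_batches + batch_size - 1) / batch_size;

	return size_per_run * nb_runs;
}
```

For example, 5 batches with a model batch size of 2 need 3 runs, so
a 100-byte per-run buffer grows to 300 bytes.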

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 92bf1a0854..b5c89bee40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -935,6 +935,54 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
+static int
+cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches,
+			   uint64_t *input_qsize, uint64_t *input_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (input_qsize != NULL)
+		*input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (input_dsize != NULL)
+		*input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches,
+			    uint64_t *output_qsize, uint64_t *output_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (output_qsize != NULL)
+		*output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (output_dsize != NULL)
+		*output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -954,4 +1002,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_input_size_get = cn10k_ml_io_input_size_get,
+	.io_output_size_get = cn10k_ml_io_output_size_get,
 };
-- 
2.17.1



* [PATCH v5 23/39] ml/cnxk: enable quantization and dequantization
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented driver functions to quantize / dequantize input
and output data. Support is enabled for multiple batches.
Quantization / dequantization use the type conversion functions
defined in ML common code.
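
For readers unfamiliar with scale-based quantization, the sketch
below shows one common form of float32 to int8 conversion and back.
It is not the implementation of the rte_ml_io_* helpers; the
function names, the round-half-away-from-zero rule and the
saturation behaviour are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round to nearest (half away from zero) and saturate to int8. */
static int8_t
sat_round_i8(float v)
{
	float r;

	if (v != v) /* NaN maps to zero in this sketch */
		return 0;

	r = v >= 0.0f ? v + 0.5f : v - 0.5f;
	if (r >= 127.0f)
		return INT8_MAX;
	if (r <= -128.0f)
		return INT8_MIN;

	return (int8_t)r;
}

/* q[i] = saturate(round(in[i] * scale)) */
static void
quantize_f32_to_i8(float scale, size_t n, const float *in, int8_t *out)
{
	size_t i;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = sat_round_i8(in[i] * scale);
}

/* d[i] = in[i] * scale */
static void
dequantize_i8_to_f32(float scale, size_t n, const int8_t *in, float *out)
{
	size_t i;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = (float)in[i] * scale;
}
```

Values that fall outside the int8 range after scaling are clamped,
so a quantize and dequantize round trip is lossy for such inputs.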

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 151 +++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b5c89bee40..231c9b340b 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
@@ -983,6 +985,153 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t
 	return 0;
 }
 
+static int
+cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, void *dbuffer,
+		     void *qbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		if (model->metadata.input[i].input_type ==
+		    model->metadata.input[i].model_input_type) {
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+		} else {
+			switch (model->metadata.input[i].model_input_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = rte_ml_io_float32_to_int8(model->metadata.input[i].qscale,
+								model->addr.input[i].nb_elements,
+								lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = rte_ml_io_float32_to_uint8(model->metadata.input[i].qscale,
+								 model->addr.input[i].nb_elements,
+								 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = rte_ml_io_float32_to_int16(model->metadata.input[i].qscale,
+								 model->addr.input[i].nb_elements,
+								 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = rte_ml_io_float32_to_uint16(model->metadata.input[i].qscale,
+								  model->addr.input[i].nb_elements,
+								  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
+								   lcl_dbuffer, lcl_qbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_input_type[%u] : %u", i,
+					model->metadata.input[i].model_input_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_dbuffer += model->addr.input[i].sz_d;
+		lcl_qbuffer += model->addr.input[i].sz_q;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches,
+		       void *qbuffer, void *dbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		if (model->metadata.output[i].output_type ==
+		    model->metadata.output[i].model_output_type) {
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+		} else {
+			switch (model->metadata.output[i].model_output_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = rte_ml_io_int8_to_float32(model->metadata.output[i].dscale,
+								model->addr.output[i].nb_elements,
+								lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = rte_ml_io_uint8_to_float32(model->metadata.output[i].dscale,
+								 model->addr.output[i].nb_elements,
+								 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = rte_ml_io_int16_to_float32(model->metadata.output[i].dscale,
+								 model->addr.output[i].nb_elements,
+								 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = rte_ml_io_uint16_to_float32(model->metadata.output[i].dscale,
+								  model->addr.output[i].nb_elements,
+								  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = rte_ml_io_float16_to_float32(
+					model->addr.output[i].nb_elements, lcl_qbuffer,
+					lcl_dbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_output_type[%u] : %u", i,
+					model->metadata.output[i].model_output_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_qbuffer += model->addr.output[i].sz_q;
+		lcl_dbuffer += model->addr.output[i].sz_d;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -1006,4 +1155,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* I/O ops */
 	.io_input_size_get = cn10k_ml_io_input_size_get,
 	.io_output_size_get = cn10k_ml_io_output_size_get,
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
 };
-- 
2.17.1
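
For readers following the quantize path in the patch above, here is a minimal sketch of the buffer walk, assuming an int8 model input with round-to-nearest and saturation. The names `sketch_input`, `sketch_quantize_one` and `sketch_quantize` are hypothetical and are not part of the driver or of the ML common code; the real conversion is done by the `rte_ml_io_float32_to_*` helpers, whose exact rounding behaviour is not shown in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-input descriptor (assumption); the driver keeps comparable
 * fields in model->metadata.input[i] and model->addr.input[i].
 */
struct sketch_input {
	float qscale;         /* quantization scale applied to each element */
	uint32_t nb_elements; /* elements per input, per run */
};

/* Scale one value, round to nearest and saturate to the int8 range. */
static int8_t
sketch_quantize_one(float value, float scale)
{
	float v = value * scale;

	if (v >= 127.0f)
		return INT8_MAX;
	if (v <= -128.0f)
		return INT8_MIN;
	return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Walk all inputs once per run; the number of runs is the ceiling of
 * nb_batches / batch_size, as in the driver loop.
 */
static int
sketch_quantize(const struct sketch_input *inputs, uint32_t num_input, uint32_t nb_batches,
		uint32_t batch_size, const float *dbuffer, int8_t *qbuffer)
{
	uint32_t runs, run, i, e;

	if (inputs == NULL || dbuffer == NULL || qbuffer == NULL || batch_size == 0)
		return -1;

	runs = (nb_batches + batch_size - 1) / batch_size;
	for (run = 0; run < runs; run++) {
		for (i = 0; i < num_input; i++) {
			for (e = 0; e < inputs[i].nb_elements; e++)
				qbuffer[e] = sketch_quantize_one(dbuffer[e], inputs[i].qscale);

			/* Advance both cursors past this input, like sz_d and sz_q. */
			dbuffer += inputs[i].nb_elements;
			qbuffer += inputs[i].nb_elements;
		}
	}

	return 0;
}
```

The two cursors advance by different byte counts (four bytes per float32 element, one per int8 element), which is why the driver tracks sz_d and sz_q separately.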



* [PATCH v5 24/39] ml/cnxk: enable support to dump device debug info
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to dump device debug information. Debug info on a
cn10k device includes model state info, OCM usage info, and the
firmware debug and exception buffers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  51 +++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 189 +++++++++++++++++++++++++++++++++
 3 files changed, 241 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index c3e4de3e9c..0b04fcc2da 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -458,3 +458,54 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 }
+
+static void
+cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t nwords, char *str)
+{
+	char *p = str;
+	int word;
+
+	/* add prefix 0x */
+	*p++ = '0';
+	*p++ = 'x';
+
+	/* build one word at a time */
+	for (word = nwords - 1; word >= 0; word--) {
+		sprintf(p, "%02X", tile_info->ocm_mask[word]);
+		p += 2;
+	}
+
+	/* terminate */
+	*p++ = 0;
+}
+
+void
+cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+{
+	char str[ML_CN10K_OCM_NUMPAGES / 4 + 3]; /* nibbles + prefix '0x' + NUL */
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	uint8_t tile_id;
+	uint8_t word_id;
+	int wb_pages;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	fprintf(fp, "OCM State:\n");
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
+
+		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
+		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+			wb_pages +=
+				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+
+		fprintf(fp,
+			"tile = %2u, scratch_pages = %4u,"
+			" wb_pages = %4d, last_wb_page = %4d,"
+			" pagemask = %s\n",
+			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
+			ocm->tile_ocm_info[tile_id].last_wb_page, str);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 32c9b17afc..0c7172a671 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,5 +83,6 @@ int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16
 void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 231c9b340b..2d7d760536 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -14,10 +14,25 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  90
+
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+static void
+print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -116,6 +131,102 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
+{
+
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Print debug info */
+	print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
+		model->metadata.model.version[1], model->metadata.model.version[2],
+		model->metadata.model.version[3]);
+	if (strlen(model->name) != 0)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+
+	/* Print model state */
+	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
+			1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s  %14s\n", "input", "input_name", "input_type",
+		"model_input_type", "quantize", "format");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.input[i].input_name);
+		rte_ml_io_type_to_str(model->metadata.input[i].input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		rte_ml_io_type_to_str(model->metadata.input[i].model_input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.input[i].quantize == 1 ? "Yes" : "No"));
+		rte_ml_io_format_to_str(model->metadata.input[i].shape.format, str, STR_LEN);
+		fprintf(fp, "%*s", 16, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
+		"model_output_type", "dequantize");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.output[i].output_name);
+		rte_ml_io_type_to_str(model->metadata.output[i].output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		rte_ml_io_type_to_str(model->metadata.output[i].model_output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.output[i].dequantize == 1 ? "Yes" : "No"));
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
+
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -498,6 +609,83 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_fw *fw;
+
+	uint32_t head_loc;
+	uint32_t tail_loc;
+	uint16_t model_id;
+	uint32_t bufsize;
+	char *head_ptr;
+	int core_id;
+
+	if (roc_env_is_asim())
+		return 0;
+
+	mldev = dev->data->dev_private;
+	fw = &mldev->fw;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			cn10k_ml_model_print(dev, model_id, fp);
+			fprintf(fp, "\n");
+		}
+	}
+
+	/* Dump OCM state */
+	cn10k_ml_ocm_print(dev, fp);
+
+	/* Dump debug buffer */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		if (core_id == 0) {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		} else {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		}
+		if (head_loc < tail_loc) {
+			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
+		} else if (head_loc >= tail_loc + 1) {
+			fprintf(fp, "%.*s\n", bufsize - head_loc, &head_ptr[head_loc]);
+			fprintf(fp, "%.*s\n", tail_loc, &head_ptr[0]);
+		}
+	}
+
+	/* Dump exception info */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		if ((core_id == 0) &&
+		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		} else if ((core_id == 1) &&
+			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		}
+	}
+
+	return 0;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -1139,6 +1327,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_close = cn10k_ml_dev_close,
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1
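
The OCM state line printed by the dump patch above combines a hex page mask with a count of pages in use. A minimal sketch of both steps follows, assuming the mask is stored as an array of bytes with the least significant word first. `sketch_pagemask_to_str` and `sketch_pages_in_use` are hypothetical names used only for illustration, and the explicit length check is an addition that the driver helper does not have.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a byte-wise page mask as "0x..." with the most significant word
 * first. The caller must provide 2 (prefix) + 2 * nwords (hex digits) + 1
 * (NUL) bytes; a shorter buffer is rejected with -1.
 */
static int
sketch_pagemask_to_str(const uint8_t *mask, uint16_t nwords, char *str, size_t len)
{
	size_t pos = 0;
	int word;

	if (mask == NULL || str == NULL || len < (size_t)2 * nwords + 3)
		return -1;

	str[pos++] = '0';
	str[pos++] = 'x';

	for (word = nwords - 1; word >= 0; word--) {
		snprintf(&str[pos], len - pos, "%02X", (unsigned int)mask[word]);
		pos += 2;
	}
	str[pos] = '\0';

	return 0;
}

/* Count set bits across the mask, one bit per occupied page. */
static int
sketch_pages_in_use(const uint8_t *mask, uint16_t nwords)
{
	uint16_t word;
	uint8_t bits;
	int pages = 0;

	for (word = 0; word < nwords; word++)
		for (bits = mask[word]; bits != 0; bits &= (uint8_t)(bits - 1))
			pages++;

	return pages;
}
```

In the driver, wb_pages is derived from this kind of count minus the scratch pages reserved on the tile.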



* [PATCH v5 25/39] ml/cnxk: add driver support for device selftest
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support for device selftest. Device selftest includes
checking the status of firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d7d760536..2fa0522faf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -686,6 +686,62 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	uint64_t timeout_cycle;
+	bool timeout;
+	int ret;
+
+	mldev = dev->data->dev_private;
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+					 ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("Could not allocate reserved memzone");
+		return -ENOMEM;
+	}
+	req = mz->addr;
+
+	/* Prepare load completion structure */
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	/* Enqueue firmware selftest request through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware selftest status, clean-up and exit */
+	ret = 0;
+	if (timeout) {
+		ret = -ETIME;
+	} else {
+		if (req->result.error_code != 0)
+			ret = -1;
+	}
+
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -1328,6 +1384,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1



* [PATCH v5 26/39] ml/cnxk: enqueue a burst of inference requests
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled driver support to enqueue a burst of inference requests
to the ML device. Enqueue uses the internal ML request structure
to queue the inferences; job completion is detected by polling.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 96 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  7 +++
 2 files changed, 103 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2fa0522faf..f024487fc1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -285,6 +285,28 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	}
 }
 
+static __rte_always_inline void
+cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+				struct rte_ml_op *op)
+{
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = op->model_id;
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->jd.hdr.sp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.model_run.input_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr));
+	req->jd.model_run.output_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr));
+	req->jd.model_run.num_batches = op->nb_batches;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -450,6 +472,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -1376,6 +1400,78 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t count;
+	uint64_t head;
+	bool enqueued;
+
+	mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	req = &queue->reqs[head];
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	if (unlikely(!enqueued))
+		goto jcmdq_full;
+
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 22576b93c0..a1724f6156 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -28,6 +28,9 @@ struct cn10k_ml_req {
 
 	/* Timeout cycle */
 	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
@@ -67,4 +70,8 @@ int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+/* Fast-path ops */
+__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
+
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1
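
The head/tail arithmetic used by the enqueue patch above keeps one descriptor unused, so a full ring can be told apart from an empty one. The sketch below replays that arithmetic on a small ring of integers. `sketch_ring`, `SKETCH_NB_DESC` and the `sketch_*` functions are hypothetical stand-ins; the hardware submission step (`roc_ml_jcmdq_enqueue_lf`) is deliberately left out.

```c
#include <stdint.h>

#define SKETCH_NB_DESC 4 /* hypothetical ring size */

struct sketch_ring {
	uint64_t head; /* next slot to fill */
	uint64_t tail; /* oldest slot not yet dequeued */
	int slots[SKETCH_NB_DESC];
};

static void
sketch_index_advance(uint64_t *index, uint64_t nb_desc)
{
	*index = (*index + 1) % nb_desc;
}

static uint64_t
sketch_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	return (nb_desc + head - tail) % nb_desc;
}

/* One slot is always left empty, hence the "- 1". */
static uint64_t
sketch_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	return nb_desc - sketch_pending_count(head, tail, nb_desc) - 1;
}

/* Clip the burst to the free space, then fill slots starting at head. */
static uint16_t
sketch_enqueue_burst(struct sketch_ring *r, const int *items, uint16_t nb_items)
{
	uint64_t free_slots = sketch_free_count(r->head, r->tail, SKETCH_NB_DESC);
	uint16_t count;

	if (nb_items > free_slots)
		nb_items = (uint16_t)free_slots;

	for (count = 0; count < nb_items; count++) {
		r->slots[r->head] = items[count];
		sketch_index_advance(&r->head, SKETCH_NB_DESC);
	}

	return count;
}
```

With four descriptors, a burst of five items on an empty ring enqueues three; the caller is expected to retry the remainder after some requests have been dequeued.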



* [PATCH v5 27/39] ml/cnxk: dequeue a burst of inference requests
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled driver support to dequeue inference requests from
internal queue. Dequeue checks for request completion by
polling the status field of the job request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 61 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 ++
 2 files changed, 63 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f024487fc1..51f1c92a8d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -473,6 +473,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -1418,6 +1419,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
 }
 
+static __rte_always_inline void
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
+		       struct rte_ml_op *op)
+{
+	PLT_SET_USED(dev);
+	PLT_SET_USED(qp_id);
+
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0))
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+	else
+		op->status = RTE_ML_OP_STATUS_ERROR;
+
+	op->user_ptr = result->user_ptr;
+}
+
 __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
@@ -1472,6 +1490,49 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot uint16_t
+cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+	req = &queue->reqs[tail];
+	status = plt_read64(&req->status);
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
+		goto empty_or_active;
+
+	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	ops[count] = req->op;
+
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a1724f6156..f6aab4a609 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -73,5 +73,7 @@ int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1
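
One consequence of the dequeue loop in the patch above is that completions are returned strictly in queue order: the loop stops at the first request whose status is not yet "finished", even if a later request has already completed. A minimal sketch of that behaviour follows, with hypothetical `sketch_req` entries and status values standing in for `cn10k_ml_req` and the `ML_CN10K_POLL_JOB_*` constants.

```c
#include <stdint.h>

/* Hypothetical status values (assumption). */
#define SKETCH_JOB_START  0
#define SKETCH_JOB_FINISH 1

struct sketch_req {
	uint64_t status; /* written by the device when the job completes */
	int op_id;       /* stands in for the rte_ml_op pointer */
};

/* Return up to nb_ops completed ops starting at *tail, stopping at the
 * first request that is still active. *tail is updated on return.
 */
static uint16_t
sketch_dequeue_burst(const struct sketch_req *reqs, uint64_t nb_desc, uint64_t head,
		     uint64_t *tail, int *ops, uint16_t nb_ops)
{
	uint64_t pending = (nb_desc + head - *tail) % nb_desc;
	uint64_t t = *tail;
	uint16_t count = 0;

	if (nb_ops > pending)
		nb_ops = (uint16_t)pending;

	while (count < nb_ops && reqs[t].status == SKETCH_JOB_FINISH) {
		ops[count++] = reqs[t].op_id;
		t = (t + 1) % nb_desc;
	}

	*tail = t;
	return count;
}
```

A request that finishes out of order simply waits in the ring until every earlier request has been collected.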



* [PATCH v5 28/39] ml/cnxk: add internal function for sync mode run
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-27 10:42     ` Prince Takkar
  2023-02-07 16:07   ` [PATCH v5 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  39 siblings, 1 reply; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added internal function to execute ML inference requests
in synchronous mode. Sync mode inference execution is used
to launch inference requests without using a queue-pair.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 53 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 51f1c92a8d..87778c37bb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1533,6 +1533,59 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	bool timeout;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[op->model_id];
+	req = model->req;
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+
+	timeout = true;
+	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	do {
+		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+			req->op = op;
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout) {
+		ret = -EBUSY;
+		goto error_enqueue;
+	}
+
+	timeout = true;
+	do {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout)
+		ret = -ETIME;
+	else
+		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+
+error_enqueue:
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index f6aab4a609..7c35bf7539 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,5 +75,6 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1
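
The sync-mode function in the patch above runs two polling phases against a single deadline: it retries the submission until the command queue accepts the job, then polls the status word until the job finishes. The sketch below models this with a fake tick counter so the control flow can be followed without hardware. `sketch_dev` and its helpers are hypothetical; only the return codes (-EBUSY when submission never succeeds, -ETIME when completion is not seen) mirror the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model (assumption): the command queue accepts the job
 * once `accept_at` ticks have elapsed, and the job completes at `finish_at`.
 */
struct sketch_dev {
	uint64_t now;
	uint64_t accept_at;
	uint64_t finish_at;
};

/* Stand-in for plt_tsc_cycles(): every read advances time by one tick. */
static uint64_t
sketch_tick(struct sketch_dev *d)
{
	return d->now++;
}

static bool
sketch_try_enqueue(const struct sketch_dev *d)
{
	return d->now >= d->accept_at;
}

static bool
sketch_job_finished(const struct sketch_dev *d)
{
	return d->now >= d->finish_at;
}

static int
sketch_inference_sync(struct sketch_dev *d, uint64_t timeout_ticks)
{
	uint64_t deadline = sketch_tick(d) + timeout_ticks;
	bool timeout = true;

	/* Phase 1: retry submission until accepted or the deadline passes. */
	do {
		if (sketch_try_enqueue(d)) {
			timeout = false;
			break;
		}
	} while (sketch_tick(d) < deadline);

	if (timeout)
		return -EBUSY;

	/* Phase 2: poll for completion against the same deadline. */
	do {
		if (sketch_job_finished(d))
			return 0;
	} while (sketch_tick(d) < deadline);

	return -ETIME;
}
```

Because both phases share one deadline, time spent waiting for queue space reduces the time left to wait for the result.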



* [PATCH v5 29/39] ml/cnxk: enable support for firmware error codes
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support for error handling. Added error types and subtypes
supported by ML firmware. Enabled support to get the device-specific
error code and message for a completed ML request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |   4 +-
 drivers/ml/cnxk/cn10k_ml_dev.h |  50 +++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.c | 117 ++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_ops.h |   2 +
 4 files changed, 160 insertions(+), 13 deletions(-)
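
Note for reviewers: the error word added by this patch carries a 4-bit
error type and a 60-bit sub-type. Below is a minimal standalone sketch
of packing and decoding such a word with bounds-checked table lookups.
All ex_* names are hypothetical and exist only for this illustration.
The sketch also assumes the type field occupies the low 4 bits, which
for the driver's bit-field union depends on the compiler and ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative names only; none of these identifiers exist in the driver. */
#define EX_ETYPE_BITS 4
#define EX_ETYPE_MASK ((UINT64_C(1) << EX_ETYPE_BITS) - 1)
#define EX_DIM(a)     (sizeof(a) / sizeof((a)[0]))

enum ex_etype {
	EX_ETYPE_NO_ERROR = 0,
	EX_ETYPE_FW_NONFATAL,
	EX_ETYPE_HW_NONFATAL,
	EX_ETYPE_HW_FATAL,
	EX_ETYPE_HW_WARNING,
	EX_ETYPE_DRIVER,
	EX_ETYPE_UNKNOWN,
};

/* Same strings as the type and driver sub-type tables in the patch. */
static const char *const ex_etype_name[] = {
	"NO_ERROR", "FW_NON_FATAL", "HW_NON_FATAL", "HW_FATAL",
	"HW_WARNING", "DRIVER_ERROR", "UNKNOWN_ERROR",
};

static const char *const ex_driver_msg[] = {
	"NO ERROR", "UNKNOWN ERROR", "FW EXCEPTION", "UNKNOWN FIRMWARE ERROR",
};

/* Pack type and sub-type, assuming the type sits in the low 4 bits. */
static uint64_t
ex_pack(uint64_t etype, uint64_t stype)
{
	return (stype << EX_ETYPE_BITS) | (etype & EX_ETYPE_MASK);
}

/* Format "TYPE" or "TYPE : SUBTYPE"; never index a table out of range. */
static int
ex_error_str(uint64_t code, char *buf, size_t len)
{
	uint64_t etype = code & EX_ETYPE_MASK;
	uint64_t stype = code >> EX_ETYPE_BITS;

	assert(buf != NULL && len > 0);

	/* A 4-bit field can hold 0..15, but only 7 types are defined. */
	if (etype >= EX_DIM(ex_etype_name))
		return snprintf(buf, len, "INVALID_ETYPE");

	if (etype == EX_ETYPE_DRIVER && stype < EX_DIM(ex_driver_msg))
		return snprintf(buf, len, "%s : %s", ex_etype_name[etype],
				ex_driver_msg[stype]);

	return snprintf(buf, len, "%s", ex_etype_name[etype]);
}
```

With these assumptions, ex_pack(EX_ETYPE_DRIVER, 2) yields 0x25, and
ex_error_str() on that word writes "DRIVER_ERROR : FW EXCEPTION". The
range checks matter because the value is read back from an opaque
64-bit field and may not match any table entry.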

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 837f006bf0..76ed853a3c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -261,7 +261,7 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -452,7 +452,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 8f6bc24370..604a200e26 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -64,6 +64,54 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* Error types enumeration */
+enum cn10k_ml_error_etype {
+	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
+	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
+	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
+	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
+	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
+	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
+};
+
+/* Firmware non-fatal error sub-type */
+enum cn10k_ml_error_stype_fw_nf {
+	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
+	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during unload */
+	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
+	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
+	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
+	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
+	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
+	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
+	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
+	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+};
+
+/* Driver error sub-type */
+enum cn10k_ml_error_stype_driver {
+	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
+	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+};
+
+/* Error structure */
+union cn10k_ml_error_code {
+	struct {
+		/* Error type */
+		uint64_t etype : 4;
+
+		/* Error sub-type */
+		uint64_t stype : 60;
+	} s;
+
+	/* WORD 0 */
+	uint64_t u64;
+};
+
 /* Firmware stats */
 struct cn10k_ml_fw_stats {
 	/* Firmware start cycle */
@@ -82,7 +130,7 @@ struct cn10k_ml_fw_stats {
 /* Result structure */
 struct cn10k_ml_result {
 	/* Job error code */
-	uint64_t error_code;
+	union cn10k_ml_error_code error_code;
 
 	/* Firmware stats */
 	struct cn10k_ml_fw_stats stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 87778c37bb..23a9ca4ff2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,49 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Error message length */
+#define ERRMSG_LEN 32
+
+/* Error type database */
+static const struct cn10k_ml_etype_db {
+	enum cn10k_ml_error_etype etype;
+	char name[ERRMSG_LEN];
+} ml_etype_db[] = {
+	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
+
+/* Hardware non-fatal error subtype database */
+static const struct cn10k_ml_stype_db_hw_nf {
+	enum cn10k_ml_error_stype_fw_nf stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_hw_nf[] = {
+	{ML_FW_ERR_NOERR, "NO ERROR"},
+	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+};
+
+/* Driver error subtype database */
+static const struct cn10k_ml_stype_db_driver {
+	enum cn10k_ml_error_stype_driver stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_driver[] = {
+	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+};
+
 static void
 print_line(FILE *fp, int len)
 {
@@ -474,6 +517,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
+	dev->op_error_get = cn10k_ml_op_error_get;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -758,7 +802,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code != 0)
+		if (req->result.error_code.u64 != 0)
 			ret = -1;
 	}
 
@@ -936,7 +980,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1017,7 +1061,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0)
+			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1079,7 +1123,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1134,7 +1178,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0x0)
+			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1426,12 +1470,30 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 	PLT_SET_USED(dev);
 	PLT_SET_USED(qp_id);
 
-	op->impl_opaque = result->error_code;
+	struct cn10k_ml_dev *mldev;
 
-	if (likely(result->error_code == 0))
+	if (likely(result->error_code.u64 == 0)) {
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
-	else
+	} else {
+		/* Handle driver error */
+		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+			mldev = dev->data->dev_private;
+
+			/* Check for exception */
+			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
+			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+			else
+				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+		}
+
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
 
 	op->user_ptr = result->user_ptr;
 }
@@ -1468,6 +1530,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1515,8 +1578,12 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 dequeue_req:
 	req = &queue->reqs[tail];
 	status = plt_read64(&req->status);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
-		goto empty_or_active;
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+	}
 
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
@@ -1533,6 +1600,35 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
+{
+	union cn10k_ml_error_code *error_code;
+	char msg[RTE_ML_STR_MAX];
+
+	PLT_SET_USED(dev);
+
+	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
+
+	/* Copy error message */
+	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
+
+	/* Copy sub error message */
+	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+	}
+
+	if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+	}
+
+	plt_strlcpy(error->message, msg, sizeof(error->message));
+
+	return 0;
+}
+
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
@@ -1549,6 +1645,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 7c35bf7539..1784900cff 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,6 +75,8 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
+				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v5 30/39] ml/cnxk: add support to get and reset device stats
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to get and reset ML device stats. Device stats
include the number of requests enqueued and dequeued and the
enqueue and dequeue error counts.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 55 ++++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 2 files changed, 55 insertions(+), 3 deletions(-)
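
Note for reviewers: the stats getter in this patch adds each
queue-pair's counters into the caller's structure with "+=". Whether
the library layer clears that structure beforehand is not visible in
this patch, so the standalone sketch below zeroes the output itself
before summing. The ex_* names are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the four counters kept per queue-pair. */
struct ex_stats {
	uint64_t enqueued_count;
	uint64_t dequeued_count;
	uint64_t enqueue_err_count;
	uint64_t dequeue_err_count;
};

/* Sum per-queue-pair counters into *out, starting from zero. */
static void
ex_stats_get(const struct ex_stats *qp_stats, unsigned int nb_qp, struct ex_stats *out)
{
	unsigned int i;

	assert(out != NULL);
	assert(qp_stats != NULL || nb_qp == 0);

	/* Clear first so stale caller data cannot leak into the totals. */
	memset(out, 0, sizeof(*out));

	for (i = 0; i < nb_qp; i++) {
		out->enqueued_count += qp_stats[i].enqueued_count;
		out->dequeued_count += qp_stats[i].dequeued_count;
		out->enqueue_err_count += qp_stats[i].enqueue_err_count;
		out->dequeue_err_count += qp_stats[i].dequeue_err_count;
	}
}
```

Keeping the counters per queue-pair and summing only on request keeps
the fast path free of shared counters. The cost is that a reader sees
a snapshot that may be slightly inconsistent across queue-pairs.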

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 23a9ca4ff2..c38f018a50 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -159,6 +159,10 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -678,6 +682,38 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -1467,15 +1503,23 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	PLT_SET_USED(dev);
-	PLT_SET_USED(qp_id);
-
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
 
 	if (likely(result->error_code.u64 == 0)) {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeued_count++;
+		}
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeue_err_count++;
+		}
+
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
 			mldev = dev->data->dev_private;
@@ -1549,6 +1593,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 jcmdq_full:
 	queue->head = head;
+	qp->stats.enqueued_count += count;
 
 	return count;
 }
@@ -1697,6 +1742,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 1784900cff..65ae8b44f3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -58,6 +58,9 @@ struct cn10k_ml_qp {
 
 	/* Request queue */
 	struct cn10k_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
 };
 
 /* Device ops */
-- 
2.17.1



* [PATCH v5 31/39] ml/cnxk: add support to handle extended dev stats
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support for ML device extended stats (xstats): getting
xstats names and values, and resetting xstats. Supported xstats
are the average, minimum and maximum hardware and firmware
latency of each model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.h |  57 +++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 356 ++++++++++++++++++++++++++++++-
 3 files changed, 415 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 604a200e26..b7ff369ba8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -372,6 +372,9 @@ struct cn10k_ml_dev {
 
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
+
+	/* xstats status */
+	bool xstats_enabled;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 75990fe1e4..1bc748265d 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -399,6 +399,57 @@ struct cn10k_ml_model_addr {
 	uint32_t total_output_sz_d;
 };
 
+/* Extended stats types enum */
+enum cn10k_ml_model_xstats_type {
+	/* Average hardware latency */
+	avg_hw_latency = 0,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+};
+
+/* Model fast-path stats */
+struct cn10k_ml_model_stats {
+	/* Total hardware latency, sum of all inferences */
+	uint64_t hw_latency_tot;
+
+	/* Minimum hardware latency */
+	uint64_t hw_latency_min;
+
+	/* Maximum hardware latency */
+	uint64_t hw_latency_max;
+
+	/* Total firmware latency, sum of all inferences */
+	uint64_t fw_latency_tot;
+
+	/* Minimum firmware latency */
+	uint64_t fw_latency_min;
+
+	/* Maximum firmware latency */
+	uint64_t fw_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t hw_reset_count;
+
+	/* Firmware stats reset index */
+	uint64_t fw_reset_count;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -438,6 +489,12 @@ struct cn10k_ml_model {
 
 	/* Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
+
+	/* Stats for burst ops */
+	struct cn10k_ml_model_stats *burst_stats;
+
+	/* Stats for sync ops */
+	struct cn10k_ml_model_stats *sync_stats;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c38f018a50..880bb6a5a9 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -354,6 +354,134 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
+#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value += model->burst_stats[qp_id].str##_latency_tot;                      \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		value = (count != 0) ? value / count : 0;                                          \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+static uint64_t
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
+			 enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint64_t count = 0;
+	uint64_t value;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+	if (model == NULL)
+		return 0;
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case min_hw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case max_hw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case avg_fw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case min_fw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case max_fw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
+			model->burst_stats[qp_id].str##_reset_count =                              \
+				model->burst_stats[qp_id].dequeued_count;                          \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+	} while (0)
+
+static void
+cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
+			   enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -519,6 +647,13 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	/* Check firmware stats */
+	if ((mldev->fw.req->jd.fw_load.cap.s.hw_stats) &&
+	    (mldev->fw.req->jd.fw_load.cap.s.fw_stats))
+		mldev->xstats_enabled = true;
+	else
+		mldev->xstats_enabled = false;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -714,6 +849,170 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+/* Model xstats names */
+struct rte_ml_dev_xstats_map cn10k_ml_model_xstats_table[] = {
+	{avg_hw_latency, "Avg-HW-Latency"}, {min_hw_latency, "Min-HW-Latency"},
+	{max_hw_latency, "Max-HW-Latency"}, {avg_fw_latency, "Avg-FW-Latency"},
+	{min_fw_latency, "Min-FW-Latency"}, {max_fw_latency, "Max-FW-Latency"},
+};
+
+static int
+cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_map *xstats_map,
+			      uint32_t size)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	if (xstats_map == NULL)
+		return PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+
+	/* Model xstats names */
+	count = 0;
+	cn10k_ml_dev_info_get(dev, &dev_info);
+
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		xstats_map[count].id = id;
+		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+
+		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+
+		count++;
+		if (count == size)
+			break;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				uint64_t *value)
+{
+	struct rte_ml_dev_xstats_map *xstats_map;
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+	uint32_t num_xstats;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	num_xstats = PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+	xstats_map = rte_zmalloc("cn10k_ml_xstats_map",
+				 sizeof(struct rte_ml_dev_xstats_map) * num_xstats, 0);
+	cn10k_ml_dev_xstats_names_get(dev, xstats_map, num_xstats);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		if (strncmp(name, xstats_map[id].name, strlen(name)) == 0) {
+			*stat_id = id;
+			rte_free(xstats_map);
+			break;
+		}
+	}
+
+	if (id == PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models)
+		return -EINVAL;
+
+	model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+	type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+	*value = cn10k_ml_model_xstat_get(dev, model_id, type);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint64_t *values,
+			uint16_t nb_ids)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	count = 0;
+	for (i = 0; i < nb_ids; i++) {
+		model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+		values[i] = cn10k_ml_model_xstat_get(dev, model_id, type);
+		count++;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint16_t nb_ids)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (stat_ids == NULL) {
+		for (i = 0; i < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; i++) {
+			model_id = i / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = i % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	} else {
+		for (i = 0; i < nb_ids; i++) {
+			model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	}
+
+	return 0;
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -856,6 +1155,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_stats_size;
 	size_t model_data_size;
 	size_t model_info_size;
 	uint8_t *base_dma_addr;
@@ -864,6 +1164,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int qp_id;
 	int ret;
 
 	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
@@ -900,10 +1201,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -949,6 +1252,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Set slow-path request address and state */
 	model->req = PLT_PTR_ADD(model->info, model_info_size);
 
+	/* Reset burst and sync stats */
+	model->burst_stats = PLT_PTR_ADD(
+		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
+		model->burst_stats[qp_id].hw_latency_tot = 0;
+		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].hw_latency_max = 0;
+		model->burst_stats[qp_id].fw_latency_tot = 0;
+		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].fw_latency_max = 0;
+		model->burst_stats[qp_id].hw_reset_count = 0;
+		model->burst_stats[qp_id].fw_reset_count = 0;
+		model->burst_stats[qp_id].dequeued_count = 0;
+	}
+	model->sync_stats =
+		PLT_PTR_ADD(model->burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
@@ -1503,15 +1824,44 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
+	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	uint64_t hw_latency;
+	uint64_t fw_latency;
 
 	if (likely(result->error_code.u64 == 0)) {
+		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
+			stats = &model->burst_stats[qp_id];
+		} else {
+			stats = model->sync_stats;
+		}
+
+		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
+			stats->hw_latency_min = UINT64_MAX;
+			stats->hw_latency_max = 0;
 		}
 
+		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
+			stats->fw_latency_min = UINT64_MAX;
+			stats->fw_latency_max = 0;
+		}
+
+		hw_latency = result->stats.hw_end - result->stats.hw_start;
+		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
+
+		stats->hw_latency_tot += hw_latency;
+		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
+		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
+		stats->fw_latency_tot += fw_latency;
+		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
+		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
+		stats->dequeued_count++;
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
@@ -1745,6 +2095,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
 	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-15 12:33     ` Shivah Shankar Shankar Narayan Rao
  2023-02-16  4:40     ` Prince Takkar
  2023-02-07 16:07   ` [PATCH v5 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  39 siblings, 2 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to retrieve xstats in either cycles or ns.
Access to sclk is enabled only if an RVU device is probed
during initialization. The driver returns the xstats in
nanoseconds only when an RVU device is probed; otherwise it
falls back to cycles.
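
The conversion described above can be sketched in isolation as follows.
This is a minimal illustration, not the driver code: the helper names
cycles_to_ns and xstat_unit are hypothetical, and the clock frequency is
assumed to be reported in MHz, with 0 meaning "not available".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a cycle count to nanoseconds for a clock given in MHz.
 * A frequency of 0 means the clock is unknown, so the raw cycle
 * count is returned unchanged.
 */
static uint64_t
cycles_to_ns(uint64_t cycles, uint16_t freq_mhz)
{
	if (freq_mhz == 0)
		return cycles;

	/* cycles / MHz gives microseconds; scale by 1000 first to get ns. */
	return (cycles * 1000ULL) / freq_mhz;
}

/* Pick the name suffix that matches the unit returned above. */
static const char *
xstat_unit(uint16_t freq_mhz)
{
	return (freq_mhz == 0) ? "cycles" : "ns";
}
```

Keeping the unit suffix and the conversion keyed on the same frequency
check helps the reported name and the reported value stay consistent.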

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 880bb6a5a9..5689fbfcb2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -394,6 +394,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 			 enum cn10k_ml_model_xstats_type type)
 {
 	struct cn10k_ml_model *model;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
 	uint64_t value;
 	uint32_t qp_id;
@@ -425,6 +427,10 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 		value = 0;
 	}
 
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
 	return value;
 }
 
@@ -863,6 +869,8 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
 	uint32_t model_id;
 	uint32_t count;
 	uint32_t type;
@@ -878,6 +886,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	/* Model xstats names */
 	count = 0;
 	cn10k_ml_dev_info_get(dev, &dev_info);
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 
 	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
 		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
@@ -889,8 +898,14 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 		xstats_map[count].id = id;
 		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
 
-		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
-			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+		if (sclk_freq == 0)
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
+		else
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-ns",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
 
 		count++;
 		if (count == size)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v5 33/39] ml/cnxk: add support to report DPE FW warnings
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to enable and report DPE warnings from ML
firmware. Configure firmware load flags based on the device
arguments.

Default values:
	enable_dpe_warnings = 1
	report_dpe_warnings = 0
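
As an illustration of how the two device arguments map to firmware load
flags, here is a standalone sketch. The macro and function names are
hypothetical, and BIT(n) is assumed to expand to (1 << n).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed equivalents of the BIT(0) / BIT(1) masks used by the driver. */
#define ENABLE_DPE_WARNING_BIT (1ULL << 0)
#define REPORT_DPE_WARNING_BIT (1ULL << 1)

/* Build the firmware load flags from the two 0/1 device arguments. */
static uint64_t
fw_flags_build(int enable_dpe_warnings, int report_dpe_warnings)
{
	uint64_t flags = 0;

	if (enable_dpe_warnings)
		flags |= ENABLE_DPE_WARNING_BIT;

	if (report_dpe_warnings)
		flags |= REPORT_DPE_WARNING_BIT;

	return flags;
}
```

With the default values (enable = 1, report = 0) the result is 0x1,
which matches the FW_LOAD_FLAGS constant that this patch removes.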

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 94 +++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_dev.h |  6 +++
 2 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 76ed853a3c..ac6592891b 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -17,9 +17,13 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-#define CN10K_ML_FW_PATH "fw_path"
+#define CN10K_ML_FW_PATH		"fw_path"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 
-#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -28,9 +32,13 @@
 #define FW_EXCEPTION_BUFFER_SIZE 0x400
 #define FW_LINKER_OFFSET	 0x80000
 #define FW_WAIT_CYCLES		 100
-#define FW_LOAD_FLAGS		 0x1
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+/* Firmware flags */
+#define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
+#define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -49,9 +57,25 @@ parse_string_arg(const char *key __rte_unused, const char *value, void *extra_ar
 	return 0;
 }
 
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be non-negative.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int
 cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
 {
+	bool enable_dpe_warnings_set = false;
+	bool report_dpe_warnings_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -76,6 +100,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		fw_path_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		enable_dpe_warnings_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_REPORT_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		report_dpe_warnings_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -83,6 +131,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		mldev->fw.path = fw_path;
 	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
 
+	if (!enable_dpe_warnings_set) {
+		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+				mldev->fw.enable_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+
+	if (!report_dpe_warnings_set) {
+		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+				mldev->fw.report_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -208,9 +280,15 @@ cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 uint64_t
 cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 {
-	PLT_SET_USED(fw);
+	uint64_t flags = 0x0;
+
+	if (fw->enable_dpe_warnings)
+		flags = flags | FW_ENABLE_DPE_WARNING_BITMASK;
+
+	if (fw->report_dpe_warnings)
+		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	return FW_LOAD_FLAGS;
+	return flags;
 }
 
 static int
@@ -614,4 +692,6 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index b7ff369ba8..9ba56ffba6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -349,6 +349,12 @@ struct cn10k_ml_fw {
 	/* Firmware file path */
 	const char *path;
 
+	/* Enable DPE warnings */
+	int enable_dpe_warnings;
+
+	/* Report DPE warnings */
+	int report_dpe_warnings;
+
 	/* Data buffer */
 	uint8_t *data;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v5 34/39] ml/cnxk: add support to enable model data caching
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added device argument 'cache_model_data' to enable model data
caching. When enabled, an inference request is executed with
dummy data in synchronous mode during the model start stage.
This run caches the model weights and biases in memory and
results in improved inference throughput.

cache_model_data = 1, enable (default)
cache_model_data = 0, disable
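
Since this argument only accepts 0 or 1, a stricter parser than atoi()
could be considered. atoi() returns 0 for non-numeric input, so a value
such as "abc" would be read as 0. The sketch below is a hypothetical
helper (parse_bool_arg is not part of this series) built on strtol().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: accept only the values 0 and 1, reject
 * empty strings, trailing characters and out-of-range numbers.
 */
static int
parse_bool_arg(const char *value, int *out)
{
	char *end = NULL;
	long v;

	if (value == NULL || *value == '\0')
		return -EINVAL;

	errno = 0;
	v = strtol(value, &end, 10);
	if (errno != 0 || *end != '\0' || v < 0 || v > 1)
		return -EINVAL;

	*out = (int)v;
	return 0;
}
```

A helper of this shape would also make the separate range check after
parsing unnecessary, because invalid values never reach the device
structure.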

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 33 ++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c | 50 ++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index ac6592891b..948708a420 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -20,10 +20,12 @@
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
+#define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -38,7 +40,8 @@
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -76,6 +79,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
+	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -124,6 +128,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		report_dpe_warnings_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -155,6 +171,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
 
+	if (!cache_model_data_set) {
+		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
+				mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -694,4 +722,5 @@ RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
 RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
 			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 9ba56ffba6..718edadde7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -381,6 +381,9 @@ struct cn10k_ml_dev {
 
 	/* xstats status */
 	bool xstats_enabled;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 5689fbfcb2..d69df42b27 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -488,6 +488,49 @@ cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
 	}
 }
 
+static int
+cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct rte_ml_op op;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t isize = 0;
+	uint64_t osize = 0;
+	int ret = 0;
+
+	model = dev->data->models[model_id];
+
+	/* Create input and output buffers. */
+	rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL);
+	rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL);
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+	memset(mz->addr, 0, isize + osize);
+
+	op.model_id = model_id;
+	op.nb_batches = model->batch_size;
+	op.mempool = NULL;
+
+	op.input.addr = mz->addr;
+	op.input.length = isize;
+	op.input.next = NULL;
+
+	op.output.addr = PLT_PTR_ADD(op.input.addr, isize);
+	op.output.length = osize;
+	op.output.next = NULL;
+
+	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_inference_sync(dev, &op);
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -1467,6 +1510,13 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
+	if (ret < 0) { /* Call stop to update model and FW state, ignore error */
+		rte_ml_model_stop(dev->data->dev_id, model_id);
+	} else {
+		if (mldev->cache_model_data && roc_model_is_cn10ka())
+			ret = cn10k_ml_cache_model_data(dev, model_id);
+	}
+
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v5 35/39] ml/cnxk: add support to select OCM allocation mode
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-03-01  9:01     ` Prince Takkar
  2023-02-07 16:07   ` [PATCH v5 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  39 siblings, 1 reply; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added device argument "ocm_alloc_mode" to select OCM allocation
method during model start. Two modes are supported by the driver.

The "lowest" mode is used as the default.

ocm_alloc_mode:
lowest:  Allocate from first available free slot / lowest
         tile ID in OCM (default)
largest: Allocate from a slot with maximum free memory
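
The difference between the two modes can be sketched on a single tile
as follows. This is a simplified illustration only: the driver works on
OCM mask words across several tiles, whereas this sketch assumes one
byte per page (1 = in use) and hypothetical function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "lowest": start of the first free run of at least `need` pages, or -1. */
static int
slot_lowest(const uint8_t *used, size_t nb_pages, size_t need)
{
	size_t run = 0;
	size_t i;

	for (i = 0; i < nb_pages; i++) {
		run = used[i] ? 0 : run + 1;
		if (need > 0 && run == need)
			return (int)(i + 1 - need);
	}

	return -1;
}

/* "largest": start of the longest free run, if it can hold `need` pages,
 * or -1 otherwise.
 */
static int
slot_largest(const uint8_t *used, size_t nb_pages, size_t need)
{
	size_t best_start = 0;
	size_t best_len = 0;
	size_t run = 0;
	size_t i;

	for (i = 0; i < nb_pages; i++) {
		run = used[i] ? 0 : run + 1;
		if (run > best_len) {
			best_len = run;
			best_start = i + 1 - run;
		}
	}

	return (need > 0 && best_len >= need) ? (int)best_start : -1;
}
```

In this sketch "lowest" tends to pack allocations toward the start of
the memory, while "largest" places them in the biggest free region.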

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 45 +++++++++++++++++++++++++++++-----
 drivers/ml/cnxk/cn10k_ml_ocm.c |  6 ++---
 drivers/ml/cnxk/cn10k_ml_ocm.h |  3 +++
 3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 948708a420..5c02d67c8e 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -21,11 +21,13 @@
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
+#define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
+#define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -39,9 +41,12 @@
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+static const char *const valid_args[] = {CN10K_ML_FW_PATH,
+					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
+					 CN10K_ML_DEV_CACHE_MODEL_DATA,
+					 CN10K_ML_OCM_ALLOC_MODE,
+					 NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -81,6 +86,8 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool report_dpe_warnings_set = false;
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
+	bool ocm_alloc_mode_set = false;
+	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
 	int ret = 0;
@@ -140,6 +147,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		cache_model_data_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_ALLOC_MODE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_ALLOC_MODE, &parse_string_arg,
+					 &ocm_alloc_mode);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_ALLOC_MODE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_alloc_mode_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -183,6 +201,20 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
 
+	if (!ocm_alloc_mode_set) {
+		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+	} else {
+		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
+		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_OCM_ALLOC_MODE,
+				ocm_alloc_mode);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->ocm.alloc_mode = ocm_alloc_mode;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -720,7 +752,8 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 0b04fcc2da..551faef7eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -230,7 +230,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
-	int ocm_alloc_mode;
 	int wb_page_start;
 	uint16_t tile_id;
 	uint16_t word_id;
@@ -255,7 +254,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	max_slot_sz_curr = 0;
 	max_slot_sz = 0;
 	tile_idx = 0;
-	ocm_alloc_mode = 2;
 
 	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
 		plt_err("Invalid start_tile, %d", start_tile);
@@ -303,13 +301,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		}
 	}
 
-	if (ocm_alloc_mode == 1) {
+	if (strcmp(ocm->alloc_mode, "lowest") == 0) {
 		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
 		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
 			tile_idx = tile_start;
 			goto found;
 		}
-	} else if (ocm_alloc_mode == 2) {
+	} else if (strcmp(ocm->alloc_mode, "largest") == 0) {
 		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
 							&max_slot_sz_curr);
 		if (max_slot_sz_curr > max_slot_sz) {
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 0c7172a671..5f018b410a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -58,6 +58,9 @@ struct cn10k_ml_ocm {
 	/* OCM spinlock, used to update OCM state */
 	rte_spinlock_t lock;
 
+	/* OCM allocation mode */
+	const char *alloc_mode;
+
 	/* Number of OCM tiles */
 	uint8_t num_tiles;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v5 36/39] ml/cnxk: add support to use lock during jcmd enq
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (34 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added device argument "hw_queue_lock" to select the JCMDQ enqueue
ROC function to be used in fast path.

hw_queue_lock:

0: Disable, use lock-free version of JCMDQ enqueue ROC function for
	job queuing. To avoid race condition in request queuing to
	hardware, disabling hw_queue_lock restricts the number of
	queue-pairs supported by cnxk driver to 1.

1: Enable, (default) use spin-lock version of JCMDQ enqueue ROC
	function for job queuing. Enabling spinlock version would
	disable restrictions on the number of queue-pairs that
	can be created.
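
The selection logic can be sketched as below. This is an illustration
under assumptions, not the driver code: the queue structure, the two
enqueue functions and dev_cfg_select are hypothetical stand-ins, and a
C11 atomic_flag is used in place of the ROC spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_QP_SPINLOCK 16 /* mirrors ML_CN10K_MAX_QP_PER_DEVICE_SL */
#define MAX_QP_LOCKFREE 1  /* mirrors ML_CN10K_MAX_QP_PER_DEVICE_LF */

/* Hypothetical stand-in for the hardware job command queue. */
struct job_queue {
	atomic_flag lock;
	uint32_t depth;
	uint32_t count;
};

typedef bool (*enqueue_fn)(struct job_queue *q);

/* Lock-free variant: only safe when a single queue-pair submits jobs. */
static bool
enqueue_lockfree(struct job_queue *q)
{
	if (q->count == q->depth)
		return false;

	q->count++;
	return true;
}

/* Spinlock variant: serializes submitters, so many queue-pairs may share it. */
static bool
enqueue_spinlock(struct job_queue *q)
{
	bool ok;

	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin until the lock is released */
	ok = enqueue_lockfree(q);
	atomic_flag_clear_explicit(&q->lock, memory_order_release);

	return ok;
}

struct dev_cfg {
	enqueue_fn enqueue;
	uint16_t max_queue_pairs;
};

/* Choose the enqueue handler and queue-pair limit from hw_queue_lock. */
static struct dev_cfg
dev_cfg_select(int hw_queue_lock)
{
	struct dev_cfg cfg;

	cfg.enqueue = hw_queue_lock ? enqueue_spinlock : enqueue_lockfree;
	cfg.max_queue_pairs = hw_queue_lock ? MAX_QP_SPINLOCK : MAX_QP_LOCKFREE;

	return cfg;
}
```

Resolving the handler once at configuration time keeps the fast path
free of a per-enqueue branch on the hw_queue_lock value.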

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 31 ++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_dev.h | 13 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.c | 20 +++++++++++++++++---
 3 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 5c02d67c8e..aa503b2691 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -22,12 +22,14 @@
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -46,6 +48,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
+					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -87,6 +90,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
+	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -158,6 +162,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		ocm_alloc_mode_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
+					 &mldev->hw_queue_lock);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_HW_QUEUE_LOCK);
+			ret = -EINVAL;
+			goto exit;
+		}
+		hw_queue_lock_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -215,6 +231,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
 
+	if (!hw_queue_lock_set) {
+		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+	} else {
+		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
+				mldev->hw_queue_lock);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -756,4 +784,5 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 718edadde7..49676ac9e7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -21,8 +21,11 @@
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
 
-/* Maximum number of queue-pairs per device */
-#define ML_CN10K_MAX_QP_PER_DEVICE 1
+/* Maximum number of queue-pairs per device, spinlock version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
+
+/* Maximum number of queue-pairs per device, lock-free version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_LF 1
 
 /* Maximum number of descriptors per queue-pair */
 #define ML_CN10K_MAX_DESC_PER_QP 1024
@@ -384,6 +387,12 @@ struct cn10k_ml_dev {
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
+
+	/* Use spinlock version of ROC enqueue */
+	int hw_queue_lock;
+
+	/* JCMD enqueue function handler */
+	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index d69df42b27..f92f778e23 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -534,13 +534,21 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
+	struct cn10k_ml_dev *mldev;
+
 	if (dev_info == NULL)
 		return -EINVAL;
 
+	mldev = dev->data->dev_private;
+
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	if (mldev->hw_queue_lock)
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
+	else
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
+
 	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
@@ -703,6 +711,12 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->xstats_enabled = false;
 
+	/* Set JCMDQ enqueue function */
+	if (mldev->hw_queue_lock == 1)
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	else
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -1993,7 +2007,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
-	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2114,7 +2128,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v5 37/39] ml/cnxk: add support to select poll memory region
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (35 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-07 16:07   ` [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  39 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added device argument "poll_mem" to select the memory
region to be used for polling in fast-path requests.

Implemented support to use scratch registers for polling.
The available pool of scratch registers is mapped one-to-one
with the internal request queue.

poll_mem:
ddr:      Use a DDR memory location for polling (default)
register: Use scratch registers for polling

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  47 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  24 +++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 124 +++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |   9 +++
 4 files changed, 192 insertions(+), 12 deletions(-)
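
The way the scratch-register pool is divided among queue-pairs in this patch can be sketched as standalone C. This is only an illustration under stated assumptions: `poll_block`, `poll_block_init` and `poll_reg_index` are hypothetical names that do not exist in the driver; the register range constants and the arithmetic follow the `qp->block_size` / `qp->block_start` computation and the `ML_SCRATCH()` index used in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Scratch register range for poll mode requests (values from the patch) */
#define ML_POLL_REGISTER_START 1024
#define ML_POLL_REGISTER_END   2047

/* Hypothetical stand-in for the per queue-pair block_start / block_size fields */
struct poll_block {
	uint32_t block_start;
	uint32_t block_size;
};

/* Split the register pool evenly across nb_qp queue-pairs */
static inline struct poll_block
poll_block_init(uint16_t qp_id, uint16_t nb_qp)
{
	struct poll_block blk;

	assert(nb_qp > 0);
	blk.block_size = (ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / nb_qp;
	blk.block_start = ML_POLL_REGISTER_START + qp_id * blk.block_size;
	return blk;
}

/* Scratch register index polled by the request at queue position idx */
static inline uint32_t
poll_reg_index(const struct poll_block *blk, uint64_t idx)
{
	return blk->block_start + (uint32_t)(idx % blk->block_size);
}
```

With 16 queue-pairs each block holds 64 registers, so queue-pair 3 starts at register 1216 and the request at index 70 wraps to register 1222. This is presumably why max_desc is reduced to ML_CN10K_MAX_DESC_PER_QP / max_queue_pairs in register mode: each descriptor then keeps its own register.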

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index aa503b2691..a746a66849 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,6 +23,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
+#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -30,6 +31,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
+#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -42,6 +44,7 @@
 /* Firmware flags */
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+#define FW_USE_DDR_POLL_ADDR_FP	      BIT(2)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
@@ -49,6 +52,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
+					 CN10K_ML_FW_POLL_MEM,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -92,7 +96,9 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
+	bool poll_mem_set = false;
 	bool fw_path_set = false;
+	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 
@@ -174,6 +180,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
+					 &poll_mem);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
+			ret = -EINVAL;
+			goto exit;
+		}
+		poll_mem_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -243,6 +260,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
+	if (!poll_mem_set) {
+		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
+	} else {
+		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->fw.poll_mem = poll_mem;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -376,6 +405,11 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
+	if (strcmp(fw->poll_mem, "ddr") == 0)
+		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
+	else if (strcmp(fw->poll_mem, "register") == 0)
+		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+
 	return flags;
 }
 
@@ -780,9 +814,10 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
-			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 49676ac9e7..966d92e027 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,18 @@
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
 
+/* Memory barrier macros */
+#if defined(RTE_ARCH_ARM)
+#define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
+#define dsb_st ({ asm volatile("dsb st" : : : "memory"); })
+#else
+#define dmb_st
+#define dsb_st
+#endif
+
+struct cn10k_ml_req;
+struct cn10k_ml_qp;
+
 /* Job types */
 enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
@@ -358,6 +370,9 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
+	/* Memory to be used for polling in fast-path requests */
+	const char *poll_mem;
+
 	/* Data buffer */
 	uint8_t *data;
 
@@ -393,6 +408,15 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+	/* Poll handling function pointers */
+	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
+	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+
+	/* Memory barrier function pointers to handle synchronization */
+	void (*set_enq_barrier)(void);
+	void (*set_deq_barrier)(void);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f92f778e23..61e6d023c5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,11 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Scratch register range for poll mode requests */
+#define ML_POLL_REGISTER_SYNC  1023
+#define ML_POLL_REGISTER_START 1024
+#define ML_POLL_REGISTER_END   2047
+
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -76,6 +81,80 @@ print_line(FILE *fp, int len)
 	fprintf(fp, "\n");
 }
 
+static inline void
+cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	PLT_SET_USED(qp);
+	PLT_SET_USED(idx);
+
+	req->compl_W1 = PLT_U64_CAST(&req->status);
+}
+
+static inline void
+cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	return plt_read64(req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	return roc_ml_reg_read64(roc_ml, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
+{
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		req->compl_W1 = PLT_U64_CAST(&req->status);
+	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
+}
+
+static inline void
+cn10k_ml_enq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_deq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_enq_barrier_register(void)
+{
+	dmb_st;
+}
+
+static inline void
+cn10k_ml_deq_barrier_register(void)
+{
+	dsb_st;
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -163,6 +242,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
+	qp->block_size =
+		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
+	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -341,7 +423,7 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	mldev = dev->data->dev_private;
 
 	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
@@ -549,7 +631,11 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
+	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
 
@@ -717,6 +803,26 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
+	/* Set polling function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
+	}
+
+	/* Set barrier function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
+	}
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -2000,13 +2106,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
+	mldev->set_poll_addr(qp, req, head);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
+	mldev->set_enq_barrier();
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2032,6 +2140,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		       uint16_t nb_ops)
 {
 	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2039,6 +2148,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
+	mldev = dev->data->dev_private;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2051,7 +2161,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = plt_read64(&req->status);
+	status = mldev->get_poll_ptr(&mldev->roc, req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2059,6 +2169,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
+	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2116,13 +2227,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
+	cn10k_ml_set_sync_addr(mldev, req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2142,7 +2254,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 65ae8b44f3..58c992720a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -26,6 +26,9 @@ struct cn10k_ml_req {
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
 
+	/* Job completion W1 */
+	uint64_t compl_W1;
+
 	/* Timeout cycle */
 	uint64_t timeout;
 
@@ -61,6 +64,12 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
+
+	/* Register block start for polling */
+	uint32_t block_start;
+
+	/* Register block size for polling */
+	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (36 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-15 12:34     ` Shivah Shankar Shankar Narayan Rao
  2023-02-16  4:41     ` Prince Takkar
  2023-02-07 16:07   ` [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page Srikanth Yalavarthi
  2023-03-02  6:08   ` [PATCH v5 00/39] Implementation of ML CNXK driver Prince Takkar
  39 siblings, 2 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added user guide for the Marvell cnxk ML driver for the Marvell
Octeon cnxk SoC family. Added details about device initialization,
debug options and runtime device args supported by the driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 MAINTAINERS                 |   1 +
 doc/guides/index.rst        |   1 +
 doc/guides/mldevs/cnxk.rst  | 238 ++++++++++++++++++++++++++++++++++++
 doc/guides/mldevs/index.rst |  14 +++
 4 files changed, 254 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst
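
The xstats naming scheme documented in this guide, "Model-<model_id>-Type-<units>", can be illustrated with a short C sketch. `xstat_name_format` is a hypothetical helper, not a driver function, and the "cycles" suffix is an assumption; the guide only shows the "ns" suffix in its example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "Model-<model_id>-<type>-<units>" into buf.
 * The "ns" suffix matches the example in the guide; "cycles" is assumed.
 * Returns the snprintf() result (length of the full name).
 */
static inline int
xstat_name_format(char *buf, size_t len, uint16_t model_id, const char *type, int sclk_available)
{
	const char *units = sclk_available ? "ns" : "cycles";

	return snprintf(buf, len, "Model-%u-%s-%s", (unsigned int)model_id, type, units);
}
```

Calling the helper with model ID 1, type "Avg-FW-Latency" and SCLK available produces the name used as the example in the guide, Model-1-Avg-FW-Latency-ns.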

diff --git a/MAINTAINERS b/MAINTAINERS
index 8e9d6dc946..65153948d2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1442,6 +1442,7 @@ M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
 F: drivers/ml/cnxk/
+F: doc/guides/mldevs/cnxk.rst
 
 
 Packet processing
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 5eb5bd9c9a..0bd729530a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -26,6 +26,7 @@ DPDK documentation
    eventdevs/index
    rawdevs/index
    mempool/index
+   mldevs/index
    platform/index
    contributing/index
    rel_notes/index
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
new file mode 100644
index 0000000000..da40336299
--- /dev/null
+++ b/doc/guides/mldevs/cnxk.rst
@@ -0,0 +1,238 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Marvell cnxk Machine Learning Poll Mode Driver
+==============================================
+
+The cnxk ML poll mode driver provides support for offloading Machine
+Learning inference operations to Machine Learning accelerator units
+on the **Marvell OCTEON cnxk** SoC family.
+
+The cnxk ML PMD code is organized into multiple files with all file names
+starting with cn10k, providing support for CN106XX and CN106XXS.
+
+More information about OCTEON cnxk SoCs may be obtained from `<https://www.marvell.com>`_
+
+Supported OCTEON cnxk SoCs
+--------------------------
+
+- CN106XX
+- CN106XXS
+
+Features
+--------
+
+The OCTEON cnxk ML PMD provides support for the following set of operations:
+
+Slow-path device and ML model handling:
+
+* ``Device probing, configuration and close``
+* ``Device start / stop``
+* ``Model loading and unloading``
+* ``Model start / stop``
+* ``Data quantization and dequantization``
+
+Fast-path Inference:
+
+* ``Inference execution``
+* ``Error handling``
+
+
+Installation
+------------
+
+The OCTEON cnxk ML PMD may be compiled natively on an OCTEON cnxk platform
+or cross-compiled on an x86 platform.
+
+Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
+application.
+
+
+Initialization
+--------------
+
+``CN10K Initialization``
+
+List the ML PF devices available on the cn10k platform:
+
+.. code-block:: console
+
+    lspci -d:a092
+
+``a092`` is the ML device PF id. You should see output similar to:
+
+.. code-block:: console
+
+    0000:00:10.0 System peripheral: Cavium, Inc. Device a092
+
+Bind the ML PF device to the vfio-pci driver:
+
+.. code-block:: console
+
+    cd <dpdk directory>
+    ./usertools/dpdk-devbind.py -u 0000:00:10.0
+    ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
+
+Runtime Config Options
+----------------------
+
+- ``Firmware file path`` (default ``/lib/firmware/mlip-fw.bin``)
+
+   Path to the firmware binary to be loaded during device configuration.
+   The ``fw_path`` ``devargs`` parameter can be used by the user to load
+   ML firmware from a custom path.
+
+   For example::
+
+      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
+
+   With the above configuration, driver loads the firmware from the path
+   "/home/user/ml_fw.bin".
+
+- ``Enable DPE warnings`` (default ``1``)
+
+   ML firmware can be configured during load to handle the DPE errors reported
+   by the ML inference engine. When enabled, the firmware masks non-fatal DPE
+   hardware errors as warnings. The ``enable_dpe_warnings`` ``devargs`` parameter
+   is used for this configuration.
+
+   For example::
+
+      -a 0000:00:10.0,enable_dpe_warnings=0
+
+   With the above configuration, DPE non-fatal errors reported by HW are
+   considered as errors.
+
+
+- ``Model data caching`` (default ``1``)
+
+   Enable caching model data on ML ACC cores. Enabling this option executes a
+   dummy inference request in synchronous mode during model start stage. Caching
+   of model data improves the inferencing throughput / latency for the model.
+   The parameter ``cache_model_data`` ``devargs`` is used to enable data caching.
+
+   For example::
+
+      -a 0000:00:10.0,cache_model_data=0
+
+   With the above configuration, model data caching is disabled.
+
+
+- ``OCM allocation mode`` (default ``lowest``)
+
+   Option to specify the method to be used while allocating OCM memory for a
+   model during model start. Two modes are supported by the driver. The
+   parameter ``ocm_alloc_mode`` ``devargs`` is used to select the OCM
+   allocation mode.
+
+   ``lowest`` - Allocate OCM for the model from first available free slot. Search
+   for the free slot is done starting from the lowest tile ID and lowest page ID.
+   ``largest`` - Allocate OCM for the model from the slot with largest amount of
+   free space.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_alloc_mode=lowest
+
+   With the above configuration, OCM allocation for the model would be done from
+   the first available free slot / from the lowest possible tile ID.
+
+
+- ``Enable hardware queue lock`` (default ``1``)
+
+   Option to select the job request enqueue function to be used to queue the requests
+   to the hardware queue. The parameter ``hw_queue_lock`` ``devargs`` is used to select
+   the enqueue function.
+
+   ``0`` - Disable, use the lock-free version of the hardware enqueue function
+   for job queuing in the enqueue burst operation. To avoid race conditions in request
+   queuing to hardware, disabling hw_queue_lock restricts the number of queue-pairs
+   supported by the cnxk driver to 1.
+   ``1`` - Enable (default), use the spinlock version of the hardware enqueue function
+   for job queuing. Enabling the spinlock version lifts the single queue-pair
+   restriction, so the driver can support multiple queue-pairs.
+
+   For example::
+
+      -a 0000:00:10.0,hw_queue_lock=1
+
+   With the above configuration, spinlock version of hardware enqueue function is used
+   in the fast path enqueue burst operation.
+
+
+- ``Polling memory location`` (default ``ddr``)
+
+   ML cnxk driver provides the option to select the memory location to be used
+   for polling to check the inference request completion. The driver supports
+   using either the DDR address space (``ddr``) or ML registers (``register``) as
+   polling locations. The parameter ``poll_mem`` ``devargs`` is used to specify
+   the poll location.
+
+   For example::
+
+      -a 0000:00:10.0,poll_mem="register"
+
+   With the above configuration, ML cnxk driver is configured to use ML registers
+   for polling in fastpath requests.
+
+
+Debugging Options
+-----------------
+
+.. _table_octeon_cnxk_ml_debug_options:
+
+.. table:: OCTEON cnxk ML PMD debug options
+
+    +---+------------+-------------------------------------------------------+
+    | # | Component  | EAL log command                                       |
+    +===+============+=======================================================+
+    | 1 | ML         | --log-level='pmd\.ml\.cnxk,8'                         |
+    +---+------------+-------------------------------------------------------+
+
+
+Extended stats
+--------------
+
+Marvell cnxk ML PMD supports reporting the inference latencies through extended
+stats. The PMD supports the below list of 6 extended stats types for each model.
+The total number of extended stats is equal to 6 x the number of models loaded.
+
+.. _table_octeon_cnxk_ml_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD xstats names
+
+    +---+---------------------+----------------------------------------------+
+    | # | Type                | Description                                  |
+    +===+=====================+==============================================+
+    | 1 | Avg-HW-Latency      | Average hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 2 | Min-HW-Latency      | Minimum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 3 | Max-HW-Latency      | Maximum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 4 | Avg-FW-Latency      | Average firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 5 | Min-FW-Latency      | Minimum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 6 | Max-FW-Latency      | Maximum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+
+Latency values reported by the PMD through xstats are in units of either
+cycles or nanoseconds. The unit of the latency is determined during DPDK
+initialization and depends on the availability of SCLK. Latencies are
+reported in nanoseconds when the SCLK is available and in cycles otherwise.
+Application needs to initialize at least one RVU for the clock to be available.
+
+xstats names are dynamically generated by the PMD and would have the format
+"Model-<model_id>-Type-<units>".
+
+For example::
+   Model-1-Avg-FW-Latency-ns
+
+The above xstat name would report average firmware latency in nanoseconds for
+model with model ID 1.
+
+The number of xstats made available by the PMD changes dynamically. The number
+increases when a model is loaded and decreases when a model is unloaded.
+Application needs to update the xstats map after a model is either loaded or
+unloaded.
diff --git a/doc/guides/mldevs/index.rst b/doc/guides/mldevs/index.rst
new file mode 100644
index 0000000000..f201e54175
--- /dev/null
+++ b/doc/guides/mldevs/index.rst
@@ -0,0 +1,14 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Machine Learning Device Driver
+==============================
+
+The following is a list of ML device PMDs, which can be used from an
+application through the ML device API.
+
+.. toctree::
+    :maxdepth: 2
+    :numbered:
+
+    cnxk
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (37 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
@ 2023-02-07 16:07   ` Srikanth Yalavarthi
  2023-02-15 12:33     ` Shivah Shankar Shankar Narayan Rao
  2023-02-16  4:37     ` Prince Takkar
  2023-03-02  6:08   ` [PATCH v5 00/39] Implementation of ML CNXK driver Prince Takkar
  39 siblings, 2 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-02-07 16:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support for configurable OCM page size. A new device
argument "ocm_page_size" is added to specify the page size
for OCM management. Supported page sizes are 1KB, 2KB, 4KB,
8KB and 16KB. Default page size is 16KB.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       | 16 +++++++++
 drivers/ml/cnxk/cn10k_ml_dev.c   | 61 ++++++++++++++++++++++++++++----
 drivers/ml/cnxk/cn10k_ml_dev.h   |  3 ++
 drivers/ml/cnxk/cn10k_ml_model.c |  6 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.c   | 18 +++++++---
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 14 +++-----
 drivers/ml/cnxk/cn10k_ml_ops.c   | 17 ++++++---
 7 files changed, 107 insertions(+), 28 deletions(-)
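
The page-size check implied by the commit message (only 1KB, 2KB, 4KB, 8KB and 16KB are accepted) can be sketched as a lookup against the supported-size table. The table mirrors `valid_ocm_page_size` from the diff; `ocm_page_size_is_valid` is a hypothetical helper name, and the driver's actual validation code may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Supported OCM page sizes in bytes: 1KB, 2KB, 4KB, 8KB and 16KB */
static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};

/* Hypothetical helper: true when page_size matches one of the supported sizes */
static inline bool
ocm_page_size_is_valid(int page_size)
{
	size_t i;

	for (i = 0; i < sizeof(valid_ocm_page_size) / sizeof(valid_ocm_page_size[0]); i++) {
		if (valid_ocm_page_size[i] == page_size)
			return true;
	}

	return false;
}
```

A devargs value such as ocm_page_size=8192 passes this check, while a size that is not in the table, for example 3000, would be rejected as an invalid argument.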

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index da40336299..f7f61e8bfa 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -175,6 +175,22 @@ Runtime Config Options
    With the above configuration, ML cnxk driver is configured to use ML registers
    for polling in fastpath requests.
 
+- ``OCM page size`` (default ``16384``)
+
+   Option to specify the page size, in bytes, to be used for OCM management. The
+   available OCM is split into pages of the specified size, and the pages are
+   allocated to the models. The ``ocm_page_size`` ``devargs`` parameter is used to
+   specify the page size.
+
+   The driver supports page sizes of 1 KB, 2 KB, 4 KB, 8 KB and 16 KB. The default
+   page size is 16 KB.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_page_size=8192
+
+   With the above configuration, the OCM page size is set to 8192 bytes (8 KB).
+
 
 Debugging Options
 -----------------
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index a746a66849..6f9a1015a6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -24,6 +24,7 @@
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 #define CN10K_ML_FW_POLL_MEM		"poll_mem"
+#define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -32,6 +33,7 @@
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 #define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
+#define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -53,8 +55,12 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 CN10K_ML_FW_POLL_MEM,
+					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
+/* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
+static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
@@ -95,12 +101,15 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
+	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool poll_mem_set = false;
 	bool fw_path_set = false;
 	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
+	bool found;
+	uint8_t i;
 
 	if (devargs == NULL)
 		goto check_args;
@@ -191,6 +200,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		poll_mem_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
+					 &mldev->ocm_page_size);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_page_size_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -272,6 +292,32 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
 
+	if (!ocm_page_size_set) {
+		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+	} else {
+		if (mldev->ocm_page_size < 0) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
+				mldev->ocm_page_size);
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		found = false;
+		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
+			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+				found = true;
+				break;
+			}
+		}
+
+		if (!found) {
+			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -814,10 +860,11 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
+			      "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 966d92e027..b4e46899c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -406,6 +406,9 @@ struct cn10k_ml_dev {
 	/* Use spinlock version of ROC enqueue */
 	int hw_queue_lock;
 
+	/* OCM page size */
+	int ocm_page_size;
+
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 0ded355d81..ceffde8459 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -339,11 +339,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			ML_CN10K_OCM_NUMPAGES);
+			mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -352,7 +352,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 */
 	if (!metadata->model.ocm_relocatable)
 		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 551faef7eb..d8d2c71a3c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -220,13 +220,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
-	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
 	uint16_t used_scratch_pages_max;
 	uint16_t scratch_page_start;
 	int used_last_wb_page_max;
 	uint16_t scratch_page_end;
 	uint8_t search_start_tile;
 	uint8_t search_end_tile;
+	uint8_t *local_ocm_mask;
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
@@ -268,6 +268,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		search_end_tile = start_tile;
 	}
 
+	/* Scratch copy of the OCM mask, one byte per mask word */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+
 	tile_start = search_start_tile;
 start_search:
 	used_scratch_pages_max = 0;
@@ -279,7 +282,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -332,6 +335,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	if (wb_page_start != -1)
 		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
 
+	rte_free(local_ocm_mask);
+
 	return wb_page_start;
 }
 
@@ -480,7 +485,7 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char str[ML_CN10K_OCM_NUMPAGES / 4 + 2]; /* nibbles + prefix '0x' */
+	char *str;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
@@ -490,12 +495,15 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 	mldev = dev->data->dev_private;
 	ocm = &mldev->ocm;
 
+	/* nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+
 	fprintf(fp, "OCM State:\n");
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
 			wb_pages +=
 				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
@@ -506,4 +514,6 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
 			ocm->tile_ocm_info[tile_id].last_wb_page, str);
 	}
+
+	rte_free(str);
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 5f018b410a..3404e7fd65 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,25 +8,16 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
-/* Page size in bytes. */
-#define ML_CN10K_OCM_PAGESIZE 0x4000
-
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
 /* OCM in bytes, per tile. */
 #define ML_CN10K_OCM_TILESIZE 0x100000
 
-/* OCM pages, per tile. */
-#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
-
-/* Maximum OCM mask words, per tile, 8 bit words. */
-#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
-
 /* OCM and Tile information structure */
 struct cn10k_ml_ocm_tile_info {
 	/* Mask of used / allotted pages on tile's OCM */
-	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+	uint8_t *ocm_mask;
 
 	/* Last pages in the tile's OCM used for weights and bias, default = -1 */
 	int last_wb_page;
@@ -78,6 +69,9 @@ struct cn10k_ml_ocm {
 
 	/* OCM memory info and status*/
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+
+	/* Memory for ocm_mask */
+	uint8_t *ocm_mask;
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 61e6d023c5..5b77e47322 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -311,8 +311,8 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
-		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -781,12 +781,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	ocm = &mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->page_size = mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
-	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+	/* Allocate memory for ocm_mask */
+	ocm->ocm_mask =
+		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		ocm->tile_ocm_info[tile_id].ocm_mask = ocm->ocm_mask + tile_id * ocm->mask_words;
 		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+	}
 
 	rte_spinlock_init(&ocm->lock);
 
@@ -856,6 +862,9 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Release ocm_mask memory */
+	rte_free(mldev->ocm.ocm_mask);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
-- 
2.17.1
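
Note for readers: the sizing that this patch moves from compile-time
macros to runtime can be worked through as below. This is a sketch under
stated assumptions: the struct and function names are hypothetical, the
tile size is the ML_CN10K_OCM_TILESIZE value from the diff, and the two
formulas mirror the num_pages and mask_words assignments in
cn10k_ml_dev_configure().

```c
#include <stdint.h>

/* OCM in bytes, per tile (ML_CN10K_OCM_TILESIZE in the patch). */
#define OCM_TILE_SIZE 0x100000u

/* Hypothetical container for the derived values, for illustration only. */
struct ocm_layout {
	uint32_t num_pages;  /* OCM pages per tile */
	uint32_t mask_words; /* 8-bit mask words per tile, one bit per page */
};

/* Derive the per-tile page count and mask size from the page size.
 * page_size is assumed to be one of the supported, non-zero sizes.
 */
static struct ocm_layout
ocm_layout_for(uint32_t page_size)
{
	struct ocm_layout layout;

	layout.num_pages = OCM_TILE_SIZE / page_size;
	layout.mask_words = layout.num_pages / 8;

	return layout;
}
```

With the default page size of 16384 this gives 64 pages and 8 mask words
per tile. With 1024 it gives 1024 pages and 128 mask words per tile, so
the ocm_mask buffer shared by the 8 tiles grows from 64 to 1024 bytes.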


^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page
  2023-02-07 16:07   ` [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page Srikanth Yalavarthi
@ 2023-02-15 12:33     ` Shivah Shankar Shankar Narayan Rao
  2023-02-16  4:37     ` Prince Takkar
  1 sibling, 0 replies; 253+ messages in thread
From: Shivah Shankar Shankar Narayan Rao @ 2023-02-15 12:33 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi
  Cc: dev, Jerin Jacob Kollanukkaran, Anup Prabhu, Prince Takkar,
	Parijat Shukla

[-- Attachment #1: Type: text/plain, Size: 873 bytes --]

> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; Prince Takkar
> <ptakkar@marvell.com>; Parijat Shukla <pshukla@marvell.com>
> Subject: [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm
> page
> 
> Enabled support for configurable OCM page size. A new device argument
> "ocm_page_size" is added to specify the page size for OCM management.
> Supported page sizes are 1KB, 2KB, 4KB, 8KB and 16KB. Default page size is
> 16KB.
> 
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Shivah Shankar S <sshankarnara@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles
  2023-02-07 16:07   ` [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
@ 2023-02-15 12:33     ` Shivah Shankar Shankar Narayan Rao
  2023-02-16  4:40     ` Prince Takkar
  1 sibling, 0 replies; 253+ messages in thread
From: Shivah Shankar Shankar Narayan Rao @ 2023-02-15 12:33 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi
  Cc: dev, Jerin Jacob Kollanukkaran, Anup Prabhu, Prince Takkar,
	Parijat Shukla

[-- Attachment #1: Type: text/plain, Size: 899 bytes --]

> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; Prince Takkar
> <ptakkar@marvell.com>; Parijat Shukla <pshukla@marvell.com>
> Subject: [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles
> 
> Enabled support to retrieve xstats in either cycles or ns.
> Access to sclk is enabled only if an RVU device is probed during initialization.
> Driver would return the xstats in nanoseconds only when an RVU device is
> probed, else would fallback to cycles.
> 
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Shivah Shankar S <sshankarnara@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver
  2023-02-07 16:07   ` [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
@ 2023-02-15 12:34     ` Shivah Shankar Shankar Narayan Rao
  2023-02-16  4:41     ` Prince Takkar
  1 sibling, 0 replies; 253+ messages in thread
From: Shivah Shankar Shankar Narayan Rao @ 2023-02-15 12:34 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Thomas Monjalon, Srikanth Yalavarthi
  Cc: dev, Jerin Jacob Kollanukkaran, Anup Prabhu, Prince Takkar,
	Parijat Shukla

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> To: Thomas Monjalon <thomas@monjalon.net>; Srikanth Yalavarthi
> <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; Prince Takkar
> <ptakkar@marvell.com>; Parijat Shukla <pshukla@marvell.com>
> Subject: [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver
> 
> Added user guide for Marvell cnxk ML driver for Marvell Octeon cnxk Soc
> family. Added details about device initialization, debug options and runtime
> device args supported by the driver.
> 
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Acked-by: Shivah Shankar S <sshankarnara@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page
  2023-02-07 16:07   ` [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page Srikanth Yalavarthi
  2023-02-15 12:33     ` Shivah Shankar Shankar Narayan Rao
@ 2023-02-16  4:37     ` Prince Takkar
  1 sibling, 0 replies; 253+ messages in thread
From: Prince Takkar @ 2023-02-16  4:37 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Parijat Shukla

[-- Attachment #1: Type: text/plain, Size: 1307 bytes --]



> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Anup Prabhu <aprabhu@marvell.com>; Prince Takkar <ptakkar@marvell.com>;
> Parijat Shukla <pshukla@marvell.com>
> Subject: [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page
> 
> Enabled support for configurable OCM page size. A new device argument
> "ocm_page_size" is added to specify the page size for OCM management.
> Supported page sizes are 1KB, 2KB, 4KB, 8KB and 16KB. Default page size is
> 16KB.
> 
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---
>  doc/guides/mldevs/cnxk.rst       | 16 +++++++++
>  drivers/ml/cnxk/cn10k_ml_dev.c   | 61 ++++++++++++++++++++++++++++----
>  drivers/ml/cnxk/cn10k_ml_dev.h   |  3 ++
>  drivers/ml/cnxk/cn10k_ml_model.c |  6 ++--
>  drivers/ml/cnxk/cn10k_ml_ocm.c   | 18 +++++++---
>  drivers/ml/cnxk/cn10k_ml_ocm.h   | 14 +++-----
>  drivers/ml/cnxk/cn10k_ml_ops.c   | 17 ++++++---
>  7 files changed, 107 insertions(+), 28 deletions(-)
>
Acked-by: Prince Takkar <ptakkar@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles
  2023-02-07 16:07   ` [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
  2023-02-15 12:33     ` Shivah Shankar Shankar Narayan Rao
@ 2023-02-16  4:40     ` Prince Takkar
  1 sibling, 0 replies; 253+ messages in thread
From: Prince Takkar @ 2023-02-16  4:40 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Parijat Shukla

[-- Attachment #1: Type: text/plain, Size: 1019 bytes --]



> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Anup Prabhu <aprabhu@marvell.com>; Prince Takkar <ptakkar@marvell.com>;
> Parijat Shukla <pshukla@marvell.com>
> Subject: [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles
> 
> Enabled support to retrieve xstats in either cycles or ns.
> Access to sclk is enabled only if an RVU device is probed during initialization.
> Driver would return the xstats in nanoseconds only when an RVU device is
> probed, else would fallback to cycles.
> 
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---
>  drivers/ml/cnxk/cn10k_ml_ops.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
Acked-by: Prince Takkar <ptakkar@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver
  2023-02-07 16:07   ` [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
  2023-02-15 12:34     ` Shivah Shankar Shankar Narayan Rao
@ 2023-02-16  4:41     ` Prince Takkar
  1 sibling, 0 replies; 253+ messages in thread
From: Prince Takkar @ 2023-02-16  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Thomas Monjalon, Srikanth Yalavarthi
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Parijat Shukla

[-- Attachment #1: Type: text/plain, Size: 1219 bytes --]



> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> To: Thomas Monjalon <thomas@monjalon.net>; Srikanth Yalavarthi
> <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Anup Prabhu <aprabhu@marvell.com>; Prince Takkar <ptakkar@marvell.com>;
> Parijat Shukla <pshukla@marvell.com>
> Subject: [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver
> 
> Added user guide for Marvell cnxk ML driver for Marvell Octeon cnxk Soc family.
> Added details about device initialization, debug options and runtime device args
> supported by the driver.
> 
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---
>  MAINTAINERS                 |   1 +
>  doc/guides/index.rst        |   1 +
>  doc/guides/mldevs/cnxk.rst  | 238 ++++++++++++++++++++++++++++++++++++
>  doc/guides/mldevs/index.rst |  14 +++
>  4 files changed, 254 insertions(+)
>  create mode 100644 doc/guides/mldevs/cnxk.rst
>  create mode 100644 doc/guides/mldevs/index.rst
> 
Acked-by: Prince Takkar <ptakkar@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 28/39] ml/cnxk: add internal function for sync mode run
  2023-02-07 16:07   ` [PATCH v5 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
@ 2023-02-27 10:42     ` Prince Takkar
  0 siblings, 0 replies; 253+ messages in thread
From: Prince Takkar @ 2023-02-27 10:42 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Parijat Shukla

[-- Attachment #1: Type: text/plain, Size: 980 bytes --]



> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Anup Prabhu <aprabhu@marvell.com>; Prince Takkar <ptakkar@marvell.com>;
> Parijat Shukla <pshukla@marvell.com>
> Subject: [PATCH v5 28/39] ml/cnxk: add internal function for sync mode run
> 
> Added internal function to execute ML inference requests in synchronous mode.
> Sync mode inference execution is used to launch inference requests without
> using a queue-pair.
> 
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---
>  drivers/ml/cnxk/cn10k_ml_ops.c | 53 ++++++++++++++++++++++++++++++++++
>  drivers/ml/cnxk/cn10k_ml_ops.h |  1 +
>  2 files changed, 54 insertions(+)
> 
Acked-by: Prince Takkar <ptakkar@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 35/39] ml/cnxk: add support to select OCM allocation mode
  2023-02-07 16:07   ` [PATCH v5 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
@ 2023-03-01  9:01     ` Prince Takkar
  0 siblings, 0 replies; 253+ messages in thread
From: Prince Takkar @ 2023-03-01  9:01 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Parijat Shukla

[-- Attachment #1: Type: text/plain, Size: 1245 bytes --]



> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> Anup Prabhu <aprabhu@marvell.com>; Prince Takkar <ptakkar@marvell.com>;
> Parijat Shukla <pshukla@marvell.com>
> Subject: [PATCH v5 35/39] ml/cnxk: add support to select OCM allocation mode
> 
> Added device argument "ocm_alloc_mode" to select OCM allocation method
> during model start. Two modes are supported by the driver.
> 
> Added implementation for ocm_alloc_mode lowest as default.
> 
> ocm_alloc_mode:
> lowest:  Allocate from first available free slot / lowest
>          tile ID in OCM (default)
> largest: Allocate from a slot with maximum free memory
> 
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---
>  drivers/ml/cnxk/cn10k_ml_dev.c | 45 +++++++++++++++++++++++++++++-----
>  drivers/ml/cnxk/cn10k_ml_ocm.c |  6 ++---
>  drivers/ml/cnxk/cn10k_ml_ocm.h |  3 +++
>  3 files changed, 44 insertions(+), 10 deletions(-)
> 
Acked-by: Prince Takkar <ptakkar@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [PATCH v5 00/39] Implementation of ML CNXK driver
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
                     ` (38 preceding siblings ...)
  2023-02-07 16:07   ` [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page Srikanth Yalavarthi
@ 2023-03-02  6:08   ` Prince Takkar
  39 siblings, 0 replies; 253+ messages in thread
From: Prince Takkar @ 2023-03-02  6:08 UTC (permalink / raw)
  To: Srikanth Yalavarthi
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Parijat Shukla,
	Srikanth Yalavarthi



> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Tuesday, February 7, 2023 9:37 PM
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; Prince Takkar
> <ptakkar@marvell.com>; Parijat Shukla <pshukla@marvell.com>; Srikanth
> Yalavarthi <syalavarthi@marvell.com>
> Subject: [PATCH v5 00/39] Implementation of ML CNXK driver
> 
> Marvell ML CNXK Driver
> ----------------------
> 
> This patch series implements common Machine Learning (ML) ROC code and
> driver for Marvell Octeon 10 (cnxk) platform. ML inferencing is supported on
> cnxk platform through an integrated ML inferencing processor. The current
> driver supports programming the ML hardware engine through offload
> mode.
> 
> All APIs proposed in the DPDK ML device specification are supported on the
> cnxk platform.
> 
> v5:
> * Updated model_id to uint16_t
> * Updated release notes for 23.03
> 
> v4:
> * Update function names of ML common code
> * Added support for configurable OCM page size
> * Minor typo fixes
> 
> v3:
> * Skip installation of internal headers
> * Update internal comments and code cleanup
> 
> v2:
> * Typo and formatting fixes
> 
> Srikanth Yalavarthi (39):
>   common/cnxk: add ML headers and ROC code for cnxk
>   ml/cnxk: add skeleton for ML cnxk driver
>   ml/cnxk: enable probe and remove of ML device
>   ml/cnxk: add driver support to get device info
>   ml/cnxk: add support for configure and close
>   ml/cnxk: parse ML firmware path from device args
>   ml/cnxk: enable firmware load and device reset
>   ml/cnxk: enable support for simulator environment
>   ml/cnxk: enable support for device start and stop
>   ml/cnxk: add support to create device queue-pairs
>   ml/cnxk: add functions to load and unload models
>   ml/cnxk: enable validity checks for model metadata
>   ml/cnxk: add internal structures for derived info
>   ml/cnxk: add internal structures for tiles and OCM
>   ml/cnxk: add structures for slow and fast path JDs
>   ml/cnxk: find OCM mask and page slots for a model
>   ml/cnxk: add support to reserve and free OCM pages
>   ml/cnxk: enable support to start an ML model
>   ml/cnxk: enable support to stop an ML models
>   ml/cnxk: enable support to get model information
>   ml/cnxk: enable support to update model params
>   ml/cnxk: add support to get IO buffer sizes
>   ml/cnxk: enable quantization and dequantization
>   ml/cnxk: enable support to dump device debug info
>   ml/cnxk: add driver support for device selftest
>   ml/cnxk: enqueue a burst of inference requests
>   ml/cnxk: dequeue a burst of inference requests
>   ml/cnxk: add internal function for sync mode run
>   ml/cnxk: enable support for firmware error codes
>   ml/cnxk: add support to get and reset device stats
>   ml/cnxk: add support to handle extended dev stats
>   ml/cnxk: enable support to get xstats in cycles
>   ml/cnxk: add support to report DPE FW warnings
>   ml/cnxk: add support to enable model data caching
>   ml/cnxk: add support to select OCM allocation mode
>   ml/cnxk: add support to use lock during jcmd enq
>   ml/cnxk: add support to select poll memory region
>   ml/cnxk: add user guide for marvell cnxk ml driver
>   ml/cnxk: enable support for configurable ocm page
> 
>  MAINTAINERS                            |   11 +
>  doc/guides/index.rst                   |    1 +
>  doc/guides/mldevs/cnxk.rst             |  254 +++
>  doc/guides/mldevs/index.rst            |   14 +
>  doc/guides/rel_notes/release_23_03.rst |    7 +
>  drivers/common/cnxk/hw/ml.h            |  170 ++
>  drivers/common/cnxk/meson.build        |    1 +
>  drivers/common/cnxk/roc_api.h          |    4 +
>  drivers/common/cnxk/roc_constants.h    |    2 +
>  drivers/common/cnxk/roc_dev_priv.h     |    1 +
>  drivers/common/cnxk/roc_ml.c           |  626 +++++++
>  drivers/common/cnxk/roc_ml.h           |  152 ++
>  drivers/common/cnxk/roc_ml_priv.h      |   24 +
>  drivers/common/cnxk/roc_platform.c     |    1 +
>  drivers/common/cnxk/roc_platform.h     |    2 +
>  drivers/common/cnxk/roc_priv.h         |    3 +
>  drivers/common/cnxk/version.map        |   29 +
>  drivers/meson.build                    |    1 +
>  drivers/ml/cnxk/cn10k_ml_dev.c         |  870 +++++++++
>  drivers/ml/cnxk/cn10k_ml_dev.h         |  429 +++++
>  drivers/ml/cnxk/cn10k_ml_model.c       |  413 +++++
>  drivers/ml/cnxk/cn10k_ml_model.h       |  508 ++++++
>  drivers/ml/cnxk/cn10k_ml_ocm.c         |  519 ++++++
>  drivers/ml/cnxk/cn10k_ml_ocm.h         |   85 +
>  drivers/ml/cnxk/cn10k_ml_ops.c         | 2316 ++++++++++++++++++++++++
>  drivers/ml/cnxk/cn10k_ml_ops.h         |   94 +
>  drivers/ml/cnxk/meson.build            |   32 +
>  drivers/ml/meson.build                 |    8 +
>  28 files changed, 6577 insertions(+)
>  create mode 100644 doc/guides/mldevs/cnxk.rst
>  create mode 100644 doc/guides/mldevs/index.rst
>  create mode 100644 drivers/common/cnxk/hw/ml.h
>  create mode 100644 drivers/common/cnxk/roc_ml.c
>  create mode 100644 drivers/common/cnxk/roc_ml.h
>  create mode 100644 drivers/common/cnxk/roc_ml_priv.h
>  create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
>  create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
>  create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
>  create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h
>  create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
>  create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h
>  create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
>  create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h
>  create mode 100644 drivers/ml/cnxk/meson.build
>  create mode 100644 drivers/ml/meson.build
> 
> --
> 2.17.1

Acked-by: Prince Takkar <ptakkar@marvell.com>

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver
  2023-02-07 16:06   ` [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
@ 2023-03-09 22:06     ` Thomas Monjalon
  2023-03-10  8:25       ` [EXT] " Srikanth Yalavarthi
  0 siblings, 1 reply; 253+ messages in thread
From: Thomas Monjalon @ 2023-03-09 22:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

07/02/2023 17:06, Srikanth Yalavarthi:
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> +* **Implementation of Marvell CNXK machine learning driver for .**

It seems a word is missing.
It looks like you did a lot of work on the mldev series,
so some details are missing.

> +
> +  * Added ml/cnxk driver which provides support for machine learning inference
> +    operations on Marvell's CN10K series of SoC's.
> +  * Added ML ROC code for ml/cnxk driver to common/cnxk.
> +  * Added implementation with support for all rte_ml APIs.





^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v6 00/39] Implementation of ML CNXK driver
  2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
                   ` (39 preceding siblings ...)
  2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
@ 2023-03-10  8:19 ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
                     ` (40 more replies)
  40 siblings, 41 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Marvell ML CNXK Driver
----------------------

This patch series implements the common Machine Learning (ML) ROC code
and a driver for the Marvell Octeon 10 (cnxk) platform. ML inferencing
is supported on the cnxk platform through an integrated ML inferencing
processor. The current driver supports programming the ML hardware
engine through offload mode.

All APIs proposed in the DPDK ML device specification are supported on
the cnxk platform.

v6:
* Fixed release notes content
* Rebased the patch series

v5:
* Updated model_id to uint16_t
* Updated release notes for 23.03

v4:
* Updated function names of ML common code
* Added support for configurable OCM page size
* Minor typo fixes

v3:
* Skipped installation of internal headers
* Updated internal comments and cleaned up code

v2:
* Typo and formatting fixes


Srikanth Yalavarthi (39):
  common/cnxk: add ML headers and ROC code for cnxk
  ml/cnxk: add skeleton for ML cnxk driver
  ml/cnxk: enable probe and remove of ML device
  ml/cnxk: add driver support to get device info
  ml/cnxk: add support for configure and close
  ml/cnxk: parse ML firmware path from device args
  ml/cnxk: enable firmware load and device reset
  ml/cnxk: enable support for simulator environment
  ml/cnxk: enable support for device start and stop
  ml/cnxk: add support to create device queue-pairs
  ml/cnxk: add functions to load and unload models
  ml/cnxk: enable validity checks for model metadata
  ml/cnxk: add internal structures for derived info
  ml/cnxk: add internal structures for tiles and OCM
  ml/cnxk: add structures for slow and fast path JDs
  ml/cnxk: find OCM mask and page slots for a model
  ml/cnxk: add support to reserve and free OCM pages
  ml/cnxk: enable support to start an ML model
  ml/cnxk: enable support to stop an ML models
  ml/cnxk: enable support to get model information
  ml/cnxk: enable support to update model params
  ml/cnxk: add support to get IO buffer sizes
  ml/cnxk: enable quantization and dequantization
  ml/cnxk: enable support to dump device debug info
  ml/cnxk: add driver support for device selftest
  ml/cnxk: enqueue a burst of inference requests
  ml/cnxk: dequeue a burst of inference requests
  ml/cnxk: add internal function for sync mode run
  ml/cnxk: enable support for firmware error codes
  ml/cnxk: add support to get and reset device stats
  ml/cnxk: add support to handle extended dev stats
  ml/cnxk: enable support to get xstats in cycles
  ml/cnxk: add support to report DPE FW warnings
  ml/cnxk: add support to enable model data caching
  ml/cnxk: add support to select OCM allocation mode
  ml/cnxk: add support to use lock during jcmd enq
  ml/cnxk: add support to select poll memory region
  ml/cnxk: add user guide for marvell cnxk ml driver
  ml/cnxk: add support for configurable ocm page

 MAINTAINERS                            |   11 +
 doc/guides/index.rst                   |    1 +
 doc/guides/mldevs/cnxk.rst             |  254 +++
 doc/guides/mldevs/index.rst            |   14 +
 doc/guides/rel_notes/release_23_03.rst |    7 +
 drivers/common/cnxk/hw/ml.h            |  170 ++
 drivers/common/cnxk/meson.build        |    1 +
 drivers/common/cnxk/roc_api.h          |    4 +
 drivers/common/cnxk/roc_constants.h    |    2 +
 drivers/common/cnxk/roc_dev_priv.h     |    1 +
 drivers/common/cnxk/roc_ml.c           |  626 +++++++
 drivers/common/cnxk/roc_ml.h           |  152 ++
 drivers/common/cnxk/roc_ml_priv.h      |   24 +
 drivers/common/cnxk/roc_platform.c     |    1 +
 drivers/common/cnxk/roc_platform.h     |    2 +
 drivers/common/cnxk/roc_priv.h         |    3 +
 drivers/common/cnxk/version.map        |   29 +
 drivers/meson.build                    |    1 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  870 +++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h         |  429 +++++
 drivers/ml/cnxk/cn10k_ml_model.c       |  413 +++++
 drivers/ml/cnxk/cn10k_ml_model.h       |  508 ++++++
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  519 ++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   85 +
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2316 ++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h         |   94 +
 drivers/ml/cnxk/meson.build            |   32 +
 drivers/ml/meson.build                 |    8 +
 28 files changed, 6577 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst
 create mode 100644 drivers/common/cnxk/hw/ml.h
 create mode 100644 drivers/common/cnxk/roc_ml.c
 create mode 100644 drivers/common/cnxk/roc_ml.h
 create mode 100644 drivers/common/cnxk/roc_ml_priv.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

--
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread

* [PATCH v6 01/39] common/cnxk: add ML headers and ROC code for cnxk
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
                     ` (39 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added ML cnxk headers for register and structure definitions and for
the ROC layer. Implemented the ROC functions, registered a logtype for
the ML module with the name pmd.ml.cnxk, and defined the ML hardware ID.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: series-27324 ("Implementation of mldev test application")
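
Several reset steps in the ROC layer of this patch poll a register
until a busy bit clears, with a bounded wait. The block below is a
minimal, standalone sketch of that pattern and is not the code from
this patch. The names wait_to_clear, reg_read_fn, fake_reg and
fake_read are hypothetical, the busy bit position is assumed, and the
loop is bounded by a poll count rather than by TSC cycles so that the
example stays deterministic.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical register accessor: returns the current register value. */
typedef uint64_t (*reg_read_fn)(void *ctx);

/*
 * Poll until every bit in 'mask' reads as zero, or give up after
 * 'max_polls' reads. Returns 0 on success and -ETIME on timeout.
 * (Assumption: a poll count stands in for the cycle-based timeout.)
 */
static int
wait_to_clear(reg_read_fn read_reg, void *ctx, uint64_t mask,
	      unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if ((read_reg(ctx) & mask) == 0)
			return 0;
	}

	return -ETIME;
}

/* Fake device used only for illustration: busy for a fixed number of reads. */
struct fake_reg {
	unsigned int reads_until_idle;
};

static uint64_t
fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->reads_until_idle == 0)
		return 0;

	r->reads_until_idle--;
	return UINT64_C(1); /* assumed busy bit in position 0 */
}
```

A caller can treat a non-zero return as a failed quiesce step, for
example by propagating -ETIME instead of continuing with the rest of
the reset sequence.
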

 MAINTAINERS                         |   9 +
 drivers/common/cnxk/hw/ml.h         | 170 ++++++++
 drivers/common/cnxk/meson.build     |   1 +
 drivers/common/cnxk/roc_api.h       |   4 +
 drivers/common/cnxk/roc_constants.h |   2 +
 drivers/common/cnxk/roc_dev_priv.h  |   1 +
 drivers/common/cnxk/roc_ml.c        | 626 ++++++++++++++++++++++++++++
 drivers/common/cnxk/roc_ml.h        | 152 +++++++
 drivers/common/cnxk/roc_ml_priv.h   |  24 ++
 drivers/common/cnxk/roc_platform.c  |   1 +
 drivers/common/cnxk/roc_platform.h  |   2 +
 drivers/common/cnxk/roc_priv.h      |   3 +
 drivers/common/cnxk/version.map     |  29 ++
 13 files changed, 1024 insertions(+)
 create mode 100644 drivers/common/cnxk/hw/ml.h
 create mode 100644 drivers/common/cnxk/roc_ml.c
 create mode 100644 drivers/common/cnxk/roc_ml.h
 create mode 100644 drivers/common/cnxk/roc_ml_priv.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 320842e13f..d58df9197c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1442,6 +1442,15 @@ F: drivers/raw/dpaa2_cmdif/
 F: doc/guides/rawdevs/dpaa2_cmdif.rst


+ML Device Drivers
+------------------------
+
+Marvell ML CNXK
+M: Srikanth Yalavarthi <syalavarthi@marvell.com>
+F: drivers/common/cnxk/hw/ml.h
+F: drivers/common/cnxk/roc_ml*
+
+
 Packet processing
 -----------------

diff --git a/drivers/common/cnxk/hw/ml.h b/drivers/common/cnxk/hw/ml.h
new file mode 100644
index 0000000000..3ead42b807
--- /dev/null
+++ b/drivers/common/cnxk/hw/ml.h
@@ -0,0 +1,170 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef __ML_HW_H__
+#define __ML_HW_H__
+
+#include <stdint.h>
+
+/* Constants */
+#define ML_ANBX_NR 0x3
+
+/* Base offsets */
+#define ML_MLAB_BLK_OFFSET 0x20000000 /* CNF10KB */
+#define ML_AXI_START_ADDR  0x800000000
+
+/* MLW register offsets / ML_PF_BAR0 */
+#define ML_CFG			 0x10000
+#define ML_MLR_BASE		 0x10008
+#define ML_AXI_BRIDGE_CTRL(a)	 (0x10020 | (uint64_t)(a) << 3)
+#define ML_JOB_MGR_CTRL		 0x10060
+#define ML_CORE_INT_LO		 0x10140
+#define ML_CORE_INT_HI		 0x10160
+#define ML_JCMDQ_IN(a)		 (0x11000 | (uint64_t)(a) << 3) /* CN10KA */
+#define ML_JCMDQ_STATUS		 0x11010			/* CN10KA */
+#define ML_STGX_STATUS(a)	 (0x11020 | (uint64_t)(a) << 3) /* CNF10KB */
+#define ML_STG_CONTROL		 0x11100			/* CNF10KB */
+#define ML_PNB_CMD_TYPE		 0x113a0			/* CNF10KB */
+#define ML_SCRATCH(a)		 (0x14000 | (uint64_t)(a) << 3)
+#define ML_ANBX_BACKP_DISABLE(a) (0x18000 | (uint64_t)(a) << 12) /* CN10KA */
+#define ML_ANBX_NCBI_P_OVR(a)	 (0x18010 | (uint64_t)(a) << 12) /* CN10KA */
+#define ML_ANBX_NCBI_NP_OVR(a)	 (0x18020 | (uint64_t)(a) << 12) /* CN10KA */
+
+/* MLIP configuration register offsets / ML_PF_BAR0 */
+#define ML_SW_RST_CTRL		      0x12084000
+#define ML_A35_0_RST_VECTOR_BASE_W(a) (0x12084014 + (a) * (0x04))
+#define ML_A35_1_RST_VECTOR_BASE_W(a) (0x1208401c + (a) * (0x04))
+
+/* MLW scratch register offsets */
+#define ML_SCRATCH_WORK_PTR	      (ML_SCRATCH(0))
+#define ML_SCRATCH_FW_CTRL	      (ML_SCRATCH(1))
+#define ML_SCRATCH_DBG_BUFFER_HEAD_C0 (ML_SCRATCH(2))
+#define ML_SCRATCH_DBG_BUFFER_TAIL_C0 (ML_SCRATCH(3))
+#define ML_SCRATCH_DBG_BUFFER_HEAD_C1 (ML_SCRATCH(4))
+#define ML_SCRATCH_DBG_BUFFER_TAIL_C1 (ML_SCRATCH(5))
+#define ML_SCRATCH_EXCEPTION_SP_C0    (ML_SCRATCH(6))
+#define ML_SCRATCH_EXCEPTION_SP_C1    (ML_SCRATCH(7))
+
+/* ML job completion structure */
+struct ml_jce_s {
+	/* WORD 0 */
+	union ml_jce_w0 {
+		struct {
+			uint64_t rsvd_0_3 : 4;
+
+			/* Reserved for future architecture */
+			uint64_t ggrp_h : 2;
+
+			/* Tag type */
+			uint64_t ttype : 2;
+
+			/* Physical function number */
+			uint64_t pf_func : 16;
+
+			/* Unused [7] + Guest Group [6:0] */
+			uint64_t ggrp : 8;
+
+			/* Tag */
+			uint64_t tag : 32;
+		} s;
+		uint64_t u64;
+	} w0;
+
+	/* WORD 1 */
+	union ml_jce_w1 {
+		struct {
+			/* Work queue pointer */
+			uint64_t wqp : 53;
+			uint64_t rsvd_53_63 : 11;
+
+		} s;
+		uint64_t u64;
+	} w1;
+};
+
+/* ML job command structure */
+struct ml_job_cmd_s {
+	/* WORD 0 */
+	union ml_job_cmd_w0 {
+		struct {
+			uint64_t rsvd_0_63;
+		} s;
+		uint64_t u64;
+	} w0;
+
+	/* WORD 1 */
+	union ml_job_cmd_w1 {
+		struct {
+			/* Job pointer */
+			uint64_t jobptr : 53;
+			uint64_t rsvd_53_63 : 11;
+		} s;
+		uint64_t u64;
+	} w1;
+};
+
+/* ML A35 0 RST vector base structure */
+union ml_a35_0_rst_vector_base_s {
+	struct {
+		/* Base address */
+		uint64_t addr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+
+	struct {
+		/* WORD 0 */
+		uint32_t w0;
+
+		/* WORD 1 */
+		uint32_t w1;
+	} w;
+
+	uint64_t u64;
+};
+
+/* ML A35 1 RST vector base structure */
+union ml_a35_1_rst_vector_base_s {
+	struct {
+		/* Base address */
+		uint64_t addr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+
+	struct {
+		/* WORD 0 */
+		uint32_t w0;
+
+		/* WORD 1 */
+		uint32_t w1;
+	} w;
+
+	uint64_t u64;
+};
+
+/* Work pointer scratch register */
+union ml_scratch_work_ptr_s {
+	struct {
+		/* Work pointer */
+		uint64_t work_ptr : 37;
+		uint64_t rsvd_37_63 : 27;
+	} s;
+	uint64_t u64;
+};
+
+/* Firmware control scratch register */
+union ml_scratch_fw_ctrl_s {
+	struct {
+		uint64_t rsvd_0_15 : 16;
+
+		/* Valid job bit */
+		uint64_t valid : 1;
+
+		/* Done status bit */
+		uint64_t done : 1;
+		uint64_t rsvd_18_63 : 46;
+	} s;
+	uint64_t u64;
+};
+
+#endif /* __ML_HW_H__ */
diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 849735921c..b4aa0a050c 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -26,6 +26,7 @@ sources = files(
         'roc_irq.c',
         'roc_ie_ot.c',
         'roc_mbox.c',
+        'roc_ml.c',
         'roc_model.c',
         'roc_nix.c',
         'roc_nix_bpf.c',
diff --git a/drivers/common/cnxk/roc_api.h b/drivers/common/cnxk/roc_api.h
index 993a2f7a68..bbc94ab48e 100644
--- a/drivers/common/cnxk/roc_api.h
+++ b/drivers/common/cnxk/roc_api.h
@@ -31,6 +31,7 @@
 /* HW structure definition */
 #include "hw/cpt.h"
 #include "hw/dpi.h"
+#include "hw/ml.h"
 #include "hw/nix.h"
 #include "hw/npa.h"
 #include "hw/npc.h"
@@ -110,4 +111,7 @@
 #include "roc_nix_inl_dp.h"
 #include "roc_nix_inl.h"

+/* ML */
+#include "roc_ml.h"
+
 #endif /* _ROC_API_H_ */
diff --git a/drivers/common/cnxk/roc_constants.h b/drivers/common/cnxk/roc_constants.h
index c94916db6d..291b6a4bc9 100644
--- a/drivers/common/cnxk/roc_constants.h
+++ b/drivers/common/cnxk/roc_constants.h
@@ -52,6 +52,8 @@
 #define PCI_DEVID_CN10K_RVU_CPT_PF 0xA0F2
 #define PCI_DEVID_CN10K_RVU_CPT_VF 0xA0F3

+#define PCI_DEVID_CN10K_ML_PF 0xA092
+
 #define PCI_SUBSYSTEM_DEVID_CN10KA  0xB900
 #define PCI_SUBSYSTEM_DEVID_CN10KAS 0xB900
 #define PCI_SUBSYSTEM_DEVID_CNF10KA 0xBA00
diff --git a/drivers/common/cnxk/roc_dev_priv.h b/drivers/common/cnxk/roc_dev_priv.h
index 27bf68fddb..feda173ce5 100644
--- a/drivers/common/cnxk/roc_dev_priv.h
+++ b/drivers/common/cnxk/roc_dev_priv.h
@@ -94,6 +94,7 @@ struct dev {
 	void *roc_nix;
 	void *roc_cpt;
 	void *roc_tim;
+	void *roc_ml;
 	bool disable_shared_lmt; /* false(default): shared lmt mode enabled */
 	const struct plt_memzone *lmt_mz;
 } __plt_cache_aligned;
diff --git a/drivers/common/cnxk/roc_ml.c b/drivers/common/cnxk/roc_ml.c
new file mode 100644
index 0000000000..7390697b1d
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml.c
@@ -0,0 +1,626 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "roc_api.h"
+#include "roc_priv.h"
+
+#define TIME_SEC_IN_MS 1000
+
+static int
+roc_ml_reg_wait_to_clear(struct roc_ml *roc_ml, uint64_t offset, uint64_t mask)
+{
+	uint64_t start_cycle;
+	uint64_t wait_cycles;
+	uint64_t reg_val;
+
+	wait_cycles = (ROC_ML_TIMEOUT_MS * plt_tsc_hz()) / TIME_SEC_IN_MS;
+	start_cycle = plt_tsc_cycles();
+	do {
+		reg_val = roc_ml_reg_read64(roc_ml, offset);
+
+		if (!(reg_val & mask))
+			return 0;
+	} while (plt_tsc_cycles() - start_cycle < wait_cycles);
+
+	return -ETIME;
+}
+
+uint64_t
+roc_ml_reg_read64(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	return plt_read64(PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_write64(struct roc_ml *roc_ml, uint64_t val, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	plt_write64(val, PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+uint32_t
+roc_ml_reg_read32(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	return plt_read32(PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_write32(struct roc_ml *roc_ml, uint32_t val, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	plt_write32(val, PLT_PTR_ADD(ml->ml_reg_addr, offset));
+}
+
+void
+roc_ml_reg_save(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (offset == ML_MLR_BASE) {
+		ml->ml_mlr_base =
+			FIELD_GET(ROC_ML_MLR_BASE_BASE, roc_ml_reg_read64(roc_ml, offset));
+		ml->ml_mlr_base_saved = true;
+	}
+}
+
+void *
+roc_ml_addr_ap2mlip(struct roc_ml *roc_ml, void *addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+	uint64_t ml_mlr_base;
+
+	ml_mlr_base = (ml->ml_mlr_base_saved) ? ml->ml_mlr_base :
+						FIELD_GET(ROC_ML_MLR_BASE_BASE,
+							  roc_ml_reg_read64(roc_ml, ML_MLR_BASE));
+	return PLT_PTR_ADD(addr, ML_AXI_START_ADDR - ml_mlr_base);
+}
+
+void *
+roc_ml_addr_mlip2ap(struct roc_ml *roc_ml, void *addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+	uint64_t ml_mlr_base;
+
+	ml_mlr_base = (ml->ml_mlr_base_saved) ? ml->ml_mlr_base :
+						FIELD_GET(ROC_ML_MLR_BASE_BASE,
+							  roc_ml_reg_read64(roc_ml, ML_MLR_BASE));
+	return PLT_PTR_ADD(addr, ml_mlr_base - ML_AXI_START_ADDR);
+}
+
+uint64_t
+roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (roc_model_is_cn10ka())
+		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr;
+	else
+		return phys_addr - ml->pci_dev->mem_resource[0].phys_addr - ML_MLAB_BLK_OFFSET;
+}
+
+uint64_t
+roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset)
+{
+	struct ml *ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (roc_model_is_cn10ka())
+		return ml->pci_dev->mem_resource[0].phys_addr + offset;
+	else
+		return ml->pci_dev->mem_resource[0].phys_addr + ML_MLAB_BLK_OFFSET + offset;
+}
+
+void
+roc_ml_scratch_write_job(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_work_ptr.u64 = 0;
+	reg_work_ptr.s.work_ptr = PLT_U64_CAST(roc_ml_addr_ap2mlip(roc_ml, work_ptr));
+
+	reg_fw_ctrl.u64 = 0;
+	reg_fw_ctrl.s.valid = 1;
+
+	roc_ml_reg_write64(roc_ml, reg_work_ptr.u64, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(roc_ml, reg_fw_ctrl.u64, ML_SCRATCH_FW_CTRL);
+}
+
+bool
+roc_ml_scratch_is_valid_bit_set(struct roc_ml *roc_ml)
+{
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_fw_ctrl.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_FW_CTRL);
+
+	if (reg_fw_ctrl.s.valid == 1)
+		return true;
+
+	return false;
+}
+
+bool
+roc_ml_scratch_is_done_bit_set(struct roc_ml *roc_ml)
+{
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+
+	reg_fw_ctrl.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_FW_CTRL);
+
+	if (reg_fw_ctrl.s.done == 1)
+		return true;
+
+	return false;
+}
+
+bool
+roc_ml_scratch_enqueue(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	union ml_scratch_fw_ctrl_s reg_fw_ctrl;
+	bool ret = false;
+
+	reg_work_ptr.u64 = 0;
+	reg_work_ptr.s.work_ptr = PLT_U64_CAST(roc_ml_addr_ap2mlip(roc_ml, work_ptr));
+
+	reg_fw_ctrl.u64 = 0;
+	reg_fw_ctrl.s.valid = 1;
+
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		bool valid = roc_ml_scratch_is_valid_bit_set(roc_ml);
+		bool done = roc_ml_scratch_is_done_bit_set(roc_ml);
+
+		if (valid == done) {
+			roc_ml_clk_force_on(roc_ml);
+			roc_ml_dma_stall_off(roc_ml);
+
+			roc_ml_reg_write64(roc_ml, reg_work_ptr.u64, ML_SCRATCH_WORK_PTR);
+			roc_ml_reg_write64(roc_ml, reg_fw_ctrl.u64, ML_SCRATCH_FW_CTRL);
+
+			ret = true;
+		}
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+
+	return ret;
+}
+
+bool
+roc_ml_scratch_dequeue(struct roc_ml *roc_ml, void *work_ptr)
+{
+	union ml_scratch_work_ptr_s reg_work_ptr;
+	bool ret = false;
+
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		bool valid = roc_ml_scratch_is_valid_bit_set(roc_ml);
+		bool done = roc_ml_scratch_is_done_bit_set(roc_ml);
+
+		if (valid && done) {
+			reg_work_ptr.u64 = roc_ml_reg_read64(roc_ml, ML_SCRATCH_WORK_PTR);
+			if (work_ptr ==
+			    roc_ml_addr_mlip2ap(roc_ml, PLT_PTR_CAST(reg_work_ptr.u64))) {
+				roc_ml_dma_stall_on(roc_ml);
+				roc_ml_clk_force_off(roc_ml);
+
+				roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+				roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_FW_CTRL);
+				ret = true;
+			}
+		}
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+
+	return ret;
+}
+
+void
+roc_ml_scratch_queue_reset(struct roc_ml *roc_ml)
+{
+	if (plt_spinlock_trylock(&roc_ml->sp_spinlock) != 0) {
+		roc_ml_dma_stall_on(roc_ml);
+		roc_ml_clk_force_off(roc_ml);
+		roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_FW_CTRL);
+		plt_spinlock_unlock(&roc_ml->sp_spinlock);
+	}
+}
+
+bool
+roc_ml_jcmdq_enqueue_lf(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd)
+{
+	bool ret = false;
+
+	if (FIELD_GET(ROC_ML_JCMDQ_STATUS_AVAIL_COUNT,
+		      roc_ml_reg_read64(roc_ml, ML_JCMDQ_STATUS)) != 0) {
+		roc_ml_reg_write64(roc_ml, job_cmd->w0.u64, ML_JCMDQ_IN(0));
+		roc_ml_reg_write64(roc_ml, job_cmd->w1.u64, ML_JCMDQ_IN(1));
+		ret = true;
+	}
+
+	return ret;
+}
+
+bool
+roc_ml_jcmdq_enqueue_sl(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd)
+{
+	bool ret = false;
+
+	if (plt_spinlock_trylock(&roc_ml->fp_spinlock) != 0) {
+		if (FIELD_GET(ROC_ML_JCMDQ_STATUS_AVAIL_COUNT,
+			      roc_ml_reg_read64(roc_ml, ML_JCMDQ_STATUS)) != 0) {
+			roc_ml_reg_write64(roc_ml, job_cmd->w0.u64, ML_JCMDQ_IN(0));
+			roc_ml_reg_write64(roc_ml, job_cmd->w1.u64, ML_JCMDQ_IN(1));
+			ret = true;
+		}
+		plt_spinlock_unlock(&roc_ml->fp_spinlock);
+	}
+
+	return ret;
+}
+
+void
+roc_ml_clk_force_on(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+	reg_val |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+}
+
+void
+roc_ml_clk_force_off(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	roc_ml_reg_write64(roc_ml, 0, ML_SCRATCH_WORK_PTR);
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+}
+
+void
+roc_ml_dma_stall_on(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_JOB_MGR_CTRL);
+	reg_val |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_JOB_MGR_CTRL);
+}
+
+void
+roc_ml_dma_stall_off(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val = 0;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_JOB_MGR_CTRL);
+	reg_val &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(roc_ml, reg_val, ML_JOB_MGR_CTRL);
+}
+
+bool
+roc_ml_mlip_is_enabled(struct roc_ml *roc_ml)
+{
+	uint64_t reg_val;
+
+	reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+
+	if ((reg_val & ROC_ML_CFG_MLIP_ENA) != 0)
+		return true;
+
+	return false;
+}
+
+int
+roc_ml_mlip_reset(struct roc_ml *roc_ml, bool force)
+{
+	uint64_t reg_val;
+
+	/* Force reset */
+	if (force) {
+		/* Set ML(0)_CFG[ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* Set ML(0)_CFG[MLIP_ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* Clear ML_MLR_BASE */
+		roc_ml_reg_write64(roc_ml, 0, ML_MLR_BASE);
+	}
+
+	if (roc_model_is_cn10ka()) {
+		/* Wait for all active jobs to finish.
+		 * ML_CFG[ENA] : When set, MLW will accept job commands. This
+		 * bit can be cleared at any time. If [BUSY] is set, software
+		 * must wait until [BUSY] == 0 before setting this bit.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_CFG, ROC_ML_CFG_BUSY);
+
+		/* (1) Set ML(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] = 1 to instruct
+		 * the AXI bridge not to accept any new transactions from MLIP.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		/* (2) Wait until ML(0)_AXI_BRIDGE_CTRL(0..1)[BUSY] = 0 which
+		 * indicates that there are no outstanding transactions on
+		 * AXI-NCB paths.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(0),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(1),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		/* (3) Wait until ML(0)_JOB_MGR_CTRL[BUSY] = 0 which indicates
+		 * that there are no pending jobs in the MLW's job manager.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_JOB_MGR_CTRL, ROC_ML_JOB_MGR_CTRL_BUSY);
+
+		/* (4) Set ML(0)_CFG[ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (5) Set ML(0)_CFG[MLIP_ENA] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (6) Set ML(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] = 0. */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+	}
+
+	if (roc_model_is_cnf10kb()) {
+		/* (1) Clear MLAB(0)_CFG[ENA]. Any new jobs will bypass the job
+		 * execution stages and their completions will be returned to
+		 * PSM.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+		/* (2) Quiesce the ACC and DMA AXI interfaces: For each of the
+		 * two MLAB(0)_AXI_BRIDGE_CTRL(0..1) registers:
+		 *
+		 * (a) Set MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FENCE] to block new AXI
+		 * commands from MLIP.
+		 *
+		 * (b) Poll MLAB(0)_AXI_BRIDGE_CTRL(0..1)[BUSY] == 0.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(0),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		roc_ml_reg_wait_to_clear(roc_ml, ML_AXI_BRIDGE_CTRL(1),
+					 ROC_ML_AXI_BRIDGE_CTRL_BUSY);
+
+		/* (3) Clear MLAB(0)_CFG[MLIP_ENA] to reset MLIP.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_CFG);
+		reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_CFG);
+
+cnf10kb_mlip_reset_stage_4a:
+		/* (4) Flush any outstanding jobs in MLAB's job execution
+		 * stages:
+		 *
+		 * (a) Wait for completion stage to clear:
+		 *   - Poll MLAB(0)_STG(0..2)_STATUS[VALID] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(0), ROC_ML_STG_STATUS_VALID);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(1), ROC_ML_STG_STATUS_VALID);
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STGX_STATUS(2), ROC_ML_STG_STATUS_VALID);
+
+cnf10kb_mlip_reset_stage_4b:
+		/* (4b) Clear job run stage: Poll
+		 * MLAB(0)_STG_CONTROL[RUN_TO_COMP] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL, ROC_ML_STG_CONTROL_RUN_TO_COMP);
+
+		/* (4b) Clear job run stage: If MLAB(0)_STG(1)_STATUS[VALID] ==
+		 * 1:
+		 *     - Set MLAB(0)_STG_CONTROL[RUN_TO_COMP].
+		 *     - Poll MLAB(0)_STG_CONTROL[RUN_TO_COMP] == 0.
+		 *     - Repeat step (a) to clear job completion stage.
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(1));
+		if (reg_val & ROC_ML_STG_STATUS_VALID) {
+			reg_val = roc_ml_reg_read64(roc_ml, ML_STG_CONTROL);
+			reg_val |= ROC_ML_STG_CONTROL_RUN_TO_COMP;
+			roc_ml_reg_write64(roc_ml, reg_val, ML_STG_CONTROL);
+
+			roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL,
+						 ROC_ML_STG_CONTROL_RUN_TO_COMP);
+
+			goto cnf10kb_mlip_reset_stage_4a;
+		}
+
+		/* (4c) Clear job fetch stage: Poll
+		 * MLAB(0)_STG_CONTROL[FETCH_TO_RUN] == 0.
+		 */
+		roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL, ROC_ML_STG_CONTROL_FETCH_TO_RUN);
+
+		/* (4c) Clear job fetch stage: If
+		 * MLAB(0)_STG(0..2)_STATUS[VALID] == 1:
+		 *     - Set MLAB(0)_STG_CONTROL[FETCH_TO_RUN].
+		 *     - Poll MLAB(0)_STG_CONTROL[FETCH_TO_RUN] == 0.
+		 *     - Repeat step (b) to clear job run and completion stages.
+		 */
+		reg_val = (roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(0)) |
+			   roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(1)) |
+			   roc_ml_reg_read64(roc_ml, ML_STGX_STATUS(2)));
+
+		if (reg_val & ROC_ML_STG_STATUS_VALID) {
+			reg_val = roc_ml_reg_read64(roc_ml, ML_STG_CONTROL);
+			reg_val |= ROC_ML_STG_CONTROL_FETCH_TO_RUN;
+			roc_ml_reg_write64(roc_ml, reg_val, ML_STG_CONTROL);
+
+			roc_ml_reg_wait_to_clear(roc_ml, ML_STG_CONTROL,
+						 ROC_ML_STG_CONTROL_FETCH_TO_RUN);
+
+			goto cnf10kb_mlip_reset_stage_4b;
+		}
+
+		/* (5) Reset the ACC and DMA AXI interfaces: For each of the two
+		 * MLAB(0)_AXI_BRIDGE_CTRL(0..1) registers:
+		 *
+		 * (5a) Set and then clear
+		 * MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FLUSH_WRITE_DATA].
+		 *
+		 * (5b) Clear MLAB(0)_AXI_BRIDGE_CTRL(0..1)[FENCE].
+		 */
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(0));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(0));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val |= ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+
+		reg_val = roc_ml_reg_read64(roc_ml, ML_AXI_BRIDGE_CTRL(1));
+		reg_val &= ~ROC_ML_AXI_BRIDGE_CTRL_FENCE;
+		roc_ml_reg_write64(roc_ml, reg_val, ML_AXI_BRIDGE_CTRL(1));
+	}
+
+	return 0;
+}
+
+int
+roc_ml_dev_init(struct roc_ml *roc_ml)
+{
+	struct plt_pci_device *pci_dev;
+	struct dev *dev;
+	struct ml *ml;
+
+	if (roc_ml == NULL || roc_ml->pci_dev == NULL)
+		return -EINVAL;
+
+	PLT_STATIC_ASSERT(sizeof(struct ml) <= ROC_ML_MEM_SZ);
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+	memset(ml, 0, sizeof(*ml));
+	pci_dev = roc_ml->pci_dev;
+	dev = &ml->dev;
+
+	ml->pci_dev = pci_dev;
+	dev->roc_ml = roc_ml;
+
+	ml->ml_reg_addr = ml->pci_dev->mem_resource[0].addr;
+	ml->ml_mlr_base = 0;
+	ml->ml_mlr_base_saved = false;
+
+	plt_ml_dbg("ML: PCI Physical Address : 0x%016lx", ml->pci_dev->mem_resource[0].phys_addr);
+	plt_ml_dbg("ML: PCI Virtual Address : 0x%016lx",
+		   PLT_U64_CAST(ml->pci_dev->mem_resource[0].addr));
+
+	plt_spinlock_init(&roc_ml->sp_spinlock);
+	plt_spinlock_init(&roc_ml->fp_spinlock);
+
+	return 0;
+}
+
+int
+roc_ml_dev_fini(struct roc_ml *roc_ml)
+{
+	/* The private area is embedded in roc_ml, so a NULL check on the
+	 * derived pointer can never fire; validate the handle itself. */
+	if (roc_ml == NULL)
+		return -EINVAL;
+
+	return 0;
+}
+
+int
+roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
+{
+	struct dev *dev;
+	struct ml *ml;
+
+	if ((roc_ml == NULL) || (roc_bphy == NULL))
+		return -EINVAL;
+
+	PLT_STATIC_ASSERT(sizeof(struct ml) <= ROC_ML_MEM_SZ);
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+	memset(ml, 0, sizeof(*ml));
+
+	dev = &ml->dev;
+
+	ml->pci_dev = roc_bphy->pci_dev;
+	dev->roc_ml = roc_ml;
+
+	plt_ml_dbg(
+		"MLAB: Physical Address : 0x%016lx",
+		PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].phys_addr, ML_MLAB_BLK_OFFSET));
+	plt_ml_dbg("MLAB: Virtual Address : 0x%016lx",
+		   PLT_PTR_ADD_U64_CAST(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET));
+
+	ml->ml_reg_addr = PLT_PTR_ADD(ml->pci_dev->mem_resource[0].addr, ML_MLAB_BLK_OFFSET);
+	ml->ml_mlr_base = 0;
+	ml->ml_mlr_base_saved = false;
+
+	plt_spinlock_init(&roc_ml->sp_spinlock);
+	plt_spinlock_init(&roc_ml->fp_spinlock);
+
+	return 0;
+}
+
+int
+roc_ml_blk_fini(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml)
+{
+	struct ml *ml;
+
+	if ((roc_ml == NULL) || (roc_bphy == NULL))
+		return -EINVAL;
+
+	ml = roc_ml_to_ml_priv(roc_ml);
+
+	if (ml == NULL)
+		return -EINVAL;
+
+	return 0;
+}
+
+uint16_t
+roc_ml_sso_pf_func_get(void)
+{
+	return idev_sso_pffunc_get();
+}
diff --git a/drivers/common/cnxk/roc_ml.h b/drivers/common/cnxk/roc_ml.h
new file mode 100644
index 0000000000..3cd82be6a6
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml.h
@@ -0,0 +1,152 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _ROC_ML_H_
+#define _ROC_ML_H_
+
+#include "roc_api.h"
+
+#define ROC_ML_MEM_SZ	  (6 * 1024)
+#define ROC_ML_TIMEOUT_MS 10000
+
+/* ML_CFG */
+#define ROC_ML_CFG_JD_SIZE	  GENMASK_ULL(1, 0)
+#define ROC_ML_CFG_MLIP_ENA	  BIT_ULL(2)
+#define ROC_ML_CFG_BUSY		  BIT_ULL(3)
+#define ROC_ML_CFG_WRAP_CLK_FORCE BIT_ULL(4)
+#define ROC_ML_CFG_MLIP_CLK_FORCE BIT_ULL(5)
+#define ROC_ML_CFG_ENA		  BIT_ULL(6)
+
+/* ML_MLR_BASE */
+#define ROC_ML_MLR_BASE_BASE GENMASK_ULL(51, 0)
+
+/* ML_STG_STATUS */
+#define ROC_ML_STG_STATUS_VALID		BIT_ULL(0)
+#define ROC_ML_STG_STATUS_ADDR_ERR	BIT_ULL(1)
+#define ROC_ML_STG_STATUS_DMA_ERR	BIT_ULL(2)
+#define ROC_ML_STG_STATUS_TIMEOUT	BIT_ULL(3)
+#define ROC_ML_STG_STATUS_NFAT_ERR	BIT_ULL(4)
+#define ROC_ML_STG_STATUS_JOB_ERR	BIT_ULL(5)
+#define ROC_ML_STG_STATUS_ELAPSED_TICKS GENMASK_ULL(47, 6)
+
+/* ML_STG_CONTROL */
+#define ROC_ML_STG_CONTROL_FETCH_TO_RUN BIT_ULL(0)
+#define ROC_ML_STG_CONTROL_RUN_TO_COMP	BIT_ULL(1)
+
+/* ML_AXI_BRIDGE */
+#define ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL	      BIT_ULL(0)
+#define ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE	      BIT_ULL(1)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_AXI_ID	      GENMASK_ULL(11, 2)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_WR_BLK	      BIT_ULL(13)
+#define ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK	      BIT_ULL(14)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_RD_BLK	      BIT_ULL(15)
+#define ROC_ML_AXI_BRIDGE_CTRL_NCB_RD_BLK	      BIT_ULL(16)
+#define ROC_ML_AXI_BRIDGE_CTRL_FENCE		      BIT_ULL(17)
+#define ROC_ML_AXI_BRIDGE_CTRL_BUSY		      BIT_ULL(18)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK	      BIT_ULL(19)
+#define ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK	      BIT_ULL(20)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_FORCE_CMPLT	      BIT_ULL(21)
+#define ROC_ML_AXI_BRIDGE_CTRL_WR_CNT_GEAR	      GENMASK_ULL(25, 22)
+#define ROC_ML_AXI_BRIDGE_CTRL_RD_GEAR		      GENMASK_ULL(28, 26)
+#define ROC_ML_AXI_BRIDGE_CTRL_CSR_CUTTHROUGH_MODE    BIT_ULL(29)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_WRITE_CREDITS      GENMASK_ULL(33, 30)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_READ_CREDITS	      GENMASK_ULL(37, 34)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_LOAD_WRITE_CREDITS BIT_ULL(38)
+#define ROC_ML_AXI_BRIDGE_CTRL_GAA_LOAD_READ_CREDITS  BIT_ULL(39)
+#define ROC_ML_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA	      BIT_ULL(40)
+
+/* ML_JOB_MGR_CTRL */
+#define ROC_ML_JOB_MGR_CTRL_STALL_ON_ERR     BIT_ULL(0)
+#define ROC_ML_JOB_MGR_CTRL_PF_OVERRIDE	     BIT_ULL(1)
+#define ROC_ML_JOB_MGR_CTRL_PF_FUNC_OVERRIDE GENMASK_ULL(19, 4)
+#define ROC_ML_JOB_MGR_CTRL_BUSY	     BIT_ULL(20)
+#define ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE    BIT_ULL(21)
+
+/* ML_JCMDQ_STATUS */
+#define ROC_ML_JCMDQ_STATUS_AVAIL_COUNT GENMASK_ULL(4, 0)
+
+/* ML_ANBX_BACKP_DISABLE */
+#define ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE BIT_ULL(0)
+#define ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE BIT_ULL(1)
+
+/* ML_ANBX_NCBI_P_OVR */
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MSH_DST_OVR_VLD	 BIT_ULL(0)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MSH_DST_OVR	 GENMASK_ULL(11, 1)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD	 BIT_ULL(12)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR		 BIT_ULL(13)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_PADDR_OVR_VLD	 BIT_ULL(14)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_PADDR_OVR		 BIT_ULL(15)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD	 BIT_ULL(16)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR		 BIT_ULL(17)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPADID_VAL_OVR_VLD BIT_ULL(18)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPADID_VAL_OVR	 BIT_ULL(19)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPAMDID_OVR_VLD	 BIT_ULL(20)
+#define ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_MPAMDID_OVR	 BIT_ULL(21)
+
+/* ML_ANBX_NCBI_NP_OVR */
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MSH_DST_OVR_VLD	   BIT_ULL(0)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MSH_DST_OVR	   GENMASK_ULL(11, 1)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD	   BIT_ULL(12)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR		   BIT_ULL(13)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_PADDR_OVR_VLD	   BIT_ULL(14)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_PADDR_OVR	   BIT_ULL(15)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_RO_OVR_VLD	   BIT_ULL(16)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_RO_OVR		   BIT_ULL(17)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPADID_VAL_OVR_VLD BIT_ULL(18)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPADID_VAL_OVR	   BIT_ULL(19)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPAMDID_OVR_VLD	   BIT_ULL(20)
+#define ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_MPAMDID_OVR	   BIT_ULL(21)
+
+/* ML_SW_RST_CTRL */
+#define ROC_ML_SW_RST_CTRL_ACC_RST  BIT_ULL(0)
+#define ROC_ML_SW_RST_CTRL_CMPC_RST BIT_ULL(1)
+
+struct roc_ml {
+	struct plt_pci_device *pci_dev;
+	plt_spinlock_t sp_spinlock;
+	plt_spinlock_t fp_spinlock;
+	uint8_t reserved[ROC_ML_MEM_SZ] __plt_cache_aligned;
+} __plt_cache_aligned;
+
+/* Register read and write functions */
+uint64_t __roc_api roc_ml_reg_read64(struct roc_ml *roc_ml, uint64_t offset);
+void __roc_api roc_ml_reg_write64(struct roc_ml *roc_ml, uint64_t val, uint64_t offset);
+uint32_t __roc_api roc_ml_reg_read32(struct roc_ml *roc_ml, uint64_t offset);
+void __roc_api roc_ml_reg_write32(struct roc_ml *roc_ml, uint32_t val, uint64_t offset);
+void __roc_api roc_ml_reg_save(struct roc_ml *roc_ml, uint64_t offset);
+
+/* Address translation functions */
+uint64_t __roc_api roc_ml_addr_pa_to_offset(struct roc_ml *roc_ml, uint64_t phys_addr);
+uint64_t __roc_api roc_ml_addr_offset_to_pa(struct roc_ml *roc_ml, uint64_t offset);
+void *__roc_api roc_ml_addr_ap2mlip(struct roc_ml *roc_ml, void *addr);
+void *__roc_api roc_ml_addr_mlip2ap(struct roc_ml *roc_ml, void *addr);
+
+/* Scratch and JCMDQ functions */
+void __roc_api roc_ml_scratch_write_job(struct roc_ml *roc_ml, void *jd);
+bool __roc_api roc_ml_scratch_is_valid_bit_set(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_scratch_is_done_bit_set(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_scratch_enqueue(struct roc_ml *roc_ml, void *work_ptr);
+bool __roc_api roc_ml_scratch_dequeue(struct roc_ml *roc_ml, void *work_ptr);
+void __roc_api roc_ml_scratch_queue_reset(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_jcmdq_enqueue_lf(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+bool __roc_api roc_ml_jcmdq_enqueue_sl(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+/* Device management functions */
+void __roc_api roc_ml_clk_force_on(struct roc_ml *roc_ml);
+void __roc_api roc_ml_clk_force_off(struct roc_ml *roc_ml);
+void __roc_api roc_ml_dma_stall_on(struct roc_ml *roc_ml);
+void __roc_api roc_ml_dma_stall_off(struct roc_ml *roc_ml);
+bool __roc_api roc_ml_mlip_is_enabled(struct roc_ml *roc_ml);
+int __roc_api roc_ml_mlip_reset(struct roc_ml *roc_ml, bool force);
+
+/* Device / block functions */
+int __roc_api roc_ml_dev_init(struct roc_ml *roc_ml);
+int __roc_api roc_ml_dev_fini(struct roc_ml *roc_ml);
+int __roc_api roc_ml_blk_init(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml);
+int __roc_api roc_ml_blk_fini(struct roc_bphy *roc_bphy, struct roc_ml *roc_ml);
+
+/* Utility functions */
+uint16_t __roc_api roc_ml_sso_pf_func_get(void);
+
+#endif /*_ROC_ML_H_*/
diff --git a/drivers/common/cnxk/roc_ml_priv.h b/drivers/common/cnxk/roc_ml_priv.h
new file mode 100644
index 0000000000..ad5fe90bab
--- /dev/null
+++ b/drivers/common/cnxk/roc_ml_priv.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _ROC_ML_PRIV_H_
+#define _ROC_ML_PRIV_H_
+
+#include "roc_api.h"
+
+struct ml {
+	struct plt_pci_device *pci_dev;
+	struct dev dev;
+	uint8_t *ml_reg_addr;
+	uint64_t ml_mlr_base;
+	bool ml_mlr_base_saved;
+} __plt_cache_aligned;
+
+static inline struct ml *
+roc_ml_to_ml_priv(struct roc_ml *roc_ml)
+{
+	return (struct ml *)&roc_ml->reserved[0];
+}
+
+#endif /* _ROC_ML_PRIV_H_ */
diff --git a/drivers/common/cnxk/roc_platform.c b/drivers/common/cnxk/roc_platform.c
index ce0f9b870c..f91b95ceab 100644
--- a/drivers/common/cnxk/roc_platform.c
+++ b/drivers/common/cnxk/roc_platform.c
@@ -63,6 +63,7 @@ roc_plt_init(void)
 RTE_LOG_REGISTER(cnxk_logtype_base, pmd.cnxk.base, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_mbox, pmd.cnxk.mbox, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_cpt, pmd.crypto.cnxk, NOTICE);
+RTE_LOG_REGISTER(cnxk_logtype_ml, pmd.ml.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_npa, pmd.mempool.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_nix, pmd.net.cnxk, NOTICE);
 RTE_LOG_REGISTER(cnxk_logtype_npc, pmd.net.cnxk.flow, NOTICE);
diff --git a/drivers/common/cnxk/roc_platform.h b/drivers/common/cnxk/roc_platform.h
index ed562a82bf..c50fcd6c6d 100644
--- a/drivers/common/cnxk/roc_platform.h
+++ b/drivers/common/cnxk/roc_platform.h
@@ -234,6 +234,7 @@
 extern int cnxk_logtype_base;
 extern int cnxk_logtype_mbox;
 extern int cnxk_logtype_cpt;
+extern int cnxk_logtype_ml;
 extern int cnxk_logtype_npa;
 extern int cnxk_logtype_nix;
 extern int cnxk_logtype_npc;
@@ -261,6 +262,7 @@ extern int cnxk_logtype_ree;
 #define plt_base_dbg(fmt, ...)	plt_dbg(base, fmt, ##__VA_ARGS__)
 #define plt_cpt_dbg(fmt, ...)	plt_dbg(cpt, fmt, ##__VA_ARGS__)
 #define plt_mbox_dbg(fmt, ...)	plt_dbg(mbox, fmt, ##__VA_ARGS__)
+#define plt_ml_dbg(fmt, ...)	plt_dbg(ml, fmt, ##__VA_ARGS__)
 #define plt_npa_dbg(fmt, ...)	plt_dbg(npa, fmt, ##__VA_ARGS__)
 #define plt_nix_dbg(fmt, ...)	plt_dbg(nix, fmt, ##__VA_ARGS__)
 #define plt_npc_dbg(fmt, ...)	plt_dbg(npc, fmt, ##__VA_ARGS__)
diff --git a/drivers/common/cnxk/roc_priv.h b/drivers/common/cnxk/roc_priv.h
index 122d411fe7..14fe2e452a 100644
--- a/drivers/common/cnxk/roc_priv.h
+++ b/drivers/common/cnxk/roc_priv.h
@@ -47,4 +47,7 @@
 /* REE */
 #include "roc_ree_priv.h"

+/* ML */
+#include "roc_ml_priv.h"
+
 #endif /* _ROC_PRIV_H_ */
diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map
index 4bc14901a7..b298a21b84 100644
--- a/drivers/common/cnxk/version.map
+++ b/drivers/common/cnxk/version.map
@@ -8,6 +8,7 @@ INTERNAL {
 	cnxk_logtype_base;
 	cnxk_logtype_cpt;
 	cnxk_logtype_mbox;
+	cnxk_logtype_ml;
 	cnxk_logtype_nix;
 	cnxk_logtype_npa;
 	cnxk_logtype_npc;
@@ -98,6 +99,34 @@ INTERNAL {
 	roc_idev_npa_nix_get;
 	roc_idev_num_lmtlines_get;
 	roc_idev_nix_inl_meta_aura_get;
+	roc_ml_reg_read64;
+	roc_ml_reg_write64;
+	roc_ml_reg_read32;
+	roc_ml_reg_write32;
+	roc_ml_reg_save;
+	roc_ml_addr_ap2mlip;
+	roc_ml_addr_mlip2ap;
+	roc_ml_addr_pa_to_offset;
+	roc_ml_addr_offset_to_pa;
+	roc_ml_scratch_write_job;
+	roc_ml_scratch_is_valid_bit_set;
+	roc_ml_scratch_is_done_bit_set;
+	roc_ml_scratch_enqueue;
+	roc_ml_scratch_dequeue;
+	roc_ml_scratch_queue_reset;
+	roc_ml_jcmdq_enqueue_lf;
+	roc_ml_jcmdq_enqueue_sl;
+	roc_ml_clk_force_on;
+	roc_ml_clk_force_off;
+	roc_ml_dma_stall_on;
+	roc_ml_dma_stall_off;
+	roc_ml_mlip_is_enabled;
+	roc_ml_mlip_reset;
+	roc_ml_dev_init;
+	roc_ml_dev_fini;
+	roc_ml_blk_init;
+	roc_ml_blk_fini;
+	roc_ml_sso_pf_func_get;
 	roc_model;
 	roc_se_auth_key_set;
 	roc_se_ciph_key_set;
--
2.17.1
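
Note on step (5) of the reset path above: the same set/clear pattern is
applied to both AXI bridge registers. A minimal sketch of that pattern
written as a loop, together with a GENMASK-style field extraction, is
given here. It assumes a plain in-memory array in place of MMIO; every
sk_* name is hypothetical and not part of the ROC API, and only the bit
positions are taken from roc_ml.h.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the BIT_ULL()/GENMASK_ULL() helpers (assumption). */
#define SK_BIT_ULL(n)	     (1ULL << (n))
#define SK_GENMASK_ULL(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

/* Bit positions copied from the ROC_ML_* definitions in roc_ml.h. */
#define SK_AXI_BRIDGE_CTRL_FENCE	    SK_BIT_ULL(17)
#define SK_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA SK_BIT_ULL(40)
#define SK_STG_STATUS_ELAPSED_TICKS	    SK_GENMASK_ULL(47, 6)

#define SK_NB_AXI_BRIDGES 2

/* Hypothetical register file held in memory instead of MMIO. */
struct sk_regs {
	uint64_t axi_bridge_ctrl[SK_NB_AXI_BRIDGES];
};

static uint64_t
sk_reg_read64(const struct sk_regs *regs, unsigned int idx)
{
	assert(idx < SK_NB_AXI_BRIDGES);
	return regs->axi_bridge_ctrl[idx];
}

static void
sk_reg_write64(struct sk_regs *regs, uint64_t val, unsigned int idx)
{
	assert(idx < SK_NB_AXI_BRIDGES);
	regs->axi_bridge_ctrl[idx] = val;
}

/* Steps (5a) and (5b) applied to each bridge register in turn. */
static void
sk_axi_bridge_reset(struct sk_regs *regs)
{
	unsigned int i;
	uint64_t val;

	for (i = 0; i < SK_NB_AXI_BRIDGES; i++) {
		/* (5a) Set FLUSH_WRITE_DATA ... */
		val = sk_reg_read64(regs, i);
		val |= SK_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
		sk_reg_write64(regs, val, i);

		/* ... and then clear it again. */
		val = sk_reg_read64(regs, i);
		val &= ~SK_AXI_BRIDGE_CTRL_FLUSH_WRITE_DATA;
		sk_reg_write64(regs, val, i);

		/* (5b) Clear FENCE. */
		val = sk_reg_read64(regs, i);
		val &= ~SK_AXI_BRIDGE_CTRL_FENCE;
		sk_reg_write64(regs, val, i);
	}
}

/* Extract the ELAPSED_TICKS field (bits 47:6) from a status value. */
static uint64_t
sk_stg_status_elapsed_ticks(uint64_t status)
{
	return (status & SK_STG_STATUS_ELAPSED_TICKS) >> 6;
}
```

The loop leaves all other bits of each register untouched, which is the
same read-modify-write behaviour as the unrolled code in the patch.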


* [PATCH v6 02/39] ml/cnxk: add skeleton for ML cnxk driver
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
                     ` (38 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added initial source files and build files for ML cnxk driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 MAINTAINERS                            |  1 +
 doc/guides/rel_notes/release_23_03.rst |  7 +++++++
 drivers/meson.build                    |  1 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  8 ++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h         |  8 ++++++++
 drivers/ml/cnxk/meson.build            | 26 ++++++++++++++++++++++++++
 drivers/ml/meson.build                 |  8 ++++++++
 7 files changed, 59 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_dev.h
 create mode 100644 drivers/ml/cnxk/meson.build
 create mode 100644 drivers/ml/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index d58df9197c..8f695516c7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1449,6 +1449,7 @@ Marvell ML CNXK
 M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
+F: drivers/ml/cnxk/
 
 
 Packet processing
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 8186545082..851c41e4e0 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -224,6 +224,13 @@ New Features
   * Test case for inferences from multiple models in ordered mode.
   * Test case for inferences from multiple models.in interleaving mode.
 
+* **Implementation of Marvell CNXK machine learning driver.**
+
+  * Added ml/cnxk driver which provides support for machine learning inference
+    operations on Marvell's CN10K series of SoCs.
+  * Added ML ROC code for ml/cnxk driver to common/cnxk.
+  * Added implementation with support for all rte_ml APIs.
+
 
 Removed Items
 -------------
diff --git a/drivers/meson.build b/drivers/meson.build
index 0618c31a69..31924823e1 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -14,6 +14,7 @@ subdirs = [
         'mempool',        # depends on common and bus.
         'dma',            # depends on common and bus.
         'net',            # depends on common, bus, mempool
+        'ml',             # depends on common, bus, mempool
         'raw',            # depends on common, bus, dma and net.
         'crypto',         # depends on common, bus and mempool (net in future).
         'compress',       # depends on common, bus, mempool.
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
new file mode 100644
index 0000000000..cc96a7bdb3
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
new file mode 100644
index 0000000000..049ac13fcd
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_DEV_H_
+#define _CN10K_ML_DEV_H_
+
+#endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
new file mode 100644
index 0000000000..2ec6a88e3f
--- /dev/null
+++ b/drivers/ml/cnxk/meson.build
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
+    build = false
+    reason = 'only supported on 64-bit Linux'
+    subdir_done()
+endif
+
+driver_sdk_headers = files(
+        'cn10k_ml_dev.h',
+)
+
+sources = files(
+        'cn10k_ml_dev.c',
+)
+
+deps += ['mldev', 'common_cnxk']
+
+if get_option('buildtype').contains('debug')
+        cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
+else
+        cflags += [ '-UCNXK_ML_DEV_DEBUG' ]
+endif
+
+pmd_supports_disable_iova_as_pa = true
diff --git a/drivers/ml/meson.build b/drivers/ml/meson.build
new file mode 100644
index 0000000000..54bc394c47
--- /dev/null
+++ b/drivers/ml/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright (c) 2022 Marvell.
+
+drivers = [
+        'cnxk',
+]
+
+std_deps = ['mldev']
-- 
2.17.1


* [PATCH v6 03/39] ml/cnxk: enable probe and remove of ML device
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
                     ` (37 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Anatoly Burakov
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

The ML inference engine on the cn10k platform is a PCI-based device.
Added driver support to probe and remove the device in the cn10k poll
mode driver. The PMD registers itself under the name "ml_cn10k".

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
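Note on the ID table: pci_id_ml_table ends with an empty sentinel entry.
As a rough sketch of why that entry is needed, a table walk that stops
at the first entry with a zero vendor ID could look as follows. The
struct, the sk_* names and the ID values are hypothetical placeholders;
the real matching is done by the DPDK PCI bus code, which also handles
wildcard IDs that this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct rte_pci_id. */
struct sk_pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* Placeholder IDs for illustration only; not the real Marvell values. */
#define SK_VENDOR_ID 0x1234
#define SK_DEVID_ML  0x5678

static const struct sk_pci_id sk_id_table[] = {
	{SK_VENDOR_ID, SK_DEVID_ML},
	/* sentinel */
	{0, 0},
};

/* Return 1 when (vendor, device) appears before the sentinel, else 0. */
static int
sk_pci_match(const struct sk_pci_id *table, uint16_t vendor, uint16_t device)
{
	const struct sk_pci_id *id;

	assert(table != NULL);

	/* The walk relies on the sentinel; without it the loop overruns. */
	for (id = table; id->vendor_id != 0; id++) {
		if (id->vendor_id == vendor && id->device_id == device)
			return 1;
	}

	return 0;
}
```
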
 drivers/ml/cnxk/cn10k_ml_dev.c | 114 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h |  11 ++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  10 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  11 ++++
 drivers/ml/cnxk/meson.build    |   2 +
 5 files changed, 148 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index cc96a7bdb3..c2e93c9a1a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -2,7 +2,121 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_common.h>
+#include <rte_dev.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
+#include <rte_pci.h>
+
+#include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ops.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+static int
+cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	PLT_SET_USED(pci_drv);
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+
+	ret = roc_plt_init();
+	if (ret < 0) {
+		plt_err("Failed to initialize platform model");
+		return ret;
+	}
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+	dev = rte_ml_dev_pmd_create(name, &pci_dev->device, &init_params);
+	if (dev == NULL) {
+		ret = -ENODEV;
+		goto error_exit;
+	}
+
+	/* Get private data space allocated */
+	mldev = dev->data->dev_private;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev->roc.pci_dev = pci_dev;
+
+		ret = roc_ml_dev_init(&mldev->roc);
+		if (ret) {
+			plt_err("Failed to initialize ML ROC, ret = %d", ret);
+			goto pmd_destroy;
+		}
+
+		dev->dev_ops = &cn10k_ml_ops;
+	} else {
+		plt_err("CN10K ML Ops are not supported on secondary process");
+		dev->dev_ops = &ml_dev_dummy_ops;
+	}
+
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	return 0;
+
+pmd_destroy:
+	rte_ml_dev_pmd_destroy(dev);
+
+error_exit:
+	plt_err("Could not create device (vendor_id: 0x%x device_id: 0x%x)", pci_dev->id.vendor_id,
+		pci_dev->id.device_id);
+
+	return ret;
+}
+
+static int
+cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
+{
+	struct cn10k_ml_dev *mldev;
+	char name[RTE_ML_STR_MAX];
+	struct rte_ml_dev *dev;
+	int ret;
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&mldev->roc);
+		if (ret)
+			return ret;
+	}
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_pci_id pci_id_ml_table[] = {
+	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
+	/* sentinel */
+	{},
+};
+
+static struct rte_pci_driver cn10k_mldev_pmd = {
+	.id_table = pci_id_ml_table,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_NEED_IOVA_AS_VA,
+	.probe = cn10k_ml_pci_probe,
+	.remove = cn10k_ml_pci_remove,
+};
+
+RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
+RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
+RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 049ac13fcd..833a09791a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -5,4 +5,15 @@
 #ifndef _CN10K_ML_DEV_H_
 #define _CN10K_ML_DEV_H_
 
+#include <roc_api.h>
+
+/* Marvell OCTEON CN10K ML PMD device name */
+#define MLDEV_NAME_CN10K_PMD ml_cn10k
+
+/* Device private data */
+struct cn10k_ml_dev {
+	/* Device ROC */
+	struct roc_ml roc;
+};
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
new file mode 100644
index 0000000000..39843e3ee5
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_ops.h"
+
+struct rte_ml_dev_ops cn10k_ml_ops = {0};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
new file mode 100644
index 0000000000..b14221d02c
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OPS_H_
+#define _CN10K_ML_OPS_H_
+
+/* Device ops */
+extern struct rte_ml_dev_ops cn10k_ml_ops;
+
+#endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 2ec6a88e3f..caed62a9f3 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,10 +9,12 @@ endif
 
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
+        'cn10k_ml_ops.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
+        'cn10k_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk']
-- 
2.17.1


* [PATCH v6 04/39] ml/cnxk: add driver support to get device info
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
                     ` (36 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to get the cn10k ML device information. This implements
the driver side of the RTE function rte_ml_dev_info_get(). The ML device
on cn10k supports one queue-pair in lock-free mode and does not support
segmented input/output data.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 15 +++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 23 ++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 833a09791a..13d26373e4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,21 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Device alignment size */
+#define ML_CN10K_ALIGN_SIZE 128
+
+/* Maximum number of models per device */
+#define ML_CN10K_MAX_MODELS 16
+
+/* Maximum number of queue-pairs per device */
+#define ML_CN10K_MAX_QP_PER_DEVICE 1
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_CN10K_MAX_DESC_PER_QP 1024
+
+/* Maximum number of segments for IO data */
+#define ML_CN10K_MAX_SEGMENTS 1
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 39843e3ee5..bad5ad4713 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,27 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-struct rte_ml_dev_ops cn10k_ml_ops = {0};
+static int
+cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	if (dev_info == NULL)
+		return -EINVAL;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
+	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
+
+	return 0;
+}
+
+struct rte_ml_dev_ops cn10k_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+};
-- 
2.17.1


* [PATCH v6 05/39] ml/cnxk: add support for configure and close
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
                     ` (35 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented driver functions to configure and close ML devices.
Added skeleton code and support to reconfigure the ML device. The
PCI device is removed as part of device close.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
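Note on the state handling: the configure and close paths in this patch
amount to a small state machine. A minimal, self-contained sketch of it
is given here, assuming the limits from cn10k_ml_dev.h. All sk_* names
are hypothetical, and the close helper only updates the state, whereas
the driver also removes the PCI device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Limits copied from ML_CN10K_MAX_MODELS and ML_CN10K_MAX_QP_PER_DEVICE. */
#define SK_MAX_MODELS 16
#define SK_MAX_QP     1

/* Mirrors enum cn10k_ml_dev_state. */
enum sk_dev_state {
	SK_DEV_STATE_PROBED = 0,
	SK_DEV_STATE_CONFIGURED,
	SK_DEV_STATE_STARTED,
	SK_DEV_STATE_CLOSED
};

struct sk_dev {
	enum sk_dev_state state;
};

/* Validate the request first, then apply the state transition. */
static int
sk_dev_configure(struct sk_dev *dev, uint16_t nb_models, uint16_t nb_queue_pairs)
{
	if (dev == NULL)
		return -EINVAL;

	if (nb_models > SK_MAX_MODELS || nb_queue_pairs > SK_MAX_QP)
		return -EINVAL;

	switch (dev->state) {
	case SK_DEV_STATE_PROBED:     /* first configuration */
	case SK_DEV_STATE_CONFIGURED: /* re-configuration */
		dev->state = SK_DEV_STATE_CONFIGURED;
		return 0;
	case SK_DEV_STATE_STARTED:
	case SK_DEV_STATE_CLOSED:
		return -ENOTSUP;
	}

	/* Only reached if the state holds a value outside the enum. */
	assert(0 && "invalid device state");
	return -EINVAL;
}

/* Mark the device closed; later configure calls are rejected. */
static int
sk_dev_close(struct sk_dev *dev)
{
	if (dev == NULL)
		return -EINVAL;

	dev->state = SK_DEV_STATE_CLOSED;
	return 0;
}
```

As in the patch, a rejected request leaves the state unchanged, so a
device that fails validation can still be configured afterwards.
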
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 ++
 drivers/ml/cnxk/cn10k_ml_dev.h | 21 ++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 60 ++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index c2e93c9a1a..fd45226add 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -65,6 +65,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+
 	return 0;
 
 pmd_destroy:
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 13d26373e4..e7fb5fc2e2 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -25,10 +25,31 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
+/* ML command timeout in seconds */
+#define ML_CN10K_CMD_TIMEOUT 5
+
+/* Device configuration state enum */
+enum cn10k_ml_dev_state {
+	/* Probed and not configured */
+	ML_CN10K_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CN10K_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CN10K_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CN10K_DEV_STATE_CLOSED
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
+
+	/* Configuration state */
+	enum cn10k_ml_dev_state state;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index bad5ad4713..3a78d8c816 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -25,7 +25,67 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL || conf == NULL)
+		return -EINVAL;
+
+	/* Get CN10K device handle */
+	mldev = dev->data->dev_private;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state");
+		return -ENOTSUP;
+	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close");
+		return -ENOTSUP;
+	}
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	mldev = dev->data->dev_private;
+
+	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
 };
-- 
2.17.1


* [PATCH v6 06/39] ml/cnxk: parse ML firmware path from device args
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
                     ` (34 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled parsing of the ML firmware path from device arguments for
cn10k. The path defaults to "/lib/firmware/mlip-fw.bin" when the
fw_path argument is not provided. Added internal structures for
ML firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 71 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 12 ++++++
 drivers/ml/cnxk/meson.build    |  2 +-
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fd45226add..117cac43aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -4,6 +4,8 @@
 
 #include <rte_common.h>
 #include <rte_dev.h>
+#include <rte_devargs.h>
+#include <rte_kvargs.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
@@ -13,9 +15,70 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+#define CN10K_ML_FW_PATH "fw_path"
+
+#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
+static int
+parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	if (value == NULL || extra_args == NULL)
+		return -EINVAL;
+
+	*(char **)extra_args = strdup(value);
+
+	if (!*(char **)extra_args)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+{
+	struct rte_kvargs *kvlist = NULL;
+	bool fw_path_set = false;
+	char *fw_path = NULL;
+	int ret = 0;
+
+	if (devargs == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(devargs->args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing devargs\n");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_PATH) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_PATH, &parse_string_arg, &fw_path);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_PATH);
+			ret = -EINVAL;
+			goto exit;
+		}
+		fw_path_set = true;
+	}
+
+check_args:
+	if (!fw_path_set)
+		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+	else
+		mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
 static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
@@ -49,6 +112,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
 		mldev->roc.pci_dev = pci_dev;
 
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		if (ret) {
+			plt_err("Failed to parse devargs ret = %d", ret);
+			goto pmd_destroy;
+		}
+
 		ret = roc_ml_dev_init(&mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
@@ -122,3 +191,5 @@ static struct rte_pci_driver cn10k_mldev_pmd = {
 RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index e7fb5fc2e2..5333566cff 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,15 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* ML firmware structure */
+struct cn10k_ml_fw {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Firmware file path */
+	const char *path;
+};
+
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -50,6 +59,9 @@ struct cn10k_ml_dev {
 
 	/* Configuration state */
 	enum cn10k_ml_dev_state state;
+
+	/* Firmware */
+	struct cn10k_ml_fw fw;
 };
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index caed62a9f3..7dc8a29a80 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,7 +17,7 @@ sources = files(
         'cn10k_ml_ops.c',
 )
 
-deps += ['mldev', 'common_cnxk']
+deps += ['mldev', 'common_cnxk', 'kvargs']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread
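
The "use the fw_path argument if present, otherwise fall back to the default" behaviour of the patch above can be sketched without DPDK. This is a simplified illustration under stated assumptions: it does not use rte_kvargs, and demo_dup and demo_fw_path_get are hypothetical helpers. Only the key name and the default path are taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FW_PATH_KEY     "fw_path"
#define DEMO_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"

/* Hypothetical helper: heap copy of the first n bytes of s, NUL terminated. */
static char *
demo_dup(const char *s, size_t n)
{
	char *copy;

	assert(s != NULL);
	copy = malloc(n + 1);
	if (copy == NULL)
		return NULL;
	memcpy(copy, s, n);
	copy[n] = '\0';

	return copy;
}

/* Scan a comma-separated "key=value" string for fw_path. Returns a heap copy
 * of the value, or a heap copy of the default when the key is absent or args
 * is NULL. The caller frees the result in both cases.
 */
static char *
demo_fw_path_get(const char *args)
{
	const size_t klen = strlen(DEMO_FW_PATH_KEY);
	const char *p = args;

	while (p != NULL && *p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

		if (len > klen && strncmp(p, DEMO_FW_PATH_KEY, klen) == 0 && p[klen] == '=')
			return demo_dup(p + klen + 1, len - klen - 1);

		p = (end != NULL) ? end + 1 : NULL;
	}

	return demo_dup(DEMO_FW_PATH_DEFAULT, strlen(DEMO_FW_PATH_DEFAULT));
}
```

One design difference from the patch is deliberate: the sketch always returns heap memory, so the caller can free the result unconditionally. In the patch, the stored path is either a string literal or a strdup() result, so any later cleanup would need to know which one it holds.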

* [PATCH v6 07/39] ml/cnxk: enable firmware load and device reset
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to load ML firmware on the cn10ka ROC model. The
MLIP device is reset during the dev_close driver operation, and
the device can't be reconfigured after a call to close. Job
execution is disabled after firmware load and is enabled when the
device is started. Added internal request structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 327 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_dev.h | 156 ++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c |  21 +++
 drivers/ml/cnxk/cn10k_ml_ops.h |  14 ++
 4 files changed, 518 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 117cac43aa..90fca45ddd 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -12,6 +12,8 @@
 
 #include <roc_api.h>
 
+#include <eal_firmware.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
@@ -19,6 +21,15 @@
 
 #define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
 
+/* ML firmware macros */
+#define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
+#define FW_STACK_BUFFER_SIZE	 0x40000
+#define FW_DEBUG_BUFFER_SIZE	 (2 * 0x20000)
+#define FW_EXCEPTION_BUFFER_SIZE 0x400
+#define FW_LINKER_OFFSET	 0x80000
+#define FW_WAIT_CYCLES		 100
+#define FW_LOAD_FLAGS		 0x1
+
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
 
 /* Dummy operations for ML device */
@@ -175,6 +186,322 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 	return rte_ml_dev_pmd_destroy(dev);
 }
 
+static void
+cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
+{
+	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
+		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+	plt_ml_dbg("exception_state_size = %u bytes",
+		   fw->req->jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+}
+
+uint64_t
+cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
+{
+	PLT_SET_USED(fw);
+
+	return FW_LOAD_FLAGS;
+}
+
+static int
+cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
+{
+	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
+	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	uint32_t reg_val32;
+	uint64_t offset;
+	bool timeout;
+	int ret = 0;
+	uint8_t i;
+
+	mldev = fw->mldev;
+
+	/* Reset debug buffer HEAD / TAIL pointer and exception SP scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
+	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
+
+	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
+	 * bridge.
+	 */
+	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
+		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
+		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
+		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+
+	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
+	 * bridges.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
+			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+	}
+
+	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
+	 * signal all ML transactions as non-secure.
+	 */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
+			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+
+		reg_val64 = (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
+			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
+			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+	}
+
+	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
+	 * when there is no job in the command queue.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
+	 * keeping the job manager disabled.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (9) Wait at least 70 coprocessor clock cycles (implemented as a 100 us delay). */
+	plt_delay_us(FW_WAIT_CYCLES);
+
+	/* (10) Write ML outbound addresses pointing to the firmware images written in step 1 to the
+	 * following registers: ML(0)_A35_0_RST_VECTOR_BASE_W(0..1) for core 0,
+	 * ML(0)_A35_1_RST_VECTOR_BASE_W(0..1) for core 1. The value written to each register is the
+	 * AXI outbound address divided by 4. Read after write.
+	 */
+	offset = PLT_PTR_ADD_U64_CAST(
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
+
+	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
+
+	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
+	 * MLIP components out of reset. The cores will execute firmware from the ML region as
+	 * written in step 1.
+	 */
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
+	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
+
+	/* (12) Wait for notification from firmware that ML is ready for job execution. */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
+	 * clock when there are no more jobs to process.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+
+	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
+	 * activities.
+	 */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
+	for (i = 0; i < ML_ANBX_NR; i++) {
+		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
+			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+	}
+
+	return ret;
+}
+
+int
+cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_fw *fw;
+	void *fw_buffer = NULL;
+	uint64_t mz_size = 0;
+	uint64_t fw_size = 0;
+	int ret = 0;
+
+	fw = &mldev->fw;
+	fw->mldev = mldev;
+
+	/* Read firmware image to a buffer */
+	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+	if (ret < 0) {
+		plt_err("Can't read firmware data: %s\n", fw->path);
+		return ret;
+	}
+
+	/* Reserve memzone for firmware load completion and data */
+	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+		return -ENOMEM;
+	}
+	fw->req = mz->addr;
+
+	/* Reset firmware load completion structure */
+	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+
+	/* Reset device, if in active state */
+	if (roc_ml_mlip_is_enabled(&mldev->roc))
+		roc_ml_mlip_reset(&mldev->roc, true);
+
+	/* Load firmware */
+	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+	if (fw_buffer != NULL)
+		free(fw_buffer);
+	if (ret < 0)
+		cn10k_ml_fw_unload(mldev);
+
+	return ret;
+}
+
+void
+cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+{
+	const struct plt_memzone *mz;
+	uint64_t reg_val;
+
+	/* Disable and reset device */
+	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&mldev->roc, true);
+
+	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
+	if (mz != NULL)
+		plt_memzone_free(mz);
+}
+
 static struct rte_pci_id pci_id_ml_table[] = {
 	{RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_ML_PF)},
 	/* sentinel */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 5333566cff..00d23eb3ca 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,9 @@
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
+
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -28,6 +31,19 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* Poll mode job state */
+#define ML_CN10K_POLL_JOB_START	 0
+#define ML_CN10K_POLL_JOB_FINISH 1
+
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
+
 /* Device configuration state enum */
 enum cn10k_ml_dev_state {
 	/* Probed and not configured */
@@ -43,6 +59,136 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* Firmware stats */
+struct cn10k_ml_fw_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
+
+	/* Firmware end cycle */
+	uint64_t fw_end;
+
+	/* Hardware start cycle */
+	uint64_t hw_start;
+
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Firmware stats */
+	struct cn10k_ml_fw_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
+
+		/* Batch execution */
+		uint64_t batch_run : 1;
+
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
+
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
+
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
+
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
+
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
+
+	/* Exception state dump size */
+	uint32_t exception_state_size;
+};
+
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
+
+			/* Flags to control error handling */
+			uint64_t flags;
+
+			uint8_t rsvd[8];
+		} fw_load;
+	};
+};
+
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -50,6 +196,12 @@ struct cn10k_ml_fw {
 
 	/* Firmware file path */
 	const char *path;
+
+	/* Data buffer */
+	uint8_t *data;
+
+	/* Firmware load / handshake request structure */
+	struct cn10k_ml_req *req;
 };
 
 /* Device private data */
@@ -64,4 +216,8 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_fw fw;
 };
 
+uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
+int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
+void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3a78d8c816..3df1254dca 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -30,6 +30,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	int ret;
 
 	if (dev == NULL || conf == NULL)
 		return -EINVAL;
@@ -51,6 +52,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(mldev);
+		if (ret != 0)
+			return ret;
 	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -77,6 +83,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload firmware */
+	cn10k_ml_fw_unload(mldev);
+
+	/* Clear scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+
+	/* Reset ML_MLR_BASE */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+
 	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index b14221d02c..fe18730aca 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,6 +5,20 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include "cn10k_ml_dev.h"
+
+/* ML request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job result */
+	struct cn10k_ml_result result;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+} __rte_aligned(ROC_ALIGN);
+
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 253+ messages in thread
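
Step (10) of the firmware load sequence in the patch above writes the AXI outbound address of the firmware image, divided by 4, into two 32-bit reset vector registers per core. The sketch below shows that arithmetic in isolation. It is an illustration under assumptions: the layout of union ml_a35_0_rst_vector_base_s is not shown in the patch, so the sketch assumes the word address simply spans a low and a high 32-bit word, and demo_rst_vector and demo_rst_vector_base are hypothetical names. The AXI start address is passed in as a parameter because the value of ML_AXI_START_ADDR does not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit register values (W0 = low word, W1 = high word). */
struct demo_rst_vector {
	uint32_t w0;
	uint32_t w1;
};

/* Convert a byte offset inside the ML region plus the AXI start address into
 * a word address (byte address / 4) and split it into two 32-bit values.
 */
static struct demo_rst_vector
demo_rst_vector_base(uint64_t offset, uint64_t axi_start)
{
	uint64_t addr = offset + axi_start;
	struct demo_rst_vector v;

	/* The register holds a word address, so the byte address must be 4-byte aligned. */
	assert((addr & 0x3) == 0);

	addr /= 4;
	v.w0 = (uint32_t)(addr & 0xffffffffu);
	v.w1 = (uint32_t)(addr >> 32);

	return v;
}
```

With an offset of 0x80000 (the FW_LINKER_OFFSET value from the patch) and a made-up AXI start address of 0x800000000, the byte address is 0x800080000 and the word address is 0x200020000, which splits into w0 = 0x00020000 and w1 = 0x2.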

* [PATCH v6 08/39] ml/cnxk: enable support for simulator environment
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled device initialization and firmware load on the simulator
platform. On the simulator, the firmware load stage only launches
a firmware handshake request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 119 +++++++++++++++++++++++++++++----
 1 file changed, 107 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 90fca45ddd..837f006bf0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -213,6 +213,89 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	return FW_LOAD_FLAGS;
 }
 
+static int
+cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t timeout_cycle;
+	uint64_t reg_val64;
+	bool timeout;
+	int ret = 0;
+
+	mldev = fw->mldev;
+
+	/* Reset debug buffer HEAD / TAIL pointer and exception SP scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+
+	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
+	reg_val64 = rte_eal_get_baseaddr();
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+
+	/* Update FW load completion structure */
+	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
+	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_wmb();
+
+	/* Enqueue FW load through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware load status, clean-up and exit on failure. */
+	if ((!timeout) && (fw->req->result.error_code == 0)) {
+		cn10k_ml_fw_print_info(fw);
+	} else {
+		/* Set ML to disable new jobs */
+		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
+		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+
+		/* Clear scratch registers */
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+
+		if (timeout) {
+			plt_err("Firmware load timeout");
+			ret = -ETIME;
+		} else {
+			plt_err("Firmware load failed");
+			ret = -1;
+		}
+
+		return ret;
+	}
+
+	/* Reset scratch registers */
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+
+	/* Disable job execution, to be enabled in start */
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	return ret;
+}
+
 static int
 cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
@@ -447,16 +530,22 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	fw = &mldev->fw;
 	fw->mldev = mldev;
 
-	/* Read firmware image to a buffer */
-	ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
-	if (ret < 0) {
-		plt_err("Can't read firmware data: %s\n", fw->path);
-		return ret;
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		/* Read firmware image to a buffer */
+		ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+		if (ret < 0) {
+			plt_err("Can't read firmware data: %s\n", fw->path);
+			return ret;
+		}
+
+		/* Reserve memzone for firmware load completion and data */
+		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
+	} else if (roc_env_is_asim()) {
+		/* Reserve memzone for firmware load completion */
+		mz_size = sizeof(struct cn10k_ml_req);
 	}
 
-	/* Reserve memzone for firmware load completion and data */
-	mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
-		  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
@@ -475,10 +564,16 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 		roc_ml_mlip_reset(&mldev->roc, true);
 
 	/* Load firmware */
-	fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
-	ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
-	if (fw_buffer != NULL)
-		free(fw_buffer);
+	if (roc_env_is_emulator() || roc_env_is_hw()) {
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
+		if (fw_buffer != NULL)
+			free(fw_buffer);
+	} else if (roc_env_is_asim()) {
+		fw->data = NULL;
+		ret = cn10k_ml_fw_load_asim(fw);
+	}
+
 	if (ret < 0)
 		cn10k_ml_fw_unload(mldev);
 
-- 
2.17.1



* [PATCH v6 09/39] ml/cnxk: enable support for device start and stop
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented ML driver functions to start and stop the ML device.
Start enables the device to accept inference requests and stop
disables it.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
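
Note: start and stop both come down to a read-modify-write of one
enable bit in the ML_CFG register. A minimal, self-contained sketch of
that pattern follows. It is only an illustration: the register is
modelled as a plain variable, and SKETCH_CFG_ENA and the sketch_*
helpers are hypothetical stand-ins, not the ROC API or the real value
of ROC_ML_CFG_ENA.

```c
#include <stdint.h>

/* Hypothetical enable bit; the real ROC_ML_CFG_ENA value is not shown here. */
#define SKETCH_CFG_ENA (UINT64_C(1) << 0)

/* Stand-in for the ML_CFG register, modelled as ordinary memory. */
static uint64_t sketch_cfg_reg;

static uint64_t
sketch_reg_read64(void)
{
	return sketch_cfg_reg;
}

static void
sketch_reg_write64(uint64_t val)
{
	sketch_cfg_reg = val;
}

/* Start: set the enable bit and leave every other bit untouched. */
static void
sketch_dev_start(void)
{
	uint64_t val = sketch_reg_read64();

	val |= SKETCH_CFG_ENA;
	sketch_reg_write64(val);
}

/* Stop: clear the enable bit and leave every other bit untouched. */
static void
sketch_dev_stop(void)
{
	uint64_t val = sketch_reg_read64();

	val &= ~SKETCH_CFG_ENA;
	sketch_reg_write64(val);
}
```

Reading the register first, rather than writing a constant, keeps
whatever other configuration bits were set earlier.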
 drivers/ml/cnxk/cn10k_ml_ops.c | 42 +++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3df1254dca..a9f14fe4c5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -104,9 +104,45 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
+static int
+cn10k_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 |= ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_dev *mldev;
+	uint64_t reg_val64;
+
+	mldev = dev->data->dev_private;
+
+	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 &= ~ROC_ML_CFG_ENA;
+	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+
+	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
+	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
 };
-- 
2.17.1



* [PATCH v6 10/39] ml/cnxk: add support to create device queue-pairs
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to create and destroy device queue-pairs. Updated
the configure stage to create an array to store queue-pair handles.
Added internal structures for queue-pairs, queues and ML inference
requests.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
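
Note: queue-pair setup allocates one descriptor more than requested,
because one slot of the ring stays unused, except when the request
already equals the device maximum. The sketch below illustrates that
sizing rule together with one common head/tail ring convention that
produces the "one less than size" behaviour. The sizing expression
follows the patch. The full/empty tests and the sketch_* names are
assumptions for illustration, since the enqueue/dequeue path is not
part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ring size to allocate for a request of 'requested' usable descriptors. */
static uint32_t
sketch_ring_size(uint32_t requested, uint32_t max_desc)
{
	return (requested == max_desc) ? max_desc : requested + 1;
}

/* Hypothetical ring bookkeeping: head is used for enqueue, tail for dequeue. */
struct sketch_queue {
	uint64_t head;
	uint64_t tail;
	uint32_t size;
};

static bool
sketch_queue_empty(const struct sketch_queue *q)
{
	return q->head == q->tail;
}

/* One slot is kept free so that a full ring differs from an empty one. */
static bool
sketch_queue_full(const struct sketch_queue *q)
{
	return (q->head + 1) % q->size == q->tail;
}

static bool
sketch_enqueue(struct sketch_queue *q)
{
	if (sketch_queue_full(q))
		return false;
	q->head = (q->head + 1) % q->size;
	return true;
}

static bool
sketch_dequeue(struct sketch_queue *q)
{
	if (sketch_queue_empty(q))
		return false;
	q->tail = (q->tail + 1) % q->size;
	return true;
}
```

Under this convention a request at the device maximum yields one
usable descriptor fewer than requested, which matches the exception
described in the code comment in queue-pair setup.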
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |  33 +++++-
 2 files changed, 237 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a9f14fe4c5..82670330d1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -8,6 +8,97 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cn10k_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cn10k_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cn10k_ml_qp *
+cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cn10k_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -30,6 +121,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 {
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint32_t mz_size;
+	uint16_t qp_id;
 	int ret;
 
 	if (dev == NULL || conf == NULL)
@@ -68,21 +162,83 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -ENOTSUP;
 	}
 
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
+
+error:
+	if (dev->data->queue_pairs != NULL)
+		rte_free(dev->data->queue_pairs);
+
+	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
+	uint16_t qp_id;
 
 	if (dev == NULL)
 		return -EINVAL;
 
 	mldev = dev->data->dev_private;
 
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	if (dev->data->queue_pairs)
+		rte_free(dev->data->queue_pairs);
+
 	/* Unload firmware */
 	cn10k_ml_fw_unload(mldev);
 
@@ -140,9 +296,56 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * create the queue with one more descriptor than requested, except when the requested
+	 * size is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get, .dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,       .dev_start = cn10k_ml_dev_start,
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fe18730aca..289c7c5587 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -5,9 +5,13 @@
 #ifndef _CN10K_ML_OPS_H_
 #define _CN10K_ML_OPS_H_
 
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 
-/* ML request */
+/* Request structure */
 struct cn10k_ml_req {
 	/* Job descriptor */
 	struct cn10k_ml_jd jd;
@@ -19,6 +23,33 @@ struct cn10k_ml_req {
 	volatile uint64_t status;
 } __rte_aligned(ROC_ALIGN);
 
+/* Request queue */
+struct cn10k_ml_queue {
+	/* Array of requests */
+	struct cn10k_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cn10k_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cn10k_ml_queue queue;
+};
+
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
-- 
2.17.1



* [PATCH v6 11/39] ml/cnxk: add functions to load and unload models
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added cnxk driver implementations to load and unload ML models.
Enabled support in the configure stage to allocate the model
handles array. Assign a model ID and allocate resources for each
model during the load stage, and release them during model unload.
Added internal structures to handle ML models.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
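
Note: model load picks the first free entry in the model handles
array and uses its index as the model ID; unload clears the entry so
the ID can be reused. A reduced sketch of that slot bookkeeping is
shown below. It is illustrative only: static storage replaces the
per-model memzone, the sketch_* names and SKETCH_NB_MODELS are
hypothetical, and the bounds check in unload is an addition of this
sketch rather than something taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NB_MODELS 4 /* hypothetical stand-in for conf->nb_models */

struct sketch_model {
	uint16_t model_id;
};

/* Handle array; NULL marks a free slot. */
static struct sketch_model *sketch_models[SKETCH_NB_MODELS];
/* Static storage standing in for the per-model memzone. */
static struct sketch_model sketch_storage[SKETCH_NB_MODELS];
static uint16_t sketch_nb_loaded;

static int
sketch_model_load(uint16_t *model_id)
{
	uint16_t idx;

	/* The first free slot becomes the model ID. */
	for (idx = 0; idx < SKETCH_NB_MODELS; idx++) {
		if (sketch_models[idx] == NULL)
			break;
	}
	if (idx == SKETCH_NB_MODELS)
		return -ENOMEM;

	sketch_storage[idx].model_id = idx;
	sketch_models[idx] = &sketch_storage[idx];
	sketch_nb_loaded++;
	*model_id = idx;

	return 0;
}

static int
sketch_model_unload(uint16_t model_id)
{
	if (model_id >= SKETCH_NB_MODELS || sketch_models[model_id] == NULL)
		return -EINVAL;

	/* Clearing the slot makes the ID available to the next load. */
	sketch_models[model_id] = NULL;
	sketch_nb_loaded--;

	return 0;
}
```

One consequence of the first-free-slot search is that IDs are reused
in ascending order, so an ID returned by an earlier load may later
refer to a different model once the original has been unloaded.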
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.c |   5 +
 drivers/ml/cnxk/cn10k_ml_model.h |  40 ++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 154 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   5 +
 drivers/ml/cnxk/meson.build      |   2 +
 6 files changed, 209 insertions(+)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 00d23eb3ca..7cf6268115 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -214,6 +214,9 @@ struct cn10k_ml_dev {
 
 	/* Firmware */
 	struct cn10k_ml_fw fw;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
new file mode 100644
index 0000000000..39ed707396
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_model.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
new file mode 100644
index 0000000000..a9f7b169de
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_MODEL_H_
+#define _CN10K_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* Model state */
+enum cn10k_ml_model_state {
+	ML_CN10K_MODEL_STATE_LOADED,
+	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
+	ML_CN10K_MODEL_STATE_STARTED,
+	ML_CN10K_MODEL_STATE_UNKNOWN,
+};
+
+/* Model Object */
+struct cn10k_ml_model {
+	/* Device reference */
+	struct cn10k_ml_dev *mldev;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+
+	/* State */
+	enum cn10k_ml_model_state state;
+};
+
+#endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 82670330d1..0955fa0d76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -6,8 +6,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+/* ML model macros */
+#define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -120,9 +124,11 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint32_t mz_size;
+	uint16_t model_id;
 	uint16_t qp_id;
 	int ret;
 
@@ -203,6 +209,48 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
 
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
 	return 0;
@@ -211,14 +259,19 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	if (dev->data->queue_pairs != NULL)
 		rte_free(dev->data->queue_pairs);
 
+	if (dev->data->models != NULL)
+		rte_free(dev->data->models);
+
 	return ret;
 }
 
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	uint16_t model_id;
 	uint16_t qp_id;
 
 	if (dev == NULL)
@@ -226,6 +279,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	if (dev->data->models)
+		rte_free(dev->data->models);
+
 	/* Destroy all queue pairs */
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
@@ -337,6 +405,88 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+int
+cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t mz_size;
+	uint16_t idx;
+	bool found;
+
+	PLT_SET_USED(params);
+
+	mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (idx = 0; idx < dev->data->nb_models; idx++) {
+		if (dev->data->models[idx] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+
+	/* Allocate memzone for model object and model data */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->mldev = mldev;
+	model->model_id = idx;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	dev->data->models[idx] = model;
+	mldev->nb_models_loaded++;
+
+	*model_id = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	dev->data->models[model_id] = NULL;
+	mldev->nb_models_loaded--;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -348,4 +498,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 289c7c5587..d7842ecd73 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -53,4 +53,9 @@ struct cn10k_ml_qp {
 /* Device ops */
 extern struct rte_ml_dev_ops cn10k_ml_ops;
 
+/* Slow-path ops */
+int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
+			uint16_t *model_id);
+int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 7dc8a29a80..bf7a9c0225 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -10,11 +10,13 @@ endif
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
+        'cn10k_ml_model.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
+        'cn10k_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs']
-- 
2.17.1



* [PATCH v6 12/39] ml/cnxk: enable validity checks for model metadata
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added the model metadata structure and enabled the metadata check
during model load. Remap cnxk IO types to RTE IO types.
Store and update the model metadata in the model structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
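
Note: the metadata version check folds the first two version bytes
into a single number (major * 1000 + minor * 100) and compares it
against MRVL_ML_MODEL_VERSION (2100, i.e. v2.1). The sketch below
isolates that comparison. SKETCH_MIN_VERSION and
sketch_version_supported() are hypothetical names; only the arithmetic
is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors MRVL_ML_MODEL_VERSION from the patch: 2100 stands for v2.1. */
#define SKETCH_MIN_VERSION 2100

/* Only the first two bytes (major, minor) take part in the comparison. */
static bool
sketch_version_supported(const uint8_t version[4])
{
	return version[0] * 1000 + version[1] * 100 >= SKETCH_MIN_VERSION;
}
```

With this encoding a minor value of 10 or more spills into the major
digit, for example 1.11 also evaluates to 2100, so the comparison
appears to assume single-digit minor versions.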
 drivers/ml/cnxk/cn10k_ml_model.c | 211 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 312 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  14 +-
 drivers/ml/cnxk/meson.build      |   2 +-
 4 files changed, 537 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 39ed707396..dfa814bbe0 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -2,4 +2,215 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_hash_crc.h>
+
+#include <mldev_utils.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+
+static enum rte_ml_io_type
+cn10k_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case 1:
+		return RTE_ML_IO_TYPE_INT8;
+	case 2:
+		return RTE_ML_IO_TYPE_UINT8;
+	case 3:
+		return RTE_ML_IO_TYPE_INT16;
+	case 4:
+		return RTE_ML_IO_TYPE_UINT16;
+	case 5:
+		return RTE_ML_IO_TYPE_INT32;
+	case 6:
+		return RTE_ML_IO_TYPE_UINT32;
+	case 7:
+		return RTE_ML_IO_TYPE_FP16;
+	case 8:
+		return RTE_ML_IO_TYPE_FP32;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+int
+cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+	uint8_t version[4];
+	uint8_t i;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+
+	/* Header CRC check */
+	if (metadata->metadata_header.header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			buffer, sizeof(metadata->metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata->metadata_header.header_crc32c) {
+			plt_err("Invalid model, Header CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata->metadata_header.payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(buffer + sizeof(metadata->metadata_header),
+					      size - sizeof(metadata->metadata_header), 0);
+
+		if (payload_crc32c != metadata->metadata_header.payload_crc32c) {
+			plt_err("Invalid model, Payload CRC mismatch");
+			return -EINVAL;
+		}
+	}
+
+	/* Model magic string */
+	if (strncmp((char *)metadata->metadata_header.magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid model, magic = %s", metadata->metadata_header.magic);
+		return -EINVAL;
+	}
+
+	/* Target architecture */
+	if (metadata->metadata_header.target_architecture != MRVL_ML_MODEL_TARGET_ARCH) {
+		plt_err("Model target architecture (%u) not supported",
+			metadata->metadata_header.target_architecture);
+		return -ENOTSUP;
+	}
+
+	/* Header version */
+	rte_memcpy(version, metadata->metadata_header.version, 4 * sizeof(uint8_t));
+	if (version[0] * 1000 + version[1] * 100 < MRVL_ML_MODEL_VERSION) {
+		plt_err("Metadata version = %u.%u.%u.%u (< %u.%u.%u.%u) not supported", version[0],
+			version[1], version[2], version[3], (MRVL_ML_MODEL_VERSION / 1000) % 10,
+			(MRVL_ML_MODEL_VERSION / 100) % 10, (MRVL_ML_MODEL_VERSION / 10) % 10,
+			MRVL_ML_MODEL_VERSION % 10);
+		return -ENOTSUP;
+	}
+
+	/* Init section */
+	if (metadata->init_model.file_size == 0) {
+		plt_err("Invalid metadata, init_model.file_size = %u",
+			metadata->init_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Main section */
+	if (metadata->main_model.file_size == 0) {
+		plt_err("Invalid metadata, main_model.file_size = %u",
+			metadata->main_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Finish section */
+	if (metadata->finish_model.file_size == 0) {
+		plt_err("Invalid metadata, finish_model.file_size = %u",
+			metadata->finish_model.file_size);
+		return -EINVAL;
+	}
+
+	/* Weights and Bias */
+	if (metadata->weights_bias.file_size == 0) {
+		plt_err("Invalid metadata, weights_bias.file_size = %u",
+			metadata->weights_bias.file_size);
+		return -EINVAL;
+	}
+
+	if (metadata->weights_bias.relocatable != 1) {
+		plt_err("Model not supported, non-relocatable weights and bias");
+		return -ENOTSUP;
+	}
+
+	/* Check input count */
+	if (metadata->model.num_input > MRVL_ML_INPUT_OUTPUT_SIZE) {
+		plt_err("Invalid metadata, num_input  = %u (> %u)", metadata->model.num_input,
+			MRVL_ML_INPUT_OUTPUT_SIZE);
+		return -EINVAL;
+	}
+
+	/* Check output count */
+	if (metadata->model.num_output > MRVL_ML_INPUT_OUTPUT_SIZE) {
+		plt_err("Invalid metadata, num_output  = %u (> %u)", metadata->model.num_output,
+			MRVL_ML_INPUT_OUTPUT_SIZE);
+		return -EINVAL;
+	}
+
+	/* Inputs */
+	for (i = 0; i < metadata->model.num_input; i++) {
+		if (rte_ml_io_type_size_get(cn10k_ml_io_type_map(metadata->input[i].input_type)) <=
+		    0) {
+			plt_err("Invalid metadata, input[%u] : input_type = %u", i,
+				metadata->input[i].input_type);
+			return -EINVAL;
+		}
+
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->input[i].model_input_type)) <= 0) {
+			plt_err("Invalid metadata, input[%u] : model_input_type = %u", i,
+				metadata->input[i].model_input_type);
+			return -EINVAL;
+		}
+
+		if (metadata->input[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable input: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	/* Outputs */
+	for (i = 0; i < metadata->model.num_output; i++) {
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : output_type = %u", i,
+				metadata->output[i].output_type);
+			return -EINVAL;
+		}
+
+		if (rte_ml_io_type_size_get(
+			    cn10k_ml_io_type_map(metadata->output[i].model_output_type)) <= 0) {
+			plt_err("Invalid metadata, output[%u] : model_output_type = %u", i,
+				metadata->output[i].model_output_type);
+			return -EINVAL;
+		}
+
+		if (metadata->output[i].relocatable != 1) {
+			plt_err("Model not supported, non-relocatable output: %u", i);
+			return -ENOTSUP;
+		}
+	}
+
+	return 0;
+}
+
+void
+cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
+{
+	uint8_t i;
+
+	for (i = 0; i < metadata->model.num_input; i++) {
+		metadata->input[i].input_type = cn10k_ml_io_type_map(metadata->input[i].input_type);
+		metadata->input[i].model_input_type =
+			cn10k_ml_io_type_map(metadata->input[i].model_input_type);
+
+		if (metadata->input[i].shape.w == 0)
+			metadata->input[i].shape.w = 1;
+
+		if (metadata->input[i].shape.x == 0)
+			metadata->input[i].shape.x = 1;
+
+		if (metadata->input[i].shape.y == 0)
+			metadata->input[i].shape.y = 1;
+
+		if (metadata->input[i].shape.z == 0)
+			metadata->input[i].shape.z = 1;
+	}
+
+	for (i = 0; i < metadata->model.num_output; i++) {
+		metadata->output[i].output_type =
+			cn10k_ml_io_type_map(metadata->output[i].output_type);
+		metadata->output[i].model_output_type =
+			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index a9f7b169de..dc30bc2aa7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -19,6 +19,309 @@ enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_UNKNOWN,
 };
 
+/* Model Metadata : v 2.1.0.2 */
+#define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
+#define MRVL_ML_MODEL_TARGET_ARCH  128
+#define MRVL_ML_MODEL_VERSION	   2100
+#define MRVL_ML_MODEL_NAME_LEN	   64
+#define MRVL_ML_INPUT_NAME_LEN	   16
+#define MRVL_ML_OUTPUT_NAME_LEN	   16
+#define MRVL_ML_INPUT_OUTPUT_SIZE  8
+
+/* Model file metadata structure */
+struct cn10k_ml_model_metadata {
+	/* Header (256-byte) */
+	struct {
+		/* Magic string ('M', 'R', 'V', 'L') */
+		uint8_t magic[4];
+
+		/* Metadata version */
+		uint8_t version[4];
+
+		/* Metadata size */
+		uint32_t metadata_size;
+
+		/* Unique ID */
+		uint8_t uuid[128];
+
+		/* Model target architecture
+		 * 0 = Undefined
+		 * 1 = M1K
+		 * 128 = MLIP
+		 * 256 = Experimental
+		 */
+		uint32_t target_architecture;
+		uint8_t reserved[104];
+
+		/* CRC of data after metadata_header (i.e. after first 256 bytes) */
+		uint32_t payload_crc32c;
+
+		/* CRC of first 252 bytes of metadata_header, after payload_crc calculation */
+		uint32_t header_crc32c;
+	} metadata_header;
+
+	/* Model information (256-byte) */
+	struct {
+		/* Model name string */
+		uint8_t name[MRVL_ML_MODEL_NAME_LEN];
+
+		/* Model version info (xx.xx.xx.xx) */
+		uint8_t version[4];
+
+		/* Model code size (Init + Main + Finish) */
+		uint32_t code_size;
+
+		/* Model data size (Weights and Bias) */
+		uint32_t data_size;
+
+		/* OCM start offset, set to ocm_wb_range_start */
+		uint32_t ocm_start;
+
+		/* OCM end offset, set to max OCM size */
+		uint32_t ocm_end;
+
+		/* Relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t ocm_relocatable;
+
+		/* Tile relocatable flag (always yes)
+		 * 0 = Not relocatable
+		 * 1 = Relocatable
+		 */
+		uint8_t tile_relocatable;
+
+		/* Start tile (Always 0) */
+		uint8_t tile_start;
+
+		/* End tile (num_tiles - 1) */
+		uint8_t tile_end;
+
+		/* Inference batch size */
+		uint8_t batch_size;
+
+		/* Number of input tensors (Max 8) */
+		uint8_t num_input;
+
+		/* Number of output tensors (Max 8) */
+		uint8_t num_output;
+		uint8_t reserved1;
+
+		/* Total input size in bytes */
+		uint32_t input_size;
+
+		/* Total output size in bytes */
+		uint32_t output_size;
+
+		/* Table size in bytes */
+		uint32_t table_size;
+
+		/* Number of layers in the network */
+		uint32_t num_layers;
+		uint32_t reserved2;
+
+		/* Floor of absolute OCM region */
+		uint64_t ocm_tmp_range_floor;
+
+		/* Relative OCM start address of WB data block */
+		uint64_t ocm_wb_range_start;
+
+		/* Relative OCM end address of WB data block */
+		uint64_t ocm_wb_range_end;
+
+		/* Relative DDR start address of WB data block */
+		uint64_t ddr_wb_range_start;
+
+		/* Relative DDR end address of WB data block */
+		uint64_t ddr_wb_range_end;
+
+		/* Relative DDR start address of all inputs */
+		uint64_t ddr_input_range_start;
+
+		/* Relative DDR end address of all inputs */
+		uint64_t ddr_input_range_end;
+
+		/* Relative DDR start address of all outputs */
+		uint64_t ddr_output_range_start;
+
+		/* Relative DDR end address of all outputs */
+		uint64_t ddr_output_range_end;
+
+		/* Compiler version */
+		uint8_t compiler_version[8];
+
+		/* CDK version */
+		uint8_t cdk_version[4];
+
+		/* Lower batch optimization support
+		 * 0 - No,
+		 * 1 - Yes
+		 */
+		uint8_t supports_lower_batch_size_optimization;
+		uint8_t reserved3[59];
+	} model;
+
+	/* Init section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} init_model;
+
+	/* Main section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} main_model;
+
+	/* Finish section (64-byte) */
+	struct {
+		uint32_t file_offset;
+		uint32_t file_size;
+		uint8_t reserved[56];
+	} finish_model;
+
+	uint8_t reserved1[512]; /* End of 2k bytes */
+
+	/* Weights and Bias (64-byte) */
+	struct {
+		/* Memory offset, set to ddr_wb_range_start */
+		uint64_t mem_offset;
+		uint32_t file_offset;
+		uint32_t file_size;
+
+		/* Relocatable flag for WB
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+		uint8_t reserved[47];
+	} weights_bias;
+
+	/* Input (512-byte, 64-byte per input) provisioned for 8 inputs */
+	struct {
+		/* DDR offset (in OCM absolute addresses for input) */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Input quantization
+		 * 1 = Requires quantization
+		 * 2 = Pre-quantized
+		 */
+		uint8_t quantize;
+
+		/* Type of incoming input
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t input_type;
+
+		/* Type of input required by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16,
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_input_type;
+
+		/* float_32 qscale value
+		 * quantized = non-quantized * qscale
+		 */
+		float qscale;
+
+		/* Input shape */
+		struct {
+			/* Input format
+			 * 1 = NCHW
+			 * 2 = NHWC
+			 */
+			uint8_t format;
+			uint8_t reserved[3];
+			uint32_t w;
+			uint32_t x;
+			uint32_t y;
+			uint32_t z;
+		} shape;
+		uint8_t reserved[4];
+
+		/* Name of input */
+		uint8_t input_name[MRVL_ML_INPUT_NAME_LEN];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output (512-byte, 64-byte per output) provisioned for 8 outputs */
+	struct {
+		/* DDR offset in OCM absolute addresses for output */
+		uint64_t mem_offset;
+
+		/* Relocatable flag
+		 * 1 = Relocatable
+		 * 2 = Not relocatable
+		 */
+		uint8_t relocatable;
+
+		/* Output dequantization
+		 * 1 = De-quantization required
+		 * 2 = De-quantization not required
+		 */
+		uint8_t dequantize;
+
+		/* Type of outgoing output
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t output_type;
+
+		/* Type of output produced by model
+		 * 1 = INT8, 2 = UINT8, 3 = INT16, 4 = UINT16
+		 * 5 = INT32, 6 = UINT32, 7 = FP16, 8 = FP32
+		 */
+		uint8_t model_output_type;
+
+		/* float_32 dscale value
+		 * dequantized = quantized * dscale
+		 */
+		float dscale;
+
+		/* Number of items in the output */
+		uint32_t size;
+		uint8_t reserved[20];
+
+		/* DDR range end
+		 * new = mem_offset + size_bytes - 1
+		 */
+		uint64_t ddr_range_end;
+		uint8_t output_name[MRVL_ML_OUTPUT_NAME_LEN];
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	uint8_t reserved2[1792];
+
+	/* Model data */
+	struct {
+		uint8_t reserved1[4068];
+
+		/* Beta: xx.xx.xx.xx,
+		 * Later: YYYYMM.xx.xx
+		 */
+		uint8_t compiler_version[8];
+
+		/* M1K CDK version (xx.xx.xx.xx) */
+		uint8_t m1k_cdk_version[4];
+	} data;
+
+	/* Hidden 16 bytes of magic code */
+	uint8_t reserved3[16];
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -30,6 +333,12 @@ struct cn10k_ml_model {
 	/* ID */
 	uint16_t model_id;
 
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Metadata */
+	struct cn10k_ml_model_metadata metadata;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -37,4 +346,7 @@ struct cn10k_ml_model {
 	enum cn10k_ml_model_state state;
 };
 
+int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
+void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0955fa0d76..2cde795903 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -416,8 +416,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int ret;
 
-	PLT_SET_USED(params);
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
 	mldev = dev->data->dev_private;
 
@@ -450,6 +453,15 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->mldev = mldev;
 	model->model_id = idx;
 
+	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->metadata);
+
+	/* Metadata batch_size is 8-bit; a value of 0 is treated as a batch size of 256 */
+	if (model->metadata.model.batch_size == 0)
+		model->batch_size = 256;
+	else
+		model->batch_size = model->metadata.model.batch_size;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index bf7a9c0225..799e8f2470 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -19,7 +19,7 @@ sources = files(
         'cn10k_ml_model.c',
 )
 
-deps += ['mldev', 'common_cnxk', 'kvargs']
+deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
 if get_option('buildtype').contains('debug')
         cflags += [ '-DCNXK_ML_DEV_DEBUG' ]
-- 
2.17.1



* [PATCH v6 13/39] ml/cnxk: add internal structures for derived info
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added internal structures to handle derived address fields
and enabled support to compute DMA addresses for model start.
Enabled updating internal model fields.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
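Note for readers: the load/run address layout built by
cn10k_ml_model_addr_update() can be sketched as plain offset arithmetic.
The block below is only an illustration under stated assumptions:
struct section_sizes, struct section_offsets and layout_sections() are
hypothetical names that do not exist in the driver, and the offsets are
relative to base_dma_addr rather than real DMA pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the four file_size fields read from the metadata. */
struct section_sizes {
	uint32_t init_sz;
	uint32_t main_sz;
	uint32_t finish_sz;
	uint32_t wb_sz;
};

/* Hypothetical result type: byte offsets relative to base_dma_addr. */
struct section_offsets {
	size_t init_load, main_load, finish_load, wb_load;
	size_t init_run, main_run, finish_run;
};

static struct section_offsets
layout_sections(const struct section_sizes *sz)
{
	struct section_offsets off;
	size_t total;

	assert(sz != NULL);
	total = (size_t)sz->init_sz + sz->main_sz + sz->finish_sz + sz->wb_sz;

	/* Load region: sections are packed back to back from offset 0. */
	off.init_load = 0;
	off.main_load = off.init_load + sz->init_sz;
	off.finish_load = off.main_load + sz->main_sz;
	off.wb_load = off.finish_load + sz->finish_sz;

	/* Run region: same packing, shifted by the total model data size. */
	off.init_run = total;
	off.main_run = off.init_run + sz->init_sz;
	off.finish_run = off.main_run + sz->main_sz;

	return off;
}
```

As in the patch, the run region here starts at the unaligned sum of the
section sizes, while cn10k_ml_model_load() copies the aligned
model_data_size from the load base to the run base. Whether the two
ranges can overlap when the sum is not a multiple of ML_CN10K_ALIGN_SIZE
may be worth confirming.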
 drivers/ml/cnxk/cn10k_ml_model.c | 89 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 80 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 18 ++++++-
 3 files changed, 186 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index dfa814bbe0..2530beb80e 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -214,3 +214,92 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
 	}
 }
+
+void
+cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+	size_t model_data_size;
+	uint8_t *dma_addr_load;
+	uint8_t *dma_addr_run;
+	uint8_t i;
+	int fpos;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+
+	/* Base address */
+	addr->base_dma_addr_load = base_dma_addr;
+	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
+
+	/* Init section */
+	dma_addr_load = addr->base_dma_addr_load;
+	dma_addr_run = addr->base_dma_addr_run;
+	fpos = sizeof(struct cn10k_ml_model_metadata);
+	addr->init_load_addr = dma_addr_load;
+	addr->init_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
+
+	/* Main section */
+	dma_addr_load += metadata->init_model.file_size;
+	dma_addr_run += metadata->init_model.file_size;
+	fpos += metadata->init_model.file_size;
+	addr->main_load_addr = dma_addr_load;
+	addr->main_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
+
+	/* Finish section */
+	dma_addr_load += metadata->main_model.file_size;
+	dma_addr_run += metadata->main_model.file_size;
+	fpos += metadata->main_model.file_size;
+	addr->finish_load_addr = dma_addr_load;
+	addr->finish_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
+
+	/* Weights and Bias section */
+	dma_addr_load += metadata->finish_model.file_size;
+	fpos += metadata->finish_model.file_size;
+	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
+	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
+	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+
+	/* Inputs */
+	addr->total_input_sz_d = 0;
+	addr->total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		addr->input[i].nb_elements =
+			model->metadata.input[i].shape.w * model->metadata.input[i].shape.x *
+			model->metadata.input[i].shape.y * model->metadata.input[i].shape.z;
+		addr->input[i].sz_d = addr->input[i].nb_elements *
+				      rte_ml_io_type_size_get(metadata->input[i].input_type);
+		addr->input[i].sz_q = addr->input[i].nb_elements *
+				      rte_ml_io_type_size_get(metadata->input[i].model_input_type);
+		addr->total_input_sz_d += addr->input[i].sz_d;
+		addr->total_input_sz_q += addr->input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+			   model->model_id, i, metadata->input[i].shape.w,
+			   metadata->input[i].shape.x, metadata->input[i].shape.y,
+			   metadata->input[i].shape.z, addr->input[i].sz_d, addr->input[i].sz_q);
+	}
+
+	/* Outputs */
+	addr->total_output_sz_q = 0;
+	addr->total_output_sz_d = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		addr->output[i].nb_elements = metadata->output[i].size;
+		addr->output[i].sz_d = addr->output[i].nb_elements *
+				       rte_ml_io_type_size_get(metadata->output[i].output_type);
+		addr->output[i].sz_q =
+			addr->output[i].nb_elements *
+			rte_ml_io_type_size_get(metadata->output[i].model_output_type);
+		addr->total_output_sz_q += addr->output[i].sz_q;
+		addr->total_output_sz_d += addr->output[i].sz_d;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u", model->model_id, i,
+			   addr->output[i].sz_d, addr->output[i].sz_q);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index dc30bc2aa7..5345160a74 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -322,6 +322,81 @@ struct cn10k_ml_model_metadata {
 	uint8_t reserved3[16];
 };
 
+/* Model address structure */
+struct cn10k_ml_model_addr {
+	/* Base DMA address for load */
+	void *base_dma_addr_load;
+
+	/* Base DMA address for run */
+	void *base_dma_addr_run;
+
+	/* Init section load address */
+	void *init_load_addr;
+
+	/* Init section run address */
+	void *init_run_addr;
+
+	/* Main section load address */
+	void *main_load_addr;
+
+	/* Main section run address */
+	void *main_run_addr;
+
+	/* Finish section load address */
+	void *finish_load_addr;
+
+	/* Finish section run address */
+	void *finish_run_addr;
+
+	/* Weights and Bias base address */
+	void *wb_base_addr;
+
+	/* Weights and bias load address */
+	void *wb_load_addr;
+
+	/* Start tile */
+	uint8_t tile_start;
+
+	/* End tile */
+	uint8_t tile_end;
+
+	/* Input address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized input size */
+		uint32_t sz_d;
+
+		/* Quantized input size */
+		uint32_t sz_q;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized output size */
+		uint32_t sz_d;
+
+		/* Quantized output size */
+		uint32_t sz_q;
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -339,6 +414,9 @@ struct cn10k_ml_model {
 	/* Metadata */
 	struct cn10k_ml_model_metadata metadata;
 
+	/* Address structure */
+	struct cn10k_ml_model_addr addr;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -348,5 +426,7 @@ struct cn10k_ml_model {
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+				uint8_t *base_dma_addr);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2cde795903..b11228f2cb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -408,11 +408,14 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
+	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_data_size;
+	uint8_t *base_dma_addr;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -439,7 +442,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Compute memzone size */
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+	metadata = (struct cn10k_ml_model_metadata *)params->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+		  2 * model_data_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -462,6 +470,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	else
 		model->batch_size = model->metadata.model.batch_size;
 
+	/* Set DMA base address */
+	base_dma_addr = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-- 
2.17.1



* [PATCH v6 14/39] ml/cnxk: add internal structures for tiles and OCM
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added internal structures to handle tile and OCM information and
the OCM-to-model memory mapping. Initialized the fields to
platform-specific defaults and computed the OCM / tile requirements
for a model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
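Note for readers: the page accounting done by
cn10k_ml_model_ocm_pages_count() amounts to two ceiling divisions and a
capacity check. The sketch below restates it as a standalone function
under stated assumptions: the OCM_* macros mirror the ML_CN10K_OCM_*
values from cn10k_ml_ocm.h, while pages_for() and ocm_pages_count() are
hypothetical names, and tmp_floor is assumed not to exceed the tile size.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values mirror ML_CN10K_OCM_PAGESIZE / TILESIZE / NUMPAGES from the patch. */
#define OCM_PAGESIZE 0x4000u
#define OCM_TILESIZE 0x100000u
#define OCM_NUMPAGES (OCM_TILESIZE / OCM_PAGESIZE)

/* Hypothetical helper: number of pages needed to hold size bytes (rounded up). */
static uint16_t
pages_for(uint64_t size, uint64_t page_size)
{
	assert(page_size != 0);
	return (uint16_t)((size + page_size - 1) / page_size);
}

/* Hypothetical restatement of the WB / scratch page computation. */
static int
ocm_pages_count(uint64_t wb_start, uint64_t wb_end, uint64_t tmp_floor, int relocatable,
		uint16_t *wb_pages, uint16_t *scratch_pages)
{
	/* WB size is taken as zero for non-relocatable models. */
	uint64_t wb_size = relocatable ? wb_end - wb_start + 1 : 0;

	*wb_pages = pages_for(wb_size, OCM_PAGESIZE);
	*scratch_pages = pages_for(OCM_TILESIZE - tmp_floor, OCM_PAGESIZE);

	/* The model fits only if both regions fit in one tile's OCM. */
	if ((uint32_t)*wb_pages + *scratch_pages > OCM_NUMPAGES)
		return -ENOMEM;

	/* A non-relocatable model blocks the whole tile. */
	if (!relocatable && *scratch_pages < OCM_NUMPAGES)
		*scratch_pages = OCM_NUMPAGES;

	return 0;
}
```

With a 16 KB page and a 1 MB tile there are 64 pages per tile, so a
model whose weights and bias need 64 pages leaves no room for scratch
memory and is rejected.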
 drivers/ml/cnxk/cn10k_ml_dev.h   |  5 ++
 drivers/ml/cnxk/cn10k_ml_model.c | 53 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  6 +++
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  5 ++
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 79 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 31 ++++++++++++-
 drivers/ml/cnxk/meson.build      |  2 +
 7 files changed, 180 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.c
 create mode 100644 drivers/ml/cnxk/cn10k_ml_ocm.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 7cf6268115..02a4496c97 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -7,6 +7,8 @@
 
 #include <roc_api.h>
 
+#include "cn10k_ml_ocm.h"
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -215,6 +217,9 @@ struct cn10k_ml_dev {
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
+	/* OCM info */
+	struct cn10k_ml_ocm ocm;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 2530beb80e..69d6306104 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -8,6 +8,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
+#include "cn10k_ml_ocm.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -303,3 +304,55 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 			   addr->output[i].sz_d, addr->output[i].sz_q);
 	}
 }
+
+int
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+			       uint16_t *wb_pages, uint16_t *scratch_pages)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_ocm *ocm;
+	uint64_t scratch_size;
+	uint64_t wb_size;
+
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	ocm = &mldev->ocm;
+
+	/* Assume wb_size is zero for non-relocatable models */
+	if (metadata->model.ocm_relocatable)
+		wb_size = metadata->model.ocm_wb_range_end - metadata->model.ocm_wb_range_start + 1;
+	else
+		wb_size = 0;
+
+	if (wb_size % ocm->page_size)
+		*wb_pages = wb_size / ocm->page_size + 1;
+	else
+		*wb_pages = wb_size / ocm->page_size;
+	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+		   *wb_pages);
+
+	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
+	if (metadata->model.ocm_tmp_range_floor % ocm->page_size)
+		*scratch_pages = scratch_size / ocm->page_size + 1;
+	else
+		*scratch_pages = scratch_size / ocm->page_size;
+	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+		   scratch_size, *scratch_pages);
+
+	/* Check if the model can be loaded on OCM */
+	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+		plt_err("Cannot create the model, OCM relocatable = %u",
+			metadata->model.ocm_relocatable);
+		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
+			ML_CN10K_OCM_NUMPAGES);
+		return -ENOMEM;
+	}
+
+	/* Update scratch_pages to block the full tile for OCM non-relocatable model. This would
+	 * prevent the library from allocating the remaining space on the tile to other models.
+	 */
+	if (!metadata->model.ocm_relocatable)
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5345160a74..7893635787 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -10,6 +10,7 @@
 #include <roc_api.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_ocm.h"
 
 /* Model state */
 enum cn10k_ml_model_state {
@@ -417,6 +418,9 @@ struct cn10k_ml_model {
 	/* Address structure */
 	struct cn10k_ml_model_addr addr;
 
+	/* Tile and memory information object */
+	struct cn10k_ml_ocm_model_map model_mem_map;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -428,5 +432,7 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+				   uint16_t *wb_pages, uint16_t *scratch_pages);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
new file mode 100644
index 0000000000..b1c62f2963
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#include "cn10k_ml_ocm.h"
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
new file mode 100644
index 0000000000..44390396f9
--- /dev/null
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2022 Marvell.
+ */
+
+#ifndef _CN10K_ML_OCM_H_
+#define _CN10K_ML_OCM_H_
+
+#include <rte_mldev.h>
+
+/* Page size in bytes. */
+#define ML_CN10K_OCM_PAGESIZE 0x4000
+
+/* Number of OCM tiles. */
+#define ML_CN10K_OCM_NUMTILES 0x8
+
+/* OCM in bytes, per tile. */
+#define ML_CN10K_OCM_TILESIZE 0x100000
+
+/* OCM pages, per tile. */
+#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
+
+/* Maximum OCM mask words, per tile, 8-bit words. */
+#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
+
+/* OCM and Tile information structure */
+struct cn10k_ml_ocm_tile_info {
+	/* Mask of used / allotted pages on tile's OCM */
+	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+
+	/* Last page in the tile's OCM used for weights and bias, default = -1 */
+	int last_wb_page;
+
+	/* Number of pages used for scratch memory on the tile's OCM */
+	uint16_t scratch_pages;
+};
+
+/* Model OCM map structure */
+struct cn10k_ml_ocm_model_map {
+	/* Status of OCM reservation */
+	bool ocm_reserved;
+
+	/* Mask of OCM tiles for the model */
+	uint64_t tilemask;
+
+	/* Start page for the model load, default = -1 */
+	int wb_page_start;
+
+	/* Number of pages required for weights and bias */
+	uint16_t wb_pages;
+
+	/* Number of pages required for scratch memory */
+	uint16_t scratch_pages;
+};
+
+/* OCM state structure */
+struct cn10k_ml_ocm {
+	/* OCM spinlock, used to update OCM state */
+	rte_spinlock_t lock;
+
+	/* Number of OCM tiles */
+	uint8_t num_tiles;
+
+	/* OCM size per tile */
+	uint64_t size_per_tile;
+
+	/* Size of OCM page */
+	uint64_t page_size;
+
+	/* Number of OCM pages */
+	uint16_t num_pages;
+
+	/* Words per OCM mask */
+	uint16_t mask_words;
+
+	/* OCM memory info and status */
+	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+};
+
+#endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b11228f2cb..302ce8a452 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -126,9 +126,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
-	uint32_t mz_size;
 	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t tile_id;
 	uint16_t qp_id;
 	int ret;
 
@@ -250,6 +252,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
+	ocm = &mldev->ocm;
+	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
+	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
+	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
+	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+
+	rte_spinlock_init(&ocm->lock);
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -416,6 +430,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	const struct plt_memzone *mz;
 	size_t model_data_size;
 	uint8_t *base_dma_addr;
+	uint16_t scratch_pages;
+	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -441,6 +457,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 		return -ENOMEM;
 	}
 
+	/* Get WB and scratch pages, check if model can be loaded. */
+	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	if (ret < 0)
+		return ret;
+
 	/* Compute memzone size */
 	metadata = (struct cn10k_ml_model_metadata *)params->addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
@@ -478,6 +499,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Copy data from load to run. run address to be used by MLIP */
 	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
 
+	/* Initialize model_mem_map */
+	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
+	model->model_mem_map.ocm_reserved = false;
+	model->model_mem_map.tilemask = 0;
+	model->model_mem_map.wb_page_start = -1;
+	model->model_mem_map.wb_pages = wb_pages;
+	model->model_mem_map.scratch_pages = scratch_pages;
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 799e8f2470..393bc629b0 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -11,12 +11,14 @@ driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
+        'cn10k_ml_ocm.h',
 )
 
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
+        'cn10k_ml_ocm.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.17.1



* [PATCH v6 15/39] ml/cnxk: add structures for slow and fast path JDs
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added JD structures for load, unload and run jobs. Initialized the
job command and allocated memory for the request structures of
slow-path jobs.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
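Note for readers: the per-job sections added to the JD union all appear
to occupy 96 bytes (model_stop is declared as rsvd[96], and the fields of
model_run add up to 8 + 8 + 2 + 78 bytes). A compile-time check along the
following lines could guard that invariant. This is a sketch, not driver
code: the type names and jd_section_size() are hypothetical, only the
stop and run sections are reproduced, and the bit-field based model_start
section is left out because bit-field packing is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JD_SECTION_SIZE 96 /* matches the rsvd[96] array of model_stop */

/* Simplified copies of two JD sections from the patch. */
struct jd_section_model_stop {
	uint8_t rsvd[JD_SECTION_SIZE];
};

struct jd_section_model_run {
	uint64_t input_ddr_addr;  /* input address relative to ML_MLR_BASE */
	uint64_t output_ddr_addr; /* output address relative to ML_MLR_BASE */
	uint16_t num_batches;     /* batches to run in variable batch processing */
	uint8_t rsvd[78];
};

/* Hypothetical union standing in for the anonymous union in the JD. */
union jd_section {
	struct jd_section_model_stop model_stop;
	struct jd_section_model_run model_run;
};

/* Fail the build if a section grows or shrinks (C11 static_assert). */
static_assert(sizeof(struct jd_section_model_run) == JD_SECTION_SIZE,
	      "model_run section must be 96 bytes");
static_assert(sizeof(union jd_section) == JD_SECTION_SIZE,
	      "JD section union must be 96 bytes");

static size_t
jd_section_size(void)
{
	return sizeof(union jd_section);
}
```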
 drivers/ml/cnxk/cn10k_ml_dev.h   | 99 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  4 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 19 +++++-
 drivers/ml/cnxk/cn10k_ml_ops.h   |  4 ++
 4 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 02a4496c97..68fcc957fa 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -188,6 +188,105 @@ struct cn10k_ml_jd {
 
 			uint8_t rsvd[8];
 		} fw_load;
+
+		struct cn10k_ml_jd_section_model_start {
+			/* Source model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_src_ddr_addr;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range end */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
 	};
 };
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 7893635787..355915deeb 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+#include "cn10k_ml_ops.h"
 
 /* Model state */
 enum cn10k_ml_model_state {
@@ -426,6 +427,9 @@ struct cn10k_ml_model {
 
 	/* State */
 	enum cn10k_ml_model_state state;
+
+	/* Slow-path operations request pointer */
+	struct cn10k_ml_req *req;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 302ce8a452..56adce12ea 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -12,6 +12,10 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML Job descriptor flags */
+#define ML_FLAGS_POLL_COMPL BIT(0)
+#define ML_FLAGS_SSO_COMPL  BIT(1)
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -65,6 +69,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	struct cn10k_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
+	uint64_t i;
 
 	/* Allocate queue pair */
 	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
@@ -95,6 +100,12 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 
+	/* Initialize job command */
+	for (i = 0; i < qp->nb_desc; i++) {
+		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+	}
+
 	return qp;
 
 qp_free:
@@ -468,7 +479,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size;
+		  2 * model_data_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -507,6 +519,11 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set slow-path request address and state */
+	model->req = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+				  2 * model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d7842ecd73..c86ce66f19 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OPS_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include <roc_api.h>
 
@@ -21,6 +22,9 @@ struct cn10k_ml_req {
 
 	/* Status field for poll mode requests */
 	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
-- 
2.17.1
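
Note on the memzone layout used in cn10k_ml_model_load above: the slow-path request is placed after the aligned model object and the two copies of the model data. The arithmetic can be sketched standalone as below. This is a minimal illustration only; align_ceil(), struct mz_layout and mz_layout_compute() are hypothetical names that stand in for PLT_ALIGN_CEIL and the inline computation in the patch, and are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next multiple of a; a must be a power of two. */
static uint64_t
align_ceil(uint64_t v, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (v + a - 1) & ~(a - 1);
}

/* Hypothetical layout descriptor, used only for this illustration. */
struct mz_layout {
	uint64_t req_offset; /* Offset of the slow-path request in the memzone */
	uint64_t total_size; /* Total memzone size */
};

/* Layout assumed here: [model object][model data copy 1][model data copy 2][request] */
static struct mz_layout
mz_layout_compute(uint64_t model_sz, uint64_t data_sz, uint64_t req_sz, uint64_t align)
{
	struct mz_layout layout;
	uint64_t data_aligned = align_ceil(data_sz, align);

	layout.req_offset = align_ceil(model_sz, align) + 2 * data_aligned;
	layout.total_size = layout.req_offset + align_ceil(req_sz, align);
	return layout;
}
```

With an alignment of 128, a 100-byte model object, 300 bytes of model data and a 50-byte request, the request lands at offset 896 and the zone is 1024 bytes.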



* [PATCH v6 16/39] ml/cnxk: find OCM mask and page slots for a model
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to compute OCM tilemask and page start for a
model. The computed tilemask and page start are used during
model start to copy model weights and bias to OCM. OCM slot
for a model is allocated from the tiles with maximum amount
of free memory.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 330 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   5 +
 2 files changed, 335 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index b1c62f2963..df2fa4c514 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -2,4 +2,334 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
+
+#include "roc_api.h"
+
+/* OCM macros */
+#define BYTE_LEN	  8
+#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
+#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+
+/* Left shift multi-word mask by 1 bit.
+ *
+ * For example, given a mask of two uint8_t words
+ * Input:  [00110101] [00110111]
+ * Output: [01101010] [01101110]
+ */
+static void
+lshift_mask(uint8_t *mask, int nwords)
+{
+	int i;
+	int word_sz;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	for (i = nwords - 1; i >= 0; i--) {
+		mask[i] = mask[i] << 1;
+		if (i != 0)
+			mask[i] = mask[i] | (mask[i - 1] >> (word_sz - 1));
+	}
+}
+
+/* Get the index of the first unused slot in a multi-word mask (base_mask). Only slots at or after
+ * start_pos are considered. An unused slot is a sequence of slot_sz contiguous unset bits in the
+ * multi-word mask.
+ *
+ * The function creates a search_mask with slot_sz bits set and slides it left from start_pos to
+ * the end of the mask to find the first available slot. For example, given the multi-word mask
+ *
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When start = 0,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 3 is 7.
+ * Index of the first unused slot of size 2 is 1.
+ * Index of the first unused slot of size 1 is 1.
+ *
+ * When start = 2,
+ * Index of the first unused slot of size 4 is 7.
+ * Index of the first unused slot of size 2 is 4.
+ * Index of the first unused slot of size 1 is 2.
+ *
+ * When unable to find a valid slot, return -1
+ * When slot_sz is zero, return max_idx (word_sz * nwords)
+ */
+static int
+slot_index_lowest(uint8_t *base_mask, int nwords, int slot_sz, int start_pos)
+{
+	uint8_t *search_mask;
+	int word_sz;
+	int end_pos;
+	int min_idx;
+	int max_idx;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	min_idx = 0;
+	max_idx = word_sz * nwords;
+	idx = min_idx - 1;
+
+	if (slot_sz == 0)
+		return max_idx;
+
+	/* Create a mask with slot_sz bits set */
+	search_mask = plt_zmalloc(nwords * sizeof(uint8_t), 0);
+	if (search_mask == NULL)
+		goto error;
+
+	for (i = 0; i < nwords; i++) {
+		if (i < slot_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > slot_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (slot_sz % word_sz)) - 1;
+	}
+
+	/* Shift search mask by start_pos bits */
+	for (i = 0; i < start_pos; i++)
+		lshift_mask(search_mask, nwords);
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - slot_sz + 1;
+	for (j = start_pos; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+
+		lshift_mask(search_mask, nwords);
+	}
+
+found:
+	plt_free(search_mask);
+
+error:
+	return idx;
+}
+
+/* Find the largest possible unused slot, with a minimum size of search_sz in a multi-word mask. The
+ * function returns the start index of the slot and the size of the identified slot (slot_sz).
+ *
+ * For example, in multi-word mask
+ *
+ * [10111000] [01001001]
+ * - WORD 1 --- WORD 0 -
+ *
+ * When search_sz > 4, return value = -1, slot_sz = 0
+ * When search_sz <=4, return value = 7, slot_sz = 4
+ */
+static int
+slot_index_largest(uint8_t *base_mask, int nwords, int search_sz, int *slot_sz)
+{
+	uint8_t *search_mask;
+	int mask_sz;
+	int word_sz;
+	int end_pos;
+	bool match;
+	int i, j;
+	int idx;
+
+	word_sz = sizeof(uint8_t) * BYTE_LEN;
+	mask_sz = nwords * word_sz;
+	idx = -1;
+
+	/* Create a mask with mask_sz bits set */
+	search_mask = plt_zmalloc(mask_sz, 0);
+	if (search_mask == NULL)
+		goto error;
+
+start:
+	for (i = 0; i < nwords; i++) {
+		if (i < mask_sz / word_sz)
+			search_mask[i] = 0xFF;
+		else if (i > mask_sz / word_sz)
+			search_mask[i] = 0x00;
+		else
+			search_mask[i] = (1 << (mask_sz % word_sz)) - 1;
+	}
+
+	/* Scan for a slot, left shift search mask after every iteration */
+	end_pos = nwords * word_sz - mask_sz + 1;
+	for (j = 0; j < end_pos; j++) {
+		match = true;
+		for (i = 0; i < nwords; i++)
+			match = match && (((~base_mask[i]) & search_mask[i]) == search_mask[i]);
+
+		if (match) {
+			idx = j;
+			goto found;
+		}
+		lshift_mask(search_mask, nwords);
+	}
+
+	mask_sz--;
+	if (mask_sz >= search_sz)
+		goto start;
+	else
+		mask_sz = 0;
+
+found:
+	plt_free(search_mask);
+	if (search_sz == 0)
+		idx = word_sz * nwords;
+
+error:
+	if (slot_sz)
+		*slot_sz = mask_sz;
+
+	return idx;
+}
+
+/* Count number of bits in a tilemask. Assumes that all set bits are contiguous. */
+int
+cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
+{
+	uint8_t count;
+
+	PLT_ASSERT(tilemask != 0);
+
+	*start = __builtin_ctzl(tilemask);
+	*end = 64 - __builtin_clzl(tilemask) - 1;
+	count = *end - *start + 1;
+
+	PLT_ASSERT(count == __builtin_popcountl(tilemask));
+	return count;
+}
+
+/* Find the tiles and wb_page_start to load the model on given 'num_tiles' tiles with the specified
+ * scratch & WB pages and OCM allocation mode.
+ */
+int
+cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			   uint16_t scratch_pages, uint64_t *tilemask)
+{
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
+	uint16_t used_scratch_pages_max;
+	uint16_t scratch_page_start;
+	int used_last_wb_page_max;
+	uint16_t scratch_page_end;
+	uint8_t search_start_tile;
+	uint8_t search_end_tile;
+	int wb_page_start_curr;
+	int max_slot_sz_curr;
+	uint8_t tile_start;
+	int ocm_alloc_mode;
+	int wb_page_start;
+	uint16_t tile_id;
+	uint16_t word_id;
+	uint8_t tile_idx;
+	int max_slot_sz;
+	int start_tile;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
+		plt_err("Invalid num_tiles = %u (> ML_CN10K_OCM_NUMTILES)", num_tiles);
+		return -1;
+	}
+
+	memset(tilemask, 0, sizeof(uint64_t));
+	wb_page_start = -1;
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	start_tile = -1;
+	max_slot_sz_curr = 0;
+	max_slot_sz = 0;
+	tile_idx = 0;
+	ocm_alloc_mode = 2;
+
+	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
+		plt_err("Invalid start_tile, %d", start_tile);
+		return -1;
+	}
+
+	if (start_tile < 0) {
+		search_start_tile = 0;
+		search_end_tile = ocm->num_tiles - num_tiles;
+	} else {
+		search_start_tile = start_tile;
+		search_end_tile = start_tile;
+	}
+
+	tile_start = search_start_tile;
+start_search:
+	used_scratch_pages_max = 0;
+	used_last_wb_page_max = -1;
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		used_scratch_pages_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, used_scratch_pages_max);
+		used_last_wb_page_max =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
+	}
+
+	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
+	}
+
+	if (used_scratch_pages_max < scratch_pages) { /* Check for extra scratch pages */
+		if (ocm->num_pages - used_last_wb_page_max - 1 >=
+		    scratch_pages) { /* Pages available */
+			scratch_page_start = ocm->num_pages - scratch_pages;
+			scratch_page_end = ocm->num_pages - 1;
+			for (page_id = scratch_page_start; page_id <= scratch_page_end;
+			     page_id++) { /* Mark the extra scratch pages as used */
+				local_ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					SET_BIT(local_ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						page_id % OCM_MAP_WORD_SIZE);
+			}
+		} else { /* Pages not available, check for next set of tiles */
+			goto next_search;
+		}
+	}
+
+	if (ocm_alloc_mode == 1) {
+		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
+		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
+			tile_idx = tile_start;
+			goto found;
+		}
+	} else if (ocm_alloc_mode == 2) {
+		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
+							&max_slot_sz_curr);
+		if (max_slot_sz_curr > max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			max_slot_sz = max_slot_sz_curr;
+			tile_idx = tile_start;
+		} else if (max_slot_sz_curr == max_slot_sz) {
+			wb_page_start = wb_page_start_curr;
+			if (wb_page_start == ocm->num_pages) {
+				tile_idx = tile_start;
+				goto found;
+			}
+		}
+	}
+
+next_search:
+	tile_start = tile_start + num_tiles;
+	if (tile_start <= search_end_tile)
+		goto start_search;
+
+found:
+	if (wb_page_start != -1)
+		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
+
+	return wb_page_start;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 44390396f9..2e26271a7a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -6,6 +6,7 @@
 #define _CN10K_ML_OCM_H_
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 /* Page size in bytes. */
 #define ML_CN10K_OCM_PAGESIZE 0x4000
@@ -76,4 +77,8 @@ struct cn10k_ml_ocm {
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
 };
 
+int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
+int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+			       uint16_t scratch_pages, uint64_t *tilemask);
+
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1
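
Note on the slot search documented in slot_index_lowest above: the same result can be reproduced with a simpler run-length scan, which may help when checking the worked examples in the comment. This is a sketch under stated assumptions, not the patch's method: the patch shifts a search mask across the page mask, while the sketch counts consecutive clear bits. bit_is_set() and first_free_slot() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 8

/* Return 1 if bit 'pos' of the multi-word mask is set, 0 otherwise. */
static int
bit_is_set(const uint8_t *mask, int pos)
{
	return (mask[pos / WORD_BITS] >> (pos % WORD_BITS)) & 1;
}

/* Index of the first run of slot_sz clear bits starting at or after start_pos.
 * Returns -1 when no such run exists and nwords * WORD_BITS when slot_sz is 0,
 * mirroring the return values of the function in the patch.
 */
static int
first_free_slot(const uint8_t *mask, int nwords, int slot_sz, int start_pos)
{
	int nbits = nwords * WORD_BITS;
	int run = 0;
	int pos;

	assert(mask != NULL && nwords > 0 && slot_sz >= 0 && start_pos >= 0);
	if (slot_sz == 0)
		return nbits;

	for (pos = start_pos; pos < nbits; pos++) {
		/* A set bit breaks the current run of free pages */
		run = bit_is_set(mask, pos) ? 0 : run + 1;
		if (run == slot_sz)
			return pos - slot_sz + 1;
	}

	return -1;
}
```

For the mask in the comment, word 0 is 0x49 and word 1 is 0xB8, so first_free_slot(mask, 2, 4, 0) gives 7 and first_free_slot(mask, 2, 2, 2) gives 4, matching the documented examples.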



* [PATCH v6 17/39] ml/cnxk: add support to reserve and free OCM pages
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to reserve and free OCM pages for a model. OCM
pages are reserved upon completion of model start and are
released after model stop.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c | 131 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ocm.h |   3 +
 2 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index df2fa4c514..c3e4de3e9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -5,14 +5,17 @@
 #include <rte_mldev_pmd.h>
 
 #include "cn10k_ml_dev.h"
+#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "roc_api.h"
 
 /* OCM macros */
-#define BYTE_LEN	  8
-#define OCM_MAP_WORD_SIZE (sizeof(uint8_t) * BYTE_LEN)
-#define SET_BIT(num, n)	  ((num) | (1 << (n)))
+#define BYTE_LEN	   8
+#define OCM_MAP_WORD_SIZE  (sizeof(uint8_t) * BYTE_LEN)
+#define IS_BIT_SET(num, n) ((num) & (1 << (n)))
+#define SET_BIT(num, n)	   ((num) | (1 << (n)))
+#define CLEAR_BIT(num, n)  ((num) &= ~((1) << (n)))
 
 /* Left shift multi-word mask by 1 bit.
  *
@@ -333,3 +336,125 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 
 	return wb_page_start;
 }
+
+void
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
+			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_page_start;
+	int scratch_page_end;
+	int wb_page_end;
+	int tile_start;
+	int tile_end;
+	int tile_id;
+	int page_id;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Get first set bit, tile_start */
+	tile_start = 0;
+	tile_end = 0;
+	cn10k_ml_ocm_tilecount(tilemask, &tile_start, &tile_end);
+	wb_page_end = wb_page_start + wb_pages - 1;
+	scratch_page_start = ocm->num_pages - scratch_pages;
+	scratch_page_end = ocm->num_pages - 1;
+
+	/* Update tile_ocm_info */
+	for (tile_id = tile_start; tile_id <= tile_end; tile_id++) {
+		/* Scratch pages */
+		for (page_id = scratch_page_start; page_id <= scratch_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		ocm->tile_ocm_info[tile_id].scratch_pages =
+			PLT_MAX(ocm->tile_ocm_info[tile_id].scratch_pages, scratch_pages);
+
+		/* WB pages */
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++)
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] = SET_BIT(
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+				page_id % OCM_MAP_WORD_SIZE);
+		if (wb_pages != 0)
+			ocm->tile_ocm_info[tile_id].last_wb_page =
+				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
+	}
+
+	model->addr.tile_start = tile_start;
+	model->addr.tile_end = tile_end;
+
+	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
+	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
+		   wb_page_end);
+	plt_ml_dbg("model_id = %u, scratch_page_start = %d, scratch_page_end = %d", model_id,
+		   scratch_page_start, scratch_page_end);
+}
+
+void
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+
+	int scratch_resize_pages;
+	int wb_page_start;
+	int wb_page_end;
+	int prev_start;
+	int curr_start;
+	int tile_id;
+	int page_id;
+	uint16_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Update OCM info for WB memory */
+	wb_page_start = model->model_mem_map.wb_page_start;
+	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
+	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
+			ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+				CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+						  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+					  page_id % OCM_MAP_WORD_SIZE);
+		}
+
+		/* Update last_wb_page size */
+		if (wb_page_end == ocm->tile_ocm_info[tile_id].last_wb_page)
+			ocm->tile_ocm_info[tile_id].last_wb_page = wb_page_start - 1;
+
+		/* Update scratch page size and clear extra bits */
+		scratch_resize_pages = 0;
+		/* Get max scratch pages required, excluding the current model */
+		for (i = 0; i < dev->data->nb_models; i++) {
+			struct cn10k_ml_model *model = dev->data->models[i];
+
+			if ((i != model_id) && (model != NULL)) {
+				if (IS_BIT_SET(model->model_mem_map.tilemask, tile_id))
+					scratch_resize_pages =
+						PLT_MAX((int)model->model_mem_map.scratch_pages,
+							scratch_resize_pages);
+			}
+		}
+
+		/* Clear extra scratch pages */
+		if (scratch_resize_pages < ocm->tile_ocm_info[tile_id].scratch_pages) {
+			prev_start = ocm->num_pages - ocm->tile_ocm_info[tile_id].scratch_pages;
+			curr_start = ocm->num_pages - scratch_resize_pages;
+			for (page_id = prev_start; page_id < curr_start; page_id++) {
+				ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE] =
+					CLEAR_BIT(ocm->tile_ocm_info[tile_id]
+							  .ocm_mask[page_id / OCM_MAP_WORD_SIZE],
+						  page_id % OCM_MAP_WORD_SIZE);
+			}
+			ocm->tile_ocm_info[tile_id].scratch_pages = scratch_resize_pages;
+		}
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 2e26271a7a..32c9b17afc 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -80,5 +80,8 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
+				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CN10K_ML_OCM_H_ */
-- 
2.17.1
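
Note on the page bookkeeping in cn10k_ml_ocm_reserve_pages and cn10k_ml_ocm_free_pages above: WB pages are marked from wb_page_start upwards and scratch pages occupy the top of the tile. A minimal single-tile sketch of that marking follows. pages_mark() and tile_reserve() are hypothetical helpers written for this note; the driver itself uses the SET_BIT and CLEAR_BIT macros on ocm_mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 8

/* Mark pages first..last (inclusive) as used (used != 0) or free (used == 0).
 * An empty range (last < first) leaves the mask untouched.
 */
static void
pages_mark(uint8_t *mask, int first, int last, int used)
{
	int page;

	assert(mask != NULL && first >= 0);
	for (page = first; page <= last; page++) {
		uint8_t bit = (uint8_t)(1u << (page % WORD_BITS));

		if (used)
			mask[page / WORD_BITS] |= bit;
		else
			mask[page / WORD_BITS] &= (uint8_t)~bit;
	}
}

/* Reserve WB pages from wb_start and scratch pages at the top of the tile. */
static void
tile_reserve(uint8_t *mask, int num_pages, int wb_start, int wb_pages, int scratch_pages)
{
	pages_mark(mask, wb_start, wb_start + wb_pages - 1, 1);
	pages_mark(mask, num_pages - scratch_pages, num_pages - 1, 1);
}
```

On a 16-page tile, reserving 3 WB pages at page 2 plus 4 scratch pages sets bits 2..4 and 12..15; freeing the WB range afterwards clears only bits 2..4.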



* [PATCH v6 18/39] ml/cnxk: enable support to start an ML model
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 19/39] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented model start driver function. A model start job
is checked for completion in synchronous mode. Tilemask and
OCM slot are calculated before starting the model. Model start
is enqueued through scratch registers. OCM pages are reserved
after model start completion.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 207 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |   4 +
 3 files changed, 214 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 68fcc957fa..8f6bc24370 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -33,6 +33,9 @@
 /* ML command timeout in seconds */
 #define ML_CN10K_CMD_TIMEOUT 5
 
+/* ML slow-path job flags */
+#define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
+
 /* Poll mode job state */
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 56adce12ea..e8ce65b182 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -114,6 +114,64 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = model->model_id;
+	req->jd.hdr.job_type = job_type;
+	req->jd.hdr.fp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+
+	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
+		if (!model->metadata.model.ocm_relocatable)
+			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+		else
+			req->jd.hdr.sp_flags = 0x0;
+		req->jd.model_start.model_src_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_load_addr));
+		req->jd.model_start.model_dst_ddr_addr =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+		req->jd.model_start.model_init_offset = 0x0;
+		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->jd.model_start.model_finish_offset =
+			metadata->init_model.file_size + metadata->main_model.file_size;
+		req->jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
+						      metadata->main_model.file_size +
+						      metadata->finish_model.file_size;
+		req->jd.model_start.num_layers = metadata->model.num_layers;
+		req->jd.model_start.num_gather_entries = 0;
+		req->jd.model_start.num_scatter_entries = 0;
+		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->jd.model_start.batch_size = model->batch_size;
+		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
+		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
+		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
+			&mldev->roc,
+			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
+		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
+		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
+		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
+		req->jd.model_start.output.s.ddr_range_start =
+			metadata->model.ddr_output_range_start;
+		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -561,6 +619,154 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+int
+cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	uint8_t num_tiles;
+	uint64_t tilemask;
+	int wb_page_start;
+	int tile_start;
+	int tile_end;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				plt_ml_dbg("Model already started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (!model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			wb_page_start = cn10k_ml_ocm_tilemask_find(
+				dev, num_tiles, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages, &tilemask);
+
+			if (wb_page_start == -1) {
+				plt_err("Free pages not available on OCM tiles");
+				plt_err("Failed to start model = 0x%016lx, name = %s",
+					PLT_U64_CAST(model), model->metadata.model.name);
+
+				plt_spinlock_unlock(&ocm->lock);
+				return -ENOMEM;
+			}
+
+			model->model_mem_map.tilemask = tilemask;
+			model->model_mem_map.wb_page_start = wb_page_start;
+
+			cn10k_ml_ocm_reserve_pages(
+				dev, model->model_id, model->model_mem_map.tilemask,
+				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
+				model->model_mem_map.scratch_pages);
+			model->model_mem_map.ocm_reserved = true;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	/* Update JD */
+	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->jd.model_start.ocm_wb_base_address =
+		model->model_mem_map.wb_page_start * ocm->page_size;
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else { /* Reset scratch registers */
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (ret == 0)
+				model->state = ML_CN10K_MODEL_STATE_STARTED;
+			else
+				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
+		while (model->model_mem_map.ocm_reserved) {
+			if (plt_spinlock_trylock(&ocm->lock) != 0) {
+				cn10k_ml_ocm_free_pages(dev, model->model_id);
+				model->model_mem_map.ocm_reserved = false;
+				model->model_mem_map.tilemask = 0x0;
+				plt_spinlock_unlock(&ocm->lock);
+			}
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -576,4 +782,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index c86ce66f19..989af978c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -25,6 +25,9 @@ struct cn10k_ml_req {
 
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
+
+	/* Timeout cycle */
+	uint64_t timeout;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
@@ -61,5 +64,6 @@ extern struct rte_ml_dev_ops cn10k_ml_ops;
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			uint16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1


* [PATCH v6 19/39] ml/cnxk: enable support to stop an ML models
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented the model stop driver function. A model stop job is
enqueued through the scratch registers and polled synchronously
for completion, subject to a command timeout. OCM pages reserved
for the model are released as part of model stop.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
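The enqueue-then-poll flow described above can be sketched as follows.
This is a minimal, self-contained illustration rather than the driver
code: struct fake_dev and the fake_* helpers are hypothetical stand-ins
for the scratch-register helpers and the TSC, and the sketch returns -1
on timeout where the driver returns -ETIME. Unlike the patch, the
deadline is computed once before the loop, so a job that can never be
enqueued still times out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: stands in for the scratch registers and the TSC. */
struct fake_dev {
	uint64_t now;     /* simulated cycle counter */
	uint64_t done_at; /* cycle at which the pending job completes */
	bool busy;        /* true while a job occupies the scratch slot */
};

static uint64_t
fake_cycles(struct fake_dev *d)
{
	return d->now++; /* every read advances simulated time by one cycle */
}

static bool
fake_enqueue(struct fake_dev *d)
{
	if (d->busy)
		return false;
	d->busy = true;
	return true;
}

static bool
fake_dequeue(struct fake_dev *d)
{
	if (!d->busy || d->now < d->done_at)
		return false;
	d->busy = false;
	return true;
}

/* Enqueue one job and poll for completion; 0 on success, -1 on timeout. */
static int
run_sync_job(struct fake_dev *d, uint64_t timeout_cycles)
{
	const uint64_t deadline = fake_cycles(d) + timeout_cycles;
	bool enqueued = false;
	bool dequeued = false;

	do {
		if (!enqueued)
			enqueued = fake_enqueue(d);
		if (enqueued && !dequeued)
			dequeued = fake_dequeue(d);
		if (dequeued)
			break;
	} while (fake_cycles(d) < deadline);

	return dequeued ? 0 : -1;
}
```

A caller would treat a non-zero return as a failed stop and reset the
queue, as the driver does with roc_ml_scratch_queue_reset().
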
 drivers/ml/cnxk/cn10k_ml_ops.c | 115 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.h |   1 +
 2 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e8ce65b182..77d3728d8d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -295,10 +295,14 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		/* Re-configure */
 		void **models;
 
-		/* Unload all models */
+		/* Stop and unload all models */
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
+				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
 				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
@@ -362,10 +366,14 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
-	/* Unload all models */
+	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
+			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
 			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
@@ -767,6 +775,108 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	struct cn10k_ml_req *req;
+
+	bool job_enqueued;
+	bool job_dequeued;
+	bool locked;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	/* Prepare JD */
+	req = model->req;
+	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req->result.error_code = 0x0;
+	req->result.user_ptr = NULL;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				plt_ml_dbg("Model not started, model = 0x%016lx",
+					   PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return 1;
+			}
+
+			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model = 0x%016lx",
+					PLT_U64_CAST(model));
+				plt_spinlock_unlock(&model->lock);
+				return -EBUSY;
+			}
+
+			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	while (model->model_mem_map.ocm_reserved) {
+		if (plt_spinlock_trylock(&ocm->lock) != 0) {
+			cn10k_ml_ocm_free_pages(dev, model->model_id);
+			model->model_mem_map.ocm_reserved = false;
+			model->model_mem_map.tilemask = 0x0;
+			plt_spinlock_unlock(&ocm->lock);
+		}
+	}
+
+	job_enqueued = false;
+	job_dequeued = false;
+	do {
+		if (!job_enqueued) {
+			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+		}
+
+		if (job_enqueued && !job_dequeued)
+			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+
+		if (job_dequeued)
+			break;
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (job_dequeued) {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			if (req->result.error_code == 0x0)
+				ret = 0;
+			else
+				ret = -1;
+		}
+	} else {
+		roc_ml_scratch_queue_reset(&mldev->roc);
+		ret = -ETIME;
+	}
+
+	locked = false;
+	while (!locked) {
+		if (plt_spinlock_trylock(&model->lock) != 0) {
+			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			plt_spinlock_unlock(&model->lock);
+			locked = true;
+		}
+	}
+
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -783,4 +893,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
 };
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 989af978c4..22576b93c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -65,5 +65,6 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 			uint16_t *model_id);
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1


* [PATCH v6 20/39] ml/cnxk: enable support to get model information
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 19/39] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added driver functions to get model information. Added
internal functions to set and get model info.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
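The info buffer introduced here packs one model info header followed by
the input and output descriptor arrays. A minimal sketch of that layout,
assuming simplified stand-in types (struct model_info and struct io_info
are hypothetical and much smaller than rte_ml_model_info and
rte_ml_io_info):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the mldev info structures. */
struct io_info {
	char name[32];
	uint32_t dtype;
	uint32_t qtype;
};

struct model_info {
	uint32_t nb_inputs;
	uint32_t nb_outputs;
	struct io_info *input_info;
	struct io_info *output_info;
};

/* Bytes needed for the header plus both descriptor arrays. */
static size_t
info_buf_size(uint32_t nb_inputs, uint32_t nb_outputs)
{
	return sizeof(struct model_info) +
	       (size_t)nb_inputs * sizeof(struct io_info) +
	       (size_t)nb_outputs * sizeof(struct io_info);
}

/* Carve the buffer: header first, then inputs, then outputs.
 * buf must be suitably aligned for struct model_info.
 */
static struct model_info *
info_buf_layout(uint8_t *buf, uint32_t nb_inputs, uint32_t nb_outputs)
{
	struct model_info *info = (struct model_info *)buf;

	info->nb_inputs = nb_inputs;
	info->nb_outputs = nb_outputs;
	info->input_info = (struct io_info *)(buf + sizeof(*info));
	info->output_info = info->input_info + nb_inputs;

	return info;
}
```

Because the pointers stored in the header point into the same buffer, a
copy of the header alone (as in cn10k_ml_model_info_get) still refers to
the driver-owned descriptor arrays.
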
 drivers/ml/cnxk/cn10k_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |  9 ++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 37 ++++++++++++++++++---
 3 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 69d6306104..0ded355d81 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -356,3 +356,58 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 
 	return 0;
 }
+
+void
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+{
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output =
+		PLT_PTR_ADD(input, model->metadata.model.num_input * sizeof(struct rte_ml_io_info));
+
+	/* Set model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+	rte_memcpy(info->name, model->metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", model->metadata.model.version[0],
+		 model->metadata.model.version[1], model->metadata.model.version[2],
+		 model->metadata.model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = dev->data->dev_id;
+	info->batch_size = model->batch_size;
+	info->nb_inputs = model->metadata.model.num_input;
+	info->input_info = input;
+	info->nb_outputs = model->metadata.model.num_output;
+	info->output_info = output;
+	info->wb_size = model->metadata.weights_bias.file_size;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, model->metadata.input[i].input_name,
+			   MRVL_ML_INPUT_NAME_LEN);
+		input[i].dtype = model->metadata.input[i].input_type;
+		input[i].qtype = model->metadata.input[i].model_input_type;
+		input[i].shape.format = model->metadata.input[i].shape.format;
+		input[i].shape.w = model->metadata.input[i].shape.w;
+		input[i].shape.x = model->metadata.input[i].shape.x;
+		input[i].shape.y = model->metadata.input[i].shape.y;
+		input[i].shape.z = model->metadata.input[i].shape.z;
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, model->metadata.output[i].output_name,
+			   MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].dtype = model->metadata.output[i].output_type;
+		output[i].qtype = model->metadata.output[i].model_output_type;
+		output[i].shape.format = RTE_ML_IO_FORMAT_1D;
+		output[i].shape.w = model->metadata.output[i].size;
+		output[i].shape.x = 1;
+		output[i].shape.y = 1;
+		output[i].shape.z = 1;
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 355915deeb..75990fe1e4 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -422,6 +422,14 @@ struct cn10k_ml_model {
 	/* Tile and memory information object */
 	struct cn10k_ml_ocm_model_map model_mem_map;
 
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -438,5 +446,6 @@ void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
 				   uint16_t *wb_pages, uint16_t *scratch_pages);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 77d3728d8d..ad9b3dfd21 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -506,6 +506,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_data_size;
+	size_t model_info_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
 	uint16_t wb_pages;
@@ -544,8 +545,13 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
+			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size +
+		  2 * model_data_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
 
 	/* Allocate memzone for model object and model data */
@@ -585,10 +591,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->model_mem_map.wb_pages = wb_pages;
 	model->model_mem_map.scratch_pages = scratch_pages;
 
+	/* Set model info */
+	model->info = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+	cn10k_ml_model_info_set(dev, model);
+
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
-				  2 * model_data_size);
+	model->req = PLT_PTR_ADD(model->info, model_info_size);
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
@@ -877,6 +885,26 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+static int
+cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			struct rte_ml_model_info *model_info)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
+	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -894,4 +922,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_unload = cn10k_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
 };
-- 
2.17.1


* [PATCH v6 21/39] ml/cnxk: enable support to update model params
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added cnxk driver function to update model params (weights and
bias) after a model is loaded. The model must be in the loaded
state, and updating model params does not require reloading it.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
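The update path amounts to a state check followed by two copies: new
weights and bias into the load region, then the whole load region into
the run region. A minimal sketch under assumed names (struct model, its
fields and the STATE_* values are hypothetical simplifications of the
driver's model object):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum model_state { STATE_LOADED, STATE_JOB_ACTIVE, STATE_STARTED, STATE_UNKNOWN };

/* Hypothetical, reduced model object. */
struct model {
	enum model_state state;
	uint8_t *load;     /* load region: model sections followed by weights/bias */
	uint8_t *run;      /* run region, same size as the load region */
	size_t total_size; /* size of each region in bytes */
	size_t wb_offset;  /* offset of weights/bias inside the region */
	size_t wb_size;    /* size of weights/bias in bytes */
};

static int
params_update(struct model *m, const void *wb)
{
	if (m->state == STATE_UNKNOWN)
		return -1;
	if (m->state != STATE_LOADED)
		return -EBUSY; /* started or job active: refuse the update */

	/* Replace weights and bias in the load region. */
	memcpy(m->load + m->wb_offset, wb, m->wb_size);

	/* Refresh the run region from the load region. */
	memcpy(m->run, m->load, m->total_size);

	return 0;
}
```
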
 drivers/ml/cnxk/cn10k_ml_ops.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ad9b3dfd21..92bf1a0854 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -905,6 +905,36 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
+static int
+cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cn10k_ml_model *model;
+	size_t size;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+		return -1;
+	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+		return -EBUSY;
+
+	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
+	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+
+	/* Update model weights & bias */
+	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -923,4 +953,5 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
 };
-- 
2.17.1


* [PATCH v6 22/39] ml/cnxk: add support to get IO buffer sizes
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:19   ` [PATCH v6 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added driver functions to get input and output buffer sizes
for a given number of batches. These functions compute the
buffer sizes based on the specific requirements of the device.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
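The size computation rounds the requested number of batches up to a
whole number of model batches and scales the per-batch-group size by
that count. A minimal sketch with hypothetical helper names (div_ceil
and io_buffer_size are illustrative, not driver functions):

```c
#include <stdint.h>

/* Integer division rounding up; d must be non-zero. */
static uint64_t
div_ceil(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* size_per_group: bytes needed for one group of batch_size batches. */
static uint64_t
io_buffer_size(uint64_t size_per_group, uint32_t nb_batches, uint32_t batch_size)
{
	return size_per_group * div_ceil(nb_batches, batch_size);
}
```

For example, with a model batch size of 4 and 1000 bytes per group, a
request for 10 batches needs three groups, so 3000 bytes.
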
 drivers/ml/cnxk/cn10k_ml_ops.c | 52 ++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 92bf1a0854..b5c89bee40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -935,6 +935,54 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
+static int
+cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches,
+			   uint64_t *input_qsize, uint64_t *input_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (input_qsize != NULL)
+		*input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (input_dsize != NULL)
+		*input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d *
+					    PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches,
+			    uint64_t *output_qsize, uint64_t *output_dsize)
+{
+	struct cn10k_ml_model *model;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (output_qsize != NULL)
+		*output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	if (output_dsize != NULL)
+		*output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d *
+					     PLT_DIV_CEIL(nb_batches, model->batch_size));
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -954,4 +1002,8 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_input_size_get = cn10k_ml_io_input_size_get,
+	.io_output_size_get = cn10k_ml_io_output_size_get,
 };
-- 
2.17.1


* [PATCH v6 23/39] ml/cnxk: enable quantization and dequantization
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
@ 2023-03-10  8:19   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:19 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Implemented driver functions to quantize / dequantize input
and output data. Support is enabled for multiple batches.
Quantization / dequantization use the type conversion functions
defined in ML common code.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
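For readers unfamiliar with scale-based quantization, the float32 to
int8 direction can be sketched as below. This is an illustration only
and not the ML common code: the function names are hypothetical, and
the rounding (half away from zero) and saturation behaviour are
assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Quantize one value: scale, round half away from zero, saturate to int8. */
static int8_t
quantize_one(float x, float scale)
{
	float v = x * scale;

	if (v != v) /* NaN: map to zero rather than invoke undefined behaviour */
		return 0;
	if (v >= 127.0f)
		return 127;
	if (v <= -128.0f)
		return -128;

	return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Dequantize one value with the output scale. */
static float
dequantize_one(int8_t q, float dscale)
{
	return (float)q * dscale;
}

/* Quantize a buffer of nb_elements float32 values into int8. */
static void
quantize_buf(float scale, size_t nb_elements, const float *in, int8_t *out)
{
	size_t i;

	for (i = 0; i < nb_elements; i++)
		out[i] = quantize_one(in[i], scale);
}
```

The driver applies the matching conversion once per input (or output)
and then advances the source and destination pointers by that tensor's
dequantized and quantized sizes.
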
 drivers/ml/cnxk/cn10k_ml_ops.c | 151 +++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b5c89bee40..231c9b340b 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
@@ -983,6 +985,153 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t
 	return 0;
 }
 
+static int
+cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, void *dbuffer,
+		     void *qbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		if (model->metadata.input[i].input_type ==
+		    model->metadata.input[i].model_input_type) {
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+		} else {
+			switch (model->metadata.input[i].model_input_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = rte_ml_io_float32_to_int8(model->metadata.input[i].qscale,
+								model->addr.input[i].nb_elements,
+								lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = rte_ml_io_float32_to_uint8(model->metadata.input[i].qscale,
+								 model->addr.input[i].nb_elements,
+								 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = rte_ml_io_float32_to_int16(model->metadata.input[i].qscale,
+								 model->addr.input[i].nb_elements,
+								 lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = rte_ml_io_float32_to_uint16(model->metadata.input[i].qscale,
+								  model->addr.input[i].nb_elements,
+								  lcl_dbuffer, lcl_qbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
+								   lcl_dbuffer, lcl_qbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_input_type[%u] : %u", i,
+					model->metadata.input[i].model_input_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_dbuffer += model->addr.input[i].sz_d;
+		lcl_qbuffer += model->addr.input[i].sz_q;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
+static int
+cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches,
+		       void *qbuffer, void *dbuffer)
+{
+	struct cn10k_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t batch_id;
+	uint32_t i;
+	int ret;
+
+	model = dev->data->models[model_id];
+
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	lcl_dbuffer = dbuffer;
+	lcl_qbuffer = qbuffer;
+	batch_id = 0;
+
+next_batch:
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		if (model->metadata.output[i].output_type ==
+		    model->metadata.output[i].model_output_type) {
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+		} else {
+			switch (model->metadata.output[i].model_output_type) {
+			case RTE_ML_IO_TYPE_INT8:
+				ret = rte_ml_io_int8_to_float32(model->metadata.output[i].dscale,
+								model->addr.output[i].nb_elements,
+								lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT8:
+				ret = rte_ml_io_uint8_to_float32(model->metadata.output[i].dscale,
+								 model->addr.output[i].nb_elements,
+								 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_INT16:
+				ret = rte_ml_io_int16_to_float32(model->metadata.output[i].dscale,
+								 model->addr.output[i].nb_elements,
+								 lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_UINT16:
+				ret = rte_ml_io_uint16_to_float32(model->metadata.output[i].dscale,
+								  model->addr.output[i].nb_elements,
+								  lcl_qbuffer, lcl_dbuffer);
+				break;
+			case RTE_ML_IO_TYPE_FP16:
+				ret = rte_ml_io_float16_to_float32(
+					model->addr.output[i].nb_elements, lcl_qbuffer,
+					lcl_dbuffer);
+				break;
+			default:
+				plt_err("Unsupported model_output_type[%u] : %u", i,
+					model->metadata.output[i].model_output_type);
+				ret = -ENOTSUP;
+			}
+			if (ret < 0)
+				return ret;
+		}
+
+		lcl_qbuffer += model->addr.output[i].sz_q;
+		lcl_dbuffer += model->addr.output[i].sz_d;
+	}
+
+	batch_id++;
+	if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size))
+		goto next_batch;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
@@ -1006,4 +1155,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* I/O ops */
 	.io_input_size_get = cn10k_ml_io_input_size_get,
 	.io_output_size_get = cn10k_ml_io_output_size_get,
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
 };
-- 
2.17.1


* [PATCH v6 24/39] ml/cnxk: enable support to dump device debug info
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-03-10  8:19   ` [PATCH v6 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to dump device debug information. Debug info on
cn10k device includes model state info, OCM usage info, firmware
debug and exception buffer.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
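The page mask dump prints the per-tile mask words as one hex string,
most significant word first. A minimal sketch of that conversion with
explicit bounds checking (mask_to_hex is a hypothetical helper, not the
driver function; the destination must hold the "0x" prefix, two hex
digits per word and the terminating NUL):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format mask[nwords-1] .. mask[0] as "0x<hex>"; returns -1 if str is too small. */
static int
mask_to_hex(const uint8_t *mask, size_t nwords, char *str, size_t len)
{
	size_t need = 2 + 2 * nwords + 1; /* prefix + nibbles + NUL */
	size_t pos = 2;
	size_t w;

	if (len < need)
		return -1;

	str[0] = '0';
	str[1] = 'x';
	str[pos] = '\0'; /* keeps the string valid when nwords is zero */

	/* Build two hex digits per word, highest word first. */
	for (w = nwords; w-- > 0;) {
		snprintf(str + pos, len - pos, "%02X", (unsigned int)mask[w]);
		pos += 2;
	}

	return 0;
}
```
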
 drivers/ml/cnxk/cn10k_ml_ocm.c |  51 +++++++++
 drivers/ml/cnxk/cn10k_ml_ocm.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ops.c | 189 +++++++++++++++++++++++++++++++++
 3 files changed, 241 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index c3e4de3e9c..0b04fcc2da 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -458,3 +458,54 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 }
+
+static void
+cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t nwords, char *str)
+{
+	char *p = str;
+	int word;
+
+	/* add prefix 0x */
+	*p++ = '0';
+	*p++ = 'x';
+
+	/* build one word at a time */
+	for (word = nwords - 1; word >= 0; word--) {
+		sprintf(p, "%02X", tile_info->ocm_mask[word]);
+		p += 2;
+	}
+
+	/* terminate */
+	*p++ = 0;
+}
+
+void
+cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+{
+	char str[ML_CN10K_OCM_NUMPAGES / 4 + 3]; /* nibbles + prefix '0x' + NUL */
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	uint8_t tile_id;
+	uint8_t word_id;
+	int wb_pages;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+
+	fprintf(fp, "OCM State:\n");
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
+
+		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
+		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+			wb_pages +=
+				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+
+		fprintf(fp,
+			"tile = %2u, scratch_pages = %4u,"
+			" wb_pages = %4d, last_wb_page = %4d,"
+			" pagemask = %s\n",
+			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
+			ocm->tile_ocm_info[tile_id].last_wb_page, str);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 32c9b17afc..0c7172a671 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,5 +83,6 @@ int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16
 void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 231c9b340b..2d7d760536 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -14,10 +14,25 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  90
+
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+static void
+print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -116,6 +131,102 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	return NULL;
 }
 
+static void
+cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
+{
+
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+
+	mldev = dev->data->dev_private;
+	ocm = &mldev->ocm;
+	model = dev->data->models[model_id];
+
+	/* Print debug info */
+	print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
+		model->metadata.model.version[1], model->metadata.model.version[2],
+		model->metadata.model.version[3]);
+	if (strlen(model->name) != 0)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+
+	/* Print model state */
+	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
+			1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+
+	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s  %14s\n", "input", "input_name", "input_type",
+		"model_input_type", "quantize", "format");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_input; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.input[i].input_name);
+		rte_ml_io_type_to_str(model->metadata.input[i].input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		rte_ml_io_type_to_str(model->metadata.input[i].model_input_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.input[i].quantize == 1 ? "Yes" : "No"));
+		rte_ml_io_format_to_str(model->metadata.input[i].shape.format, str, STR_LEN);
+		fprintf(fp, "%*s", 16, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
+		"model_output_type", "dequantize");
+	print_line(fp, LINE_LEN);
+	for (i = 0; i < model->metadata.model.num_output; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, model->metadata.output[i].output_name);
+		rte_ml_io_type_to_str(model->metadata.output[i].output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		rte_ml_io_type_to_str(model->metadata.output[i].model_output_type, str, STR_LEN);
+		fprintf(fp, "%*s  ", 18, str);
+		fprintf(fp, "%*s", 12, (model->metadata.output[i].dequantize == 1 ? "Yes" : "No"));
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
+
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -498,6 +609,83 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_fw *fw;
+
+	uint32_t head_loc;
+	uint32_t tail_loc;
+	uint16_t model_id;
+	uint32_t bufsize;
+	char *head_ptr;
+	int core_id;
+
+	if (roc_env_is_asim())
+		return 0;
+
+	mldev = dev->data->dev_private;
+	fw = &mldev->fw;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			cn10k_ml_model_print(dev, model_id, fp);
+			fprintf(fp, "\n");
+		}
+	}
+
+	/* Dump OCM state */
+	cn10k_ml_ocm_print(dev, fp);
+
+	/* Dump debug buffer */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		if (core_id == 0) {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		} else {
+			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+		}
+		if (head_loc < tail_loc) {
+			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
+		} else if (head_loc >= tail_loc + 1) {
+			fprintf(fp, "%.*s\n", bufsize - tail_loc, &head_ptr[head_loc]);
+			fprintf(fp, "%.*s\n", tail_loc, &head_ptr[0]);
+		}
+	}
+
+	/* Dump exception info */
+	for (core_id = 0; core_id <= 1; core_id++) {
+		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		if ((core_id == 0) &&
+		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		} else if ((core_id == 1) &&
+			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
+				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			fprintf(fp, "%.*s", bufsize, head_ptr);
+		}
+	}
+
+	return 0;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -1139,6 +1327,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_close = cn10k_ml_dev_close,
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1
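
The debug-buffer dump above walks a circular buffer that may wrap past its end. As a reading aid, here is a minimal standalone sketch of linearizing such a buffer. It assumes `head` marks the first valid byte and `tail` marks one past the last valid byte, which is how the dump loop appears to treat the two scratch registers; `ring_linearize` and its parameters are hypothetical names, not driver API. In the wrapped case the first segment runs from `head` to the end of the buffer, so it is `bufsize - head` bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the valid bytes of a circular buffer into out and return the count.
 * out must have room for bufsize bytes. head == tail is treated as empty.
 */
static size_t
ring_linearize(const char *buf, size_t bufsize, size_t head, size_t tail, char *out)
{
	size_t n = 0;

	assert(head < bufsize && tail < bufsize);

	if (head < tail) {
		/* Contiguous region: head .. tail */
		n = tail - head;
		memcpy(out, &buf[head], n);
	} else if (head > tail) {
		/* Wrapped region: head .. end of buffer, then start .. tail */
		size_t first = bufsize - head;

		memcpy(out, &buf[head], first);
		memcpy(out + first, buf, tail);
		n = first + tail;
	}

	return n;
}
```

Copying into a linear buffer first keeps the printing code free of wrap handling, at the cost of one extra copy on a path that is not performance sensitive.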



* [PATCH v6 25/39] ml/cnxk: add driver support for device selftest
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support for device selftest. The selftest enqueues a firmware
selftest job through the scratch registers and checks the status
reported by the firmware.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 57 ++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d7d760536..2fa0522faf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -686,6 +686,62 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
+static int
+cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	const struct plt_memzone *mz;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	uint64_t timeout_cycle;
+	bool timeout;
+	int ret;
+
+	mldev = dev->data->dev_private;
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+					 ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("Could not allocate reserved memzone");
+		return -ENOMEM;
+	}
+	req = mz->addr;
+
+	/* Prepare load completion structure */
+	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_wmb();
+
+	/* Enqueue firmware selftest request through scratch registers */
+	timeout = true;
+	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+
+	plt_rmb();
+	do {
+		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < timeout_cycle);
+
+	/* Check firmware selftest status, clean-up and exit */
+	ret = 0;
+	if (timeout) {
+		ret = -ETIME;
+	} else {
+		if (req->result.error_code != 0)
+			ret = -1;
+	}
+
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -1328,6 +1384,7 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_start = cn10k_ml_dev_start,
 	.dev_stop = cn10k_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-- 
2.17.1
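
The selftest above submits one job and then polls for completion until a cycle deadline passes. A minimal standalone sketch of that deadline-polling pattern follows. The callback types, `poll_until`, and the `fake_dev` model are all hypothetical stand-ins (for `plt_tsc_cycles()` and the status check), chosen so the sketch builds without DPDK; it returns -1 on timeout where the driver returns -ETIME.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*now_fn)(void *ctx); /* hypothetical cycle counter */
typedef bool (*done_fn)(void *ctx);    /* hypothetical completion check */

/* Return 0 if done() turns true before the deadline, -1 on timeout. */
static int
poll_until(now_fn now, done_fn done, void *ctx, uint64_t timeout_cycles)
{
	uint64_t deadline = now(ctx) + timeout_cycles;

	do {
		if (done(ctx))
			return 0;
	} while (now(ctx) < deadline);

	return -1;
}

/* Fake device: the clock advances by one on every read. */
struct fake_dev {
	uint64_t cycles;
	uint64_t done_at;
};

static uint64_t
fake_now(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->cycles++;
}

static bool
fake_done(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->cycles >= d->done_at;
}
```

The completion check runs at least once even with a zero timeout, matching the do/while shape used in the patch.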



* [PATCH v6 26/39] ml/cnxk: enqueue a burst of inference requests
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled driver support to enqueue a burst of inference requests
to the ML device. Enqueue uses the internal ML request structure to
queue inferences; job completion is detected through polling.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 96 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  7 +++
 2 files changed, 103 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2fa0522faf..f024487fc1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -285,6 +285,28 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	}
 }
 
+static __rte_always_inline void
+cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+				struct rte_ml_op *op)
+{
+	struct cn10k_ml_dev *mldev;
+
+	mldev = dev->data->dev_private;
+
+	req->jd.hdr.jce.w0.u64 = 0;
+	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.model_id = op->model_id;
+	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->jd.hdr.sp_flags = 0x0;
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.model_run.input_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr));
+	req->jd.model_run.output_ddr_addr =
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr));
+	req->jd.model_run.num_batches = op->nb_batches;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -450,6 +472,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
 
@@ -1376,6 +1400,78 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t count;
+	uint64_t head;
+	bool enqueued;
+
+	mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	req = &queue->reqs[head];
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	if (unlikely(!enqueued))
+		goto jcmdq_full;
+
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 22576b93c0..a1724f6156 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -28,6 +28,9 @@ struct cn10k_ml_req {
 
 	/* Timeout cycle */
 	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
 } __rte_aligned(ROC_ALIGN);
 
 /* Request queue */
@@ -67,4 +70,8 @@ int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+/* Fast-path ops */
+__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
+
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1
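
For readers following the queue arithmetic in this patch, here is a minimal standalone sketch of the head/tail index helpers. It mirrors `queue_pending_count()`, `queue_free_count()` and `queue_index_advance()` from the diff; the `ring_*` names are illustrative and not part of the driver. One descriptor slot is always left unused, so `head == tail` can only mean an empty ring.

```c
#include <assert.h>
#include <stdint.h>

/* Number of requests enqueued but not yet dequeued. */
static uint64_t
ring_pending(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	assert(nb_desc > 0 && head < nb_desc && tail < nb_desc);

	return (nb_desc + head - tail) % nb_desc;
}

/* Free slots; one slot is reserved so a full ring never looks empty. */
static uint64_t
ring_free(uint64_t head, uint64_t tail, uint64_t nb_desc)
{
	return nb_desc - ring_pending(head, tail, nb_desc) - 1;
}

/* Advance an index by one, wrapping at nb_desc. */
static void
ring_advance(uint64_t *index, uint64_t nb_desc)
{
	*index = (*index + 1) % nb_desc;
}
```

With 8 descriptors, an empty ring reports 7 free slots, which is why the enqueue path clamps `nb_ops` to the free count before touching any request.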



* [PATCH v6 27/39] ml/cnxk: dequeue a burst of inference requests
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled driver support to dequeue inference requests from
the internal queue. Dequeue checks for request completion by
polling the status field of the job request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 61 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 ++
 2 files changed, 63 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f024487fc1..51f1c92a8d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -473,6 +473,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
+	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -1418,6 +1419,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
 }
 
+static __rte_always_inline void
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
+		       struct rte_ml_op *op)
+{
+	PLT_SET_USED(dev);
+	PLT_SET_USED(qp_id);
+
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0))
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+	else
+		op->status = RTE_ML_OP_STATUS_ERROR;
+
+	op->user_ptr = result->user_ptr;
+}
+
 __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
@@ -1472,6 +1490,49 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot uint16_t
+cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		       uint16_t nb_ops)
+{
+	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_req *req;
+	struct cn10k_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+	req = &queue->reqs[tail];
+	status = plt_read64(&req->status);
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
+		goto empty_or_active;
+
+	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	ops[count] = req->op;
+
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a1724f6156..f6aab4a609 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -73,5 +73,7 @@ int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					  struct rte_ml_op **ops, uint16_t nb_ops);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1
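
The dequeue path above returns completions strictly in submission order and stops at the first request that has not finished, even if later ones have. A minimal standalone sketch of that behavior follows. `demo_queue`, `demo_dequeue` and the `JOB_*` values are hypothetical stand-ins for the driver's queue structure and the `ML_CN10K_POLL_JOB_*` constants, whose real values are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define JOB_START  0u /* hypothetical stand-in for ML_CN10K_POLL_JOB_START */
#define JOB_FINISH 1u /* hypothetical stand-in for ML_CN10K_POLL_JOB_FINISH */

struct demo_queue {
	uint64_t head;          /* next slot to enqueue into */
	uint64_t tail;          /* oldest slot not yet dequeued */
	uint64_t nb_desc;       /* ring size */
	const uint64_t *status; /* per-slot completion word, nb_desc entries */
};

/* Collect up to nb_ops completed slot indices, in submission order. */
static uint16_t
demo_dequeue(struct demo_queue *q, uint64_t *slots, uint16_t nb_ops)
{
	uint64_t pending = (q->nb_desc + q->head - q->tail) % q->nb_desc;
	uint64_t tail = q->tail;
	uint16_t count = 0;

	if (pending < nb_ops)
		nb_ops = (uint16_t)pending;

	/* Stop at the first slot that is still active. */
	while (count < nb_ops && q->status[tail] == JOB_FINISH) {
		slots[count++] = tail;
		tail = (tail + 1) % q->nb_desc;
	}

	q->tail = tail;

	return count;
}
```

A finished request behind an active one stays queued until the active one completes, which keeps the tail index a simple counter at the cost of head-of-line blocking.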



* [PATCH v6 28/39] ml/cnxk: add internal function for sync mode run
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added internal function to execute ML inference requests
in synchronous mode. Sync mode inference execution is used
to launch inference requests without using a queue-pair.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 53 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  1 +
 2 files changed, 54 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 51f1c92a8d..87778c37bb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1533,6 +1533,59 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_req *req;
+	bool timeout;
+	int ret = 0;
+
+	mldev = dev->data->dev_private;
+	model = dev->data->models[op->model_id];
+	req = model->req;
+
+	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+
+	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.user_ptr = op->user_ptr;
+
+	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+
+	timeout = true;
+	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	do {
+		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+			req->op = op;
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout) {
+		ret = -EBUSY;
+		goto error_enqueue;
+	}
+
+	timeout = true;
+	do {
+		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+			timeout = false;
+			break;
+		}
+	} while (plt_tsc_cycles() < req->timeout);
+
+	if (timeout)
+		ret = -ETIME;
+	else
+		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+
+error_enqueue:
+	return ret;
+}
+
 struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cn10k_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index f6aab4a609..7c35bf7539 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,5 +75,6 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1
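
The sync-mode function above has two phases that share a single deadline: retry the submit until the command queue accepts the job, then poll for completion. A minimal standalone sketch of that control flow follows, using a simulated device so it builds without DPDK. `sim_dev` and the `sim_*` helpers are hypothetical; only the -EBUSY and -ETIME return codes are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: a tick counter stands in for the TSC. */
struct sim_dev {
	uint64_t tick;      /* advances on every clock read */
	uint64_t accept_at; /* tick at which the command queue accepts the job */
	uint64_t finish_at; /* tick at which the job completes */
};

static uint64_t
sim_now(struct sim_dev *d)
{
	return d->tick++;
}

static bool
sim_try_submit(struct sim_dev *d)
{
	return d->tick >= d->accept_at;
}

static bool
sim_is_finished(struct sim_dev *d)
{
	return d->tick >= d->finish_at;
}

/* Returns 0 on completion, -EBUSY if never accepted, -ETIME if never finished. */
static int
sim_run_sync(struct sim_dev *d, uint64_t timeout)
{
	uint64_t deadline = sim_now(d) + timeout;
	bool submitted = false;

	/* Phase 1: retry the submit until accepted or the deadline passes. */
	do {
		if (sim_try_submit(d)) {
			submitted = true;
			break;
		}
	} while (sim_now(d) < deadline);

	if (!submitted)
		return -EBUSY;

	/* Phase 2: poll for completion against the same deadline. */
	do {
		if (sim_is_finished(d))
			return 0;
	} while (sim_now(d) < deadline);

	return -ETIME;
}
```

Because both phases use one deadline, time spent waiting for the queue to accept the job reduces the time left for the job itself to finish.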



* [PATCH v6 29/39] ml/cnxk: enable support for firmware error codes
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support for error handling. Added error types and subtypes
supported by ML firmware. Enabled support to get the device-specific
error code and message for a completed ML request.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |   4 +-
 drivers/ml/cnxk/cn10k_ml_dev.h |  50 +++++++++++++-
 drivers/ml/cnxk/cn10k_ml_ops.c | 117 ++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_ops.h |   2 +
 4 files changed, 160 insertions(+), 13 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 837f006bf0..76ed853a3c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -261,7 +261,7 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -452,7 +452,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code == 0)) {
+	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 8f6bc24370..604a200e26 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -64,6 +64,54 @@ enum cn10k_ml_dev_state {
 	ML_CN10K_DEV_STATE_CLOSED
 };
 
+/* Error types enumeration */
+enum cn10k_ml_error_etype {
+	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
+	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
+	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
+	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
+	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
+	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
+};
+
+/* Firmware non-fatal error sub-type */
+enum cn10k_ml_error_stype_fw_nf {
+	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
+	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during unload */
+	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
+	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
+	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
+	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
+	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
+	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
+	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
+	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+};
+
+/* Driver error sub-type */
+enum cn10k_ml_error_stype_driver {
+	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
+	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+};
+
+/* Error structure */
+union cn10k_ml_error_code {
+	struct {
+		/* Error type */
+		uint64_t etype : 4;
+
+		/* Error sub-type */
+		uint64_t stype : 60;
+	} s;
+
+	/* WORD 0 */
+	uint64_t u64;
+};
+
 /* Firmware stats */
 struct cn10k_ml_fw_stats {
 	/* Firmware start cycle */
@@ -82,7 +130,7 @@ struct cn10k_ml_fw_stats {
 /* Result structure */
 struct cn10k_ml_result {
 	/* Job error code */
-	uint64_t error_code;
+	union cn10k_ml_error_code error_code;
 
 	/* Firmware stats */
 	struct cn10k_ml_fw_stats stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 87778c37bb..23a9ca4ff2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,49 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Error message length */
+#define ERRMSG_LEN 32
+
+/* Error type database */
+static const struct cn10k_ml_etype_db {
+	enum cn10k_ml_error_etype etype;
+	char name[ERRMSG_LEN];
+} ml_etype_db[] = {
+	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
+
+/* Hardware non-fatal error subtype database */
+static const struct cn10k_ml_stype_db_hw_nf {
+	enum cn10k_ml_error_stype_fw_nf stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_hw_nf[] = {
+	{ML_FW_ERR_NOERR, "NO ERROR"},
+	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+};
+
+/* Driver error subtype database */
+static const struct cn10k_ml_stype_db_driver {
+	enum cn10k_ml_error_stype_driver stype;
+	char msg[ERRMSG_LEN];
+} ml_stype_db_driver[] = {
+	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+};
+
 static void
 print_line(FILE *fp, int len)
 {
@@ -474,6 +517,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
+	dev->op_error_get = cn10k_ml_op_error_get;
 
 	mldev->nb_models_loaded = 0;
 	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
@@ -758,7 +802,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code != 0)
+		if (req->result.error_code.u64 != 0)
 			ret = -1;
 	}
 
@@ -936,7 +980,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1017,7 +1061,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0)
+			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1079,7 +1123,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->req;
 	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code = 0x0;
+	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1134,7 +1178,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	if (job_dequeued) {
 		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
-			if (req->result.error_code == 0x0)
+			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1426,12 +1470,30 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 	PLT_SET_USED(dev);
 	PLT_SET_USED(qp_id);
 
-	op->impl_opaque = result->error_code;
+	struct cn10k_ml_dev *mldev;
 
-	if (likely(result->error_code == 0))
+	if (likely(result->error_code.u64 == 0)) {
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
-	else
+	} else {
+		/* Handle driver error */
+		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+			mldev = dev->data->dev_private;
+
+			/* Check for exception */
+			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
+			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+			else
+				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+		}
+
+		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
 
 	op->user_ptr = result->user_ptr;
 }
@@ -1468,6 +1530,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
@@ -1515,8 +1578,12 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 dequeue_req:
 	req = &queue->reqs[tail];
 	status = plt_read64(&req->status);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH))
-		goto empty_or_active;
+	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+	}
 
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
@@ -1533,6 +1600,35 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	return count;
 }
 
+__rte_hot int
+cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
+{
+	union cn10k_ml_error_code *error_code;
+	char msg[RTE_ML_STR_MAX];
+
+	PLT_SET_USED(dev);
+
+	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
+
+	/* Copy error message */
+	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
+
+	/* Copy sub error message */
+	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+	}
+
+	if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		strcat(msg, " : ");
+		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+	}
+
+	plt_strlcpy(error->message, msg, sizeof(error->message));
+
+	return 0;
+}
+
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
@@ -1549,6 +1645,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
+	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 7c35bf7539..1784900cff 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -75,6 +75,8 @@ __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
+				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
 #endif /* _CN10K_ML_OPS_H_ */
-- 
2.17.1



* [PATCH v6 30/39] ml/cnxk: add support to get and reset device stats
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to get and reset ML device stats. Device stats
include the number of requests enqueued and dequeued, and the
enqueue and dequeue error counts.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
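Note on the aggregation described above: a minimal standalone sketch of
summing per-queue-pair counters into a device-wide total. The type
`struct ml_stats` and the function `ml_stats_aggregate` are hypothetical
stand-ins that only mirror the field names used in this patch. The sketch
zeroes the output first; that is an assumption made for a standalone
caller, not a statement about what the mldev library does before it
invokes the driver callback.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the stats record; field names follow the patch. */
struct ml_stats {
	uint64_t enqueued_count;
	uint64_t dequeued_count;
	uint64_t enqueue_err_count;
	uint64_t dequeue_err_count;
};

/* Sum the counters of nb_qps queue-pairs into *total. */
static void
ml_stats_aggregate(const struct ml_stats *qp_stats, uint16_t nb_qps, struct ml_stats *total)
{
	uint16_t qp_id;

	assert(total != NULL);

	/* Start from zero so the result does not depend on the caller's buffer. */
	memset(total, 0, sizeof(*total));

	for (qp_id = 0; qp_id < nb_qps; qp_id++) {
		total->enqueued_count += qp_stats[qp_id].enqueued_count;
		total->dequeued_count += qp_stats[qp_id].dequeued_count;
		total->enqueue_err_count += qp_stats[qp_id].enqueue_err_count;
		total->dequeue_err_count += qp_stats[qp_id].dequeue_err_count;
	}
}
```
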
 drivers/ml/cnxk/cn10k_ml_ops.c | 55 ++++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 23a9ca4ff2..c38f018a50 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -159,6 +159,10 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -678,6 +682,38 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cn10k_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -1467,15 +1503,23 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	PLT_SET_USED(dev);
-	PLT_SET_USED(qp_id);
-
 	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_qp *qp;
 
 	if (likely(result->error_code.u64 == 0)) {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeued_count++;
+		}
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
+		if (likely(qp_id >= 0)) {
+			qp = dev->data->queue_pairs[qp_id];
+			qp->stats.dequeue_err_count++;
+		}
+
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
 			mldev = dev->data->dev_private;
@@ -1549,6 +1593,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 jcmdq_full:
 	queue->head = head;
+	qp->stats.enqueued_count += count;
 
 	return count;
 }
@@ -1697,6 +1742,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
 	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
 
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
 	.model_unload = cn10k_ml_model_unload,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 1784900cff..65ae8b44f3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -58,6 +58,9 @@ struct cn10k_ml_qp {
 
 	/* Request queue */
 	struct cn10k_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
 };
 
 /* Device ops */
-- 
2.17.1



* [PATCH v6 31/39] ml/cnxk: add support to handle extended dev stats
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support for ML device extended stats (xstats): getting
xstats names, getting xstats values by ID or by name, and
resetting xstats. Supported xstats are the average, minimum and
maximum hardware and firmware latency of each loaded model.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
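Note on the average latency xstat: a minimal standalone sketch of
averaging over queue-pairs with a reset baseline. `struct lat_stats`,
`lat_avg` and `lat_avg_reset` are hypothetical names; the fields mirror
the `*_latency_tot`, `dequeued_count` and `*_reset_count` members added
in this patch. The sketch reports 0 when no job has been dequeued since
the last reset, which is an assumption about the desired behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue-pair record; fields mirror cn10k_ml_model_stats. */
struct lat_stats {
	uint64_t latency_tot;    /* sum of latencies since the last reset */
	uint64_t dequeued_count; /* jobs dequeued since model load */
	uint64_t reset_count;    /* dequeued_count captured at the last reset */
};

/* Average latency over all queue-pairs, counting only jobs since the last reset. */
static uint64_t
lat_avg(const struct lat_stats *s, uint16_t nb_qps)
{
	uint64_t total = 0;
	uint64_t count = 0;
	uint16_t i;

	for (i = 0; i < nb_qps; i++) {
		total += s[i].latency_tot;
		count += s[i].dequeued_count - s[i].reset_count;
	}

	/* Avoid dividing by zero when nothing was dequeued since the reset. */
	return count != 0 ? total / count : 0;
}

/* Reset the average: clear the running sum and move the baseline forward. */
static void
lat_avg_reset(struct lat_stats *s, uint16_t nb_qps)
{
	uint16_t i;

	for (i = 0; i < nb_qps; i++) {
		s[i].latency_tot = 0;
		s[i].reset_count = s[i].dequeued_count;
	}
}
```

Keeping `dequeued_count` monotonic and recording a baseline lets the
hardware and firmware averages be reset independently of each other.
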
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cn10k_ml_model.h |  57 +++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 356 ++++++++++++++++++++++++++++++-
 3 files changed, 415 insertions(+), 1 deletion(-)
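
Note on stat IDs: the xstats callbacks in this patch recover the model
and the stat type from a flat ID by dividing and taking the remainder
with the size of the xstats table. A minimal sketch of that layout
follows. `NB_XSTAT_TYPES`, `xstat_id_encode` and `xstat_id_decode` are
hypothetical names; the patch uses PLT_DIM(cn10k_ml_model_xstats_table)
and has no explicit encode helper.

```c
#include <assert.h>
#include <stdint.h>

/* Six types per model: avg/min/max for hardware and for firmware latency. */
#define NB_XSTAT_TYPES 6

/* Flat ID for (model_id, type): all types of model 0 first, then model 1, ... */
static uint16_t
xstat_id_encode(uint16_t model_id, uint16_t type)
{
	assert(type < NB_XSTAT_TYPES);
	return (uint16_t)(model_id * NB_XSTAT_TYPES + type);
}

/* Inverse mapping, matching the division and modulo used by the callbacks. */
static void
xstat_id_decode(uint16_t id, uint16_t *model_id, uint16_t *type)
{
	*model_id = id / NB_XSTAT_TYPES;
	*type = id % NB_XSTAT_TYPES;
}
```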

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 604a200e26..b7ff369ba8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -372,6 +372,9 @@ struct cn10k_ml_dev {
 
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
+
+	/* xstats status */
+	bool xstats_enabled;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 75990fe1e4..1bc748265d 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -399,6 +399,57 @@ struct cn10k_ml_model_addr {
 	uint32_t total_output_sz_d;
 };
 
+/* Extended stats types enum */
+enum cn10k_ml_model_xstats_type {
+	/* Average hardware latency */
+	avg_hw_latency = 0,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+};
+
+/* Model fast-path stats */
+struct cn10k_ml_model_stats {
+	/* Total hardware latency, sum of all inferences */
+	uint64_t hw_latency_tot;
+
+	/* Minimum hardware latency */
+	uint64_t hw_latency_min;
+
+	/* Maximum hardware latency */
+	uint64_t hw_latency_max;
+
+	/* Total firmware latency, sum of all inferences */
+	uint64_t fw_latency_tot;
+
+	/* Minimum firmware latency */
+	uint64_t fw_latency_min;
+
+	/* Maximum firmware latency */
+	uint64_t fw_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t hw_reset_count;
+
+	/* Firmware stats reset index */
+	uint64_t fw_reset_count;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -438,6 +489,12 @@ struct cn10k_ml_model {
 
 	/* Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
+
+	/* Stats for burst ops */
+	struct cn10k_ml_model_stats *burst_stats;
+
+	/* Stats for sync ops */
+	struct cn10k_ml_model_stats *sync_stats;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c38f018a50..880bb6a5a9 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -354,6 +354,134 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
+#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value += model->burst_stats[qp_id].str##_latency_tot;                      \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		value = (count != 0) ? value / count : 0;                                          \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
+			count += model->burst_stats[qp_id].dequeued_count -                        \
+				 model->burst_stats[qp_id].str##_reset_count;                      \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+static uint64_t
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
+			 enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint64_t count = 0;
+	uint64_t value;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+	if (model == NULL)
+		return 0;
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case min_hw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case max_hw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		break;
+	case avg_fw_latency:
+		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case min_fw_latency:
+		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	case max_fw_latency:
+		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
+			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
+			model->burst_stats[qp_id].str##_reset_count =                              \
+				model->burst_stats[qp_id].dequeued_count;                          \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
+			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+	} while (0)
+
+static void
+cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
+			   enum cn10k_ml_model_xstats_type type)
+{
+	struct cn10k_ml_model *model;
+	uint32_t qp_id;
+
+	model = dev->data->models[model_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -519,6 +647,13 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 
 	rte_spinlock_init(&ocm->lock);
 
+	/* Check firmware stats */
+	if ((mldev->fw.req->jd.fw_load.cap.s.hw_stats) &&
+	    (mldev->fw.req->jd.fw_load.cap.s.fw_stats))
+		mldev->xstats_enabled = true;
+	else
+		mldev->xstats_enabled = false;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -714,6 +849,170 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+/* Model xstats names */
+struct rte_ml_dev_xstats_map cn10k_ml_model_xstats_table[] = {
+	{avg_hw_latency, "Avg-HW-Latency"}, {min_hw_latency, "Min-HW-Latency"},
+	{max_hw_latency, "Max-HW-Latency"}, {avg_fw_latency, "Avg-FW-Latency"},
+	{min_fw_latency, "Min-FW-Latency"}, {max_fw_latency, "Max-FW-Latency"},
+};
+
+static int
+cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_map *xstats_map,
+			      uint32_t size)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	if (xstats_map == NULL)
+		return PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+
+	/* Model xstats names */
+	count = 0;
+	cn10k_ml_dev_info_get(dev, &dev_info);
+
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		xstats_map[count].id = id;
+		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+
+		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+
+		count++;
+		if (count == size)
+			break;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				uint64_t *value)
+{
+	struct rte_ml_dev_xstats_map *xstats_map;
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *mldev;
+	uint32_t num_xstats;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t id;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	num_xstats = PLT_DIM(cn10k_ml_model_xstats_table) * mldev->nb_models_loaded;
+	xstats_map = rte_zmalloc("cn10k_ml_xstats_map",
+				 sizeof(struct rte_ml_dev_xstats_map) * num_xstats, 0);
+	cn10k_ml_dev_xstats_names_get(dev, xstats_map, num_xstats);
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
+		if (strncmp(name, xstats_map[id].name, strlen(name)) == 0) {
+			*stat_id = id;
+			rte_free(xstats_map);
+			break;
+		}
+	}
+
+	if (id == PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models)
+		return -EINVAL;
+
+	model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
+	type = id % PLT_DIM(cn10k_ml_model_xstats_table);
+	*value = cn10k_ml_model_xstat_get(dev, model_id, type);
+
+	return 0;
+}
+
+static int
+cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint64_t *values,
+			uint16_t nb_ids)
+{
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t count;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	count = 0;
+	for (i = 0; i < nb_ids; i++) {
+		model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+		model = dev->data->models[model_id];
+
+		if (model == NULL)
+			continue;
+
+		type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+		values[i] = cn10k_ml_model_xstat_get(dev, model_id, type);
+		count++;
+	}
+
+	return count;
+}
+
+static int
+cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, const uint16_t *stat_ids, uint16_t nb_ids)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_model *model;
+	struct cn10k_ml_dev *mldev;
+	uint32_t model_id;
+	uint32_t type;
+	uint32_t i;
+
+	mldev = dev->data->dev_private;
+	if (!mldev->xstats_enabled)
+		return 0;
+
+	cn10k_ml_dev_info_get(dev, &dev_info);
+	if (stat_ids == NULL) {
+		for (i = 0; i < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; i++) {
+			model_id = i / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = i % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	} else {
+		for (i = 0; i < nb_ids; i++) {
+			model_id = stat_ids[i] / PLT_DIM(cn10k_ml_model_xstats_table);
+			model = dev->data->models[model_id];
+
+			if (model == NULL)
+				continue;
+
+			type = stat_ids[i] % PLT_DIM(cn10k_ml_model_xstats_table);
+			cn10k_ml_model_xstat_reset(dev, model_id, type);
+		}
+	}
+
+	return 0;
+}
+
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
@@ -856,6 +1155,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_stats_size;
 	size_t model_data_size;
 	size_t model_info_size;
 	uint8_t *base_dma_addr;
@@ -864,6 +1164,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
+	int qp_id;
 	int ret;
 
 	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
@@ -900,10 +1201,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE);
+		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -949,6 +1252,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Set slow-path request address and state */
 	model->req = PLT_PTR_ADD(model->info, model_info_size);
 
+	/* Reset burst and sync stats */
+	model->burst_stats = PLT_PTR_ADD(
+		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
+		model->burst_stats[qp_id].hw_latency_tot = 0;
+		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].hw_latency_max = 0;
+		model->burst_stats[qp_id].fw_latency_tot = 0;
+		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->burst_stats[qp_id].fw_latency_max = 0;
+		model->burst_stats[qp_id].hw_reset_count = 0;
+		model->burst_stats[qp_id].fw_reset_count = 0;
+		model->burst_stats[qp_id].dequeued_count = 0;
+	}
+	model->sync_stats =
+		PLT_PTR_ADD(model->burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
@@ -1503,15 +1824,44 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
+	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
+	uint64_t hw_latency;
+	uint64_t fw_latency;
 
 	if (likely(result->error_code.u64 == 0)) {
+		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
+			stats = &model->burst_stats[qp_id];
+		} else {
+			stats = model->sync_stats;
+		}
+
+		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
+			stats->hw_latency_min = UINT64_MAX;
+			stats->hw_latency_max = 0;
 		}
 
+		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
+			stats->fw_latency_min = UINT64_MAX;
+			stats->fw_latency_max = 0;
+		}
+
+		hw_latency = result->stats.hw_end - result->stats.hw_start;
+		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
+
+		stats->hw_latency_tot += hw_latency;
+		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
+		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
+		stats->fw_latency_tot += fw_latency;
+		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
+		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
+		stats->dequeued_count++;
+
 		op->impl_opaque = result->error_code.u64;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
@@ -1745,6 +2095,10 @@ struct rte_ml_dev_ops cn10k_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
 	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cn10k_ml_model_load,
-- 
2.17.1



* [PATCH v6 32/39] ml/cnxk: enable support to get xstats in cycles
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Enabled support to retrieve xstats in either cycles or nanoseconds.
Access to sclk is available only if an RVU device is probed
during initialization. The driver returns xstats in nanoseconds
when an RVU device is probed and falls back to cycles otherwise.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
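Note on the unit conversion: a minimal standalone sketch of converting a
cycle count to nanoseconds from a clock frequency given in MHz, with the
fallback to raw cycles when the frequency is reported as 0.
`cycles_to_ns` is a hypothetical helper; the patch does the same
arithmetic inline after calling roc_clk_freq_get().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert cycles to nanoseconds for a clock running at freq_mhz.
 * One cycle at F MHz lasts 1000 / F ns, hence ns = cycles * 1000 / F.
 * A frequency of 0 means the clock is unknown: return cycles unchanged.
 */
static uint64_t
cycles_to_ns(uint64_t cycles, uint16_t freq_mhz)
{
	if (freq_mhz == 0)
		return cycles;

	return (cycles * 1000ULL) / freq_mhz;
}
```

The multiplication wraps for cycle counts above UINT64_MAX / 1000, which
is far beyond a single inference latency but worth keeping in mind if
the helper is reused for accumulated totals.
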
 drivers/ml/cnxk/cn10k_ml_ops.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 880bb6a5a9..5689fbfcb2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -394,6 +394,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 			 enum cn10k_ml_model_xstats_type type)
 {
 	struct cn10k_ml_model *model;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
 	uint64_t value;
 	uint32_t qp_id;
@@ -425,6 +427,10 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t model_id,
 		value = 0;
 	}
 
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
 	return value;
 }
 
@@ -863,6 +869,8 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
 	uint32_t model_id;
 	uint32_t count;
 	uint32_t type;
@@ -878,6 +886,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 	/* Model xstats names */
 	count = 0;
 	cn10k_ml_dev_info_get(dev, &dev_info);
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 
 	for (id = 0; id < PLT_DIM(cn10k_ml_model_xstats_table) * dev_info.max_models; id++) {
 		model_id = id / PLT_DIM(cn10k_ml_model_xstats_table);
@@ -889,8 +898,14 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, struct rte_ml_dev_xstats_m
 		xstats_map[count].id = id;
 		type = id % PLT_DIM(cn10k_ml_model_xstats_table);
 
-		snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
-			 model->metadata.model.name, cn10k_ml_model_xstats_table[type].name);
+		if (sclk_freq == 0)
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-cycles",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
+		else
+			snprintf(xstats_map[count].name, RTE_ML_STR_MAX, "%s-%s-ns",
+				 model->metadata.model.name,
+				 cn10k_ml_model_xstats_table[type].name);
 
 		count++;
 		if (count == size)
-- 
2.17.1



* [PATCH v6 33/39] ml/cnxk: add support to report DPE FW warnings
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support to enable and report DPE warnings from ML
firmware. Firmware load flags are configured based on the
device arguments.

Default values:
	enable_dpe_warnings = 1
	report_dpe_warnings = 0

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
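Note on the device arguments: a minimal standalone sketch of parsing a
non-negative integer argument and turning the two settings into firmware
load flags. The bit positions follow the FW_*_DPE_WARNING_BITMASK macros
in this patch. `parse_nonneg_int` and `fw_flags_build` are hypothetical
helpers, and the assumption that each argument simply gates its own bit
is mine: the driver's flag composition is not part of this excerpt. The
parser uses strtol() so that non-numeric input is rejected rather than
read as 0.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define FW_ENABLE_DPE_WARNING_BITMASK (1ULL << 0)
#define FW_REPORT_DPE_WARNING_BITMASK (1ULL << 1)

/* Parse a base-10 integer >= 0; return 0 on success, -EINVAL otherwise. */
static int
parse_nonneg_int(const char *value, int *out)
{
	char *end = NULL;
	long v;

	errno = 0;
	v = strtol(value, &end, 10);
	if (errno != 0 || end == value || *end != '\0' || v < 0 || v > INT_MAX)
		return -EINVAL;

	*out = (int)v;
	return 0;
}

/* Assumed mapping: a non-zero argument sets the corresponding flag bit. */
static uint64_t
fw_flags_build(int enable_dpe_warnings, int report_dpe_warnings)
{
	uint64_t flags = 0;

	if (enable_dpe_warnings)
		flags |= FW_ENABLE_DPE_WARNING_BITMASK;
	if (report_dpe_warnings)
		flags |= FW_REPORT_DPE_WARNING_BITMASK;

	return flags;
}
```

Under this assumption the defaults (1 and 0) give 0x1, the value of the
FW_LOAD_FLAGS macro that this patch removes.
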
 drivers/ml/cnxk/cn10k_ml_dev.c | 94 +++++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cn10k_ml_dev.h |  6 +++
 2 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 76ed853a3c..ac6592891b 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -17,9 +17,13 @@
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
-#define CN10K_ML_FW_PATH "fw_path"
+#define CN10K_ML_FW_PATH		"fw_path"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 
-#define CN10K_ML_FW_PATH_DEFAULT "/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
+#define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
+#define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -28,9 +32,13 @@
 #define FW_EXCEPTION_BUFFER_SIZE 0x400
 #define FW_LINKER_OFFSET	 0x80000
 #define FW_WAIT_CYCLES		 100
-#define FW_LOAD_FLAGS		 0x1
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, NULL};
+/* Firmware flags */
+#define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
+#define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+
+static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -49,9 +57,25 @@ parse_string_arg(const char *key __rte_unused, const char *value, void *extra_ar
 	return 0;
 }
 
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int
 cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
 {
+	bool enable_dpe_warnings_set = false;
+	bool report_dpe_warnings_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -76,6 +100,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		fw_path_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		enable_dpe_warnings_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_FW_REPORT_DPE_WARNINGS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		report_dpe_warnings_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -83,6 +131,30 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		mldev->fw.path = fw_path;
 	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
 
+	if (!enable_dpe_warnings_set) {
+		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+				mldev->fw.enable_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+
+	if (!report_dpe_warnings_set) {
+		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+	} else {
+		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+				mldev->fw.report_dpe_warnings);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -208,9 +280,15 @@ cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 uint64_t
 cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 {
-	PLT_SET_USED(fw);
+	uint64_t flags = 0x0;
+
+	if (fw->enable_dpe_warnings)
+		flags = flags | FW_ENABLE_DPE_WARNING_BITMASK;
+
+	if (fw->report_dpe_warnings)
+		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	return FW_LOAD_FLAGS;
+	return flags;
 }
 
 static int
@@ -614,4 +692,6 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH "=<path>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index b7ff369ba8..9ba56ffba6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -349,6 +349,12 @@ struct cn10k_ml_fw {
 	/* Firmware file path */
 	const char *path;
 
+	/* Enable DPE warnings */
+	int enable_dpe_warnings;
+
+	/* Report DPE warnings */
+	int report_dpe_warnings;
+
 	/* Data buffer */
 	uint8_t *data;
 
-- 
2.17.1


* [PATCH v6 34/39] ml/cnxk: add support to enable model data caching
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added device argument 'cache_model_data' to enable model data
caching. When enabled, an inference request is executed with dummy
data in synchronous mode during the model start stage. This run
caches the model weights and bias in memory, which improves
inference throughput.

cache_model_data = 1, enable (default)
cache_model_data = 0, disable

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
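Note: the warm-up run uses one zeroed allocation that holds the input
region followed by the output region. Below is a minimal sketch of that
layout in plain C, assuming calloc() in place of the memzone API. The
struct and function names are hypothetical and not part of this patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for a single buffer split into two regions. */
struct dummy_io {
	void *base;      /* start of the allocation */
	uint8_t *input;  /* first isize bytes */
	uint8_t *output; /* next osize bytes */
};

/* Allocate isize + osize zeroed bytes and carve out both regions. */
static int
dummy_io_alloc(struct dummy_io *io, size_t isize, size_t osize)
{
	io->base = calloc(1, isize + osize);
	if (io->base == NULL)
		return -1;

	io->input = io->base;
	io->output = io->input + isize;
	return 0;
}

static void
dummy_io_free(struct dummy_io *io)
{
	free(io->base);
	io->base = NULL;
}
```

The sketch assumes isize + osize is non-zero. A caller would check the
size queries before allocating.
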
 drivers/ml/cnxk/cn10k_ml_dev.c | 33 ++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c | 50 ++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index ac6592891b..948708a420 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -20,10 +20,12 @@
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
+#define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -38,7 +40,8 @@
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 CN10K_ML_FW_REPORT_DPE_WARNINGS, NULL};
+					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
+					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -76,6 +79,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
+	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -124,6 +128,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		report_dpe_warnings_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -155,6 +171,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
 
+	if (!cache_model_data_set) {
+		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
+				mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -694,4 +722,5 @@ RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
 RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
 			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS "=<0|1>");
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 9ba56ffba6..718edadde7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -381,6 +381,9 @@ struct cn10k_ml_dev {
 
 	/* xstats status */
 	bool xstats_enabled;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 5689fbfcb2..d69df42b27 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -488,6 +488,49 @@ cn10k_ml_model_xstat_reset(struct rte_ml_dev *dev, uint16_t model_id,
 	}
 }
 
+static int
+cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cn10k_ml_model *model;
+	struct rte_ml_op op;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t isize = 0;
+	uint64_t osize = 0;
+	int ret = 0;
+
+	model = dev->data->models[model_id];
+
+	/* Create input and output buffers. */
+	rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL);
+	rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL);
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL)
+		return -ENOMEM;
+	memset(mz->addr, 0, isize + osize);
+
+	op.model_id = model_id;
+	op.nb_batches = model->batch_size;
+	op.mempool = NULL;
+
+	op.input.addr = mz->addr;
+	op.input.length = isize;
+	op.input.next = NULL;
+
+	op.output.addr = PLT_PTR_ADD(op.input.addr, isize);
+	op.output.length = osize;
+	op.output.next = NULL;
+
+	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	ret = cn10k_ml_inference_sync(dev, &op);
+	plt_memzone_free(mz);
+
+	return ret;
+}
+
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -1467,6 +1510,13 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
+	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
+		rte_ml_model_stop(dev->data->dev_id, model_id);
+	} else {
+		if (mldev->cache_model_data && roc_model_is_cn10ka())
+			ret = cn10k_ml_cache_model_data(dev, model_id);
+	}
+
 	return ret;
 }
 
-- 
2.17.1


* [PATCH v6 35/39] ml/cnxk: add support to select OCM allocation mode
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added device argument "ocm_alloc_mode" to select the OCM allocation
method used during model start. The driver supports two modes, with
"lowest" as the default.

ocm_alloc_mode:
lowest:  Allocate from first available free slot / lowest
         tile ID in OCM (default)
largest: Allocate from a slot with maximum free memory

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
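Note: the two modes differ only in which free slot is picked. Below is
a minimal sketch over a single 64-bit page mask, assuming a set bit
means the page is in use and n >= 1. The functions first_fit and
largest_fit are hypothetical illustrations and are not the driver's
slot_index_lowest() / slot_index_largest() helpers.

```c
#include <stdint.h>

/* "lowest": start of the first run of n free pages, or -1 if none. */
static int
first_fit(uint64_t used, int nbits, int n)
{
	int run = 0;
	int i;

	for (i = 0; i < nbits; i++) {
		if ((used >> i) & 1)
			run = 0;
		else
			run++;
		if (run == n)
			return i - n + 1;
	}
	return -1;
}

/* "largest": start of the longest free run if it fits n pages, else -1.
 * The length of that run is written to *len.
 */
static int
largest_fit(uint64_t used, int nbits, int n, int *len)
{
	int best_start = -1;
	int best_len = 0;
	int run = 0;
	int i;

	for (i = 0; i <= nbits; i++) {
		if (i < nbits && !((used >> i) & 1)) {
			run++;
			continue;
		}
		/* A run ended at i - 1, keep it if it is the longest so far. */
		if (run > best_len) {
			best_len = run;
			best_start = i - run;
		}
		run = 0;
	}

	*len = best_len;
	return best_len >= n ? best_start : -1;
}
```

For a 12-page mask with pages 0 and 3 in use, a 2-page request lands at
page 1 in the first-fit case and at page 4 in the largest-slot case.
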
 drivers/ml/cnxk/cn10k_ml_dev.c | 45 +++++++++++++++++++++++++++++-----
 drivers/ml/cnxk/cn10k_ml_ocm.c |  6 ++---
 drivers/ml/cnxk/cn10k_ml_ocm.h |  3 +++
 3 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 948708a420..5c02d67c8e 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -21,11 +21,13 @@
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
+#define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
+#define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -39,9 +41,12 @@
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
 
-static const char *const valid_args[] = {CN10K_ML_FW_PATH, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+static const char *const valid_args[] = {CN10K_ML_FW_PATH,
+					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 CN10K_ML_DEV_CACHE_MODEL_DATA, NULL};
+					 CN10K_ML_DEV_CACHE_MODEL_DATA,
+					 CN10K_ML_OCM_ALLOC_MODE,
+					 NULL};
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
@@ -81,6 +86,8 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool report_dpe_warnings_set = false;
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
+	bool ocm_alloc_mode_set = false;
+	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
 	int ret = 0;
@@ -140,6 +147,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		cache_model_data_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_ALLOC_MODE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_ALLOC_MODE, &parse_string_arg,
+					 &ocm_alloc_mode);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_ALLOC_MODE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_alloc_mode_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -183,6 +201,20 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
 
+	if (!ocm_alloc_mode_set) {
+		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+	} else {
+		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
+		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_OCM_ALLOC_MODE,
+				ocm_alloc_mode);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->ocm.alloc_mode = ocm_alloc_mode;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -720,7 +752,8 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 0b04fcc2da..551faef7eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -230,7 +230,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
-	int ocm_alloc_mode;
 	int wb_page_start;
 	uint16_t tile_id;
 	uint16_t word_id;
@@ -255,7 +254,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	max_slot_sz_curr = 0;
 	max_slot_sz = 0;
 	tile_idx = 0;
-	ocm_alloc_mode = 2;
 
 	if ((start_tile != -1) && (start_tile % num_tiles != 0)) {
 		plt_err("Invalid start_tile, %d", start_tile);
@@ -303,13 +301,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		}
 	}
 
-	if (ocm_alloc_mode == 1) {
+	if (strcmp(ocm->alloc_mode, "lowest") == 0) {
 		wb_page_start = slot_index_lowest(local_ocm_mask, ocm->mask_words, wb_pages, 0);
 		if (wb_page_start != -1) { /* Have a valid slot for WB, else next set of tiles */
 			tile_idx = tile_start;
 			goto found;
 		}
-	} else if (ocm_alloc_mode == 2) {
+	} else if (strcmp(ocm->alloc_mode, "largest") == 0) {
 		wb_page_start_curr = slot_index_largest(local_ocm_mask, ocm->mask_words, wb_pages,
 							&max_slot_sz_curr);
 		if (max_slot_sz_curr > max_slot_sz) {
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 0c7172a671..5f018b410a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -58,6 +58,9 @@ struct cn10k_ml_ocm {
 	/* OCM spinlock, used to update OCM state */
 	rte_spinlock_t lock;
 
+	/* OCM allocation mode */
+	const char *alloc_mode;
+
 	/* Number of OCM tiles */
 	uint8_t num_tiles;
 
-- 
2.17.1


* [PATCH v6 36/39] ml/cnxk: add support to use lock during jcmd enq
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (34 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added device argument "hw_queue_lock" to select the JCMDQ enqueue
ROC function to be used in fast path.

hw_queue_lock:

0: Disable, use the lock-free version of the JCMDQ enqueue ROC
	function for job queuing. To avoid a race condition when
	queuing requests to hardware, disabling hw_queue_lock
	restricts the number of queue-pairs supported by the cnxk
	driver to 1.

1: Enable (default), use the spinlock version of the JCMDQ enqueue
	ROC function for job queuing. Enabling the spinlock version
	lifts the single queue-pair restriction (up to 16 queue-pairs
	can be created).

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
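Note: the handler is chosen once at configure time and then called
through a function pointer in the fast path. Below is a minimal sketch
of that dispatch pattern with stub handlers. All names are hypothetical,
and the stubs only tag the job so the selected handler can be observed.
They do not perform real locking or hardware access.

```c
#include <stdbool.h>
#include <stdint.h>

struct job {
	uint64_t w0;
};

typedef bool (*enqueue_fn)(struct job *job);

/* Stub standing in for the lock-free enqueue variant. */
static bool
enqueue_lock_free(struct job *job)
{
	job->w0 = 1;
	return true;
}

/* Stub standing in for the spinlock enqueue variant. */
static bool
enqueue_spinlock(struct job *job)
{
	job->w0 = 2;
	return true;
}

struct dev_cfg {
	int hw_queue_lock;
	uint16_t max_queue_pairs;
	enqueue_fn enqueue;
};

/* Pick the handler and the queue-pair limit from the device argument. */
static void
dev_configure(struct dev_cfg *cfg, int hw_queue_lock)
{
	cfg->hw_queue_lock = hw_queue_lock;
	if (hw_queue_lock) {
		cfg->enqueue = enqueue_spinlock;
		cfg->max_queue_pairs = 16;
	} else {
		cfg->enqueue = enqueue_lock_free;
		cfg->max_queue_pairs = 1;
	}
}
```

Selecting the pointer once keeps the per-request path free of a branch
on hw_queue_lock, at the cost of an indirect call.
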
 drivers/ml/cnxk/cn10k_ml_dev.c | 31 ++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cn10k_ml_dev.h | 13 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.c | 20 +++++++++++++++++---
 3 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 5c02d67c8e..aa503b2691 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -22,12 +22,14 @@
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT 0
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
+#define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -46,6 +48,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_REPORT_DPE_WARNINGS,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
+					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -87,6 +90,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool cache_model_data_set = false;
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
+	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool fw_path_set = false;
 	char *fw_path = NULL;
@@ -158,6 +162,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		ocm_alloc_mode_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
+					 &mldev->hw_queue_lock);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				CN10K_ML_DEV_HW_QUEUE_LOCK);
+			ret = -EINVAL;
+			goto exit;
+		}
+		hw_queue_lock_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -215,6 +231,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
 
+	if (!hw_queue_lock_set) {
+		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+	} else {
+		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
+				mldev->hw_queue_lock);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -756,4 +784,5 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE "=<lowest|largest>");
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 718edadde7..49676ac9e7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -21,8 +21,11 @@
 /* Maximum number of models per device */
 #define ML_CN10K_MAX_MODELS 16
 
-/* Maximum number of queue-pairs per device */
-#define ML_CN10K_MAX_QP_PER_DEVICE 1
+/* Maximum number of queue-pairs per device, spinlock version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
+
+/* Maximum number of queue-pairs per device, lock-free version */
+#define ML_CN10K_MAX_QP_PER_DEVICE_LF 1
 
 /* Maximum number of descriptors per queue-pair */
 #define ML_CN10K_MAX_DESC_PER_QP 1024
@@ -384,6 +387,12 @@ struct cn10k_ml_dev {
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
+
+	/* Use spinlock version of ROC enqueue */
+	int hw_queue_lock;
+
+	/* JCMD enqueue function handler */
+	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index d69df42b27..f92f778e23 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -534,13 +534,21 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
+	struct cn10k_ml_dev *mldev;
+
 	if (dev_info == NULL)
 		return -EINVAL;
 
+	mldev = dev->data->dev_private;
+
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE;
+	if (mldev->hw_queue_lock)
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
+	else
+		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
+
 	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
@@ -703,6 +711,12 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->xstats_enabled = false;
 
+	/* Set JCMDQ enqueue function */
+	if (mldev->hw_queue_lock == 1)
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	else
+		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -1993,7 +2007,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req->result.user_ptr = op->user_ptr;
 
 	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
-	enqueued = roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd);
+	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2114,7 +2128,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (roc_ml_jcmdq_enqueue_lf(&mldev->roc, &req->jcmd)) {
+		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
-- 
2.17.1


* [PATCH v6 37/39] ml/cnxk: add support to select poll memory region
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (35 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added device argument "poll_mem" to select the memory
region to be used for polling in fast-path requests.

Implemented support to use scratch registers for polling. The
available pool of scratch registers is mapped one-to-one with
the internal request queue.

poll_mem:
ddr:      Use DDR memory location for polling (default)
register: Use scratch registers for polling

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
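Note: poll_mem affects two things, the firmware load flag (bit 2 is set
for "ddr" and cleared for "register") and, in register mode, which
scratch register a request polls on. Below is a minimal sketch of both,
using hypothetical helper names and plain integers in place of the
driver structures. The block start and block size values used with it
are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Same bit position as FW_USE_DDR_POLL_ADDR_FP in this patch. */
#define USE_DDR_POLL_ADDR (1ULL << 2)

/* Set the flag for "ddr", clear it for any other mode ("register"). */
static uint64_t
poll_flags(uint64_t flags, const char *poll_mem)
{
	if (strcmp(poll_mem, "ddr") == 0)
		return flags | USE_DDR_POLL_ADDR;
	return flags & ~USE_DDR_POLL_ADDR;
}

/* Map a request index onto a register inside the queue-pair's block.
 * Indices wrap around, so requests reuse registers modulo block_size.
 */
static uint64_t
poll_register(uint64_t block_start, uint64_t block_size, uint64_t idx)
{
	return block_start + idx % block_size;
}
```

With a block of 16 registers starting at 1024, request index 17 maps to
register 1025.
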
 drivers/ml/cnxk/cn10k_ml_dev.c |  47 +++++++++++--
 drivers/ml/cnxk/cn10k_ml_dev.h |  24 +++++++
 drivers/ml/cnxk/cn10k_ml_ops.c | 124 +++++++++++++++++++++++++++++++--
 drivers/ml/cnxk/cn10k_ml_ops.h |   9 +++
 4 files changed, 192 insertions(+), 12 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index aa503b2691..a746a66849 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,6 +23,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
+#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -30,6 +31,7 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
+#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -42,6 +44,7 @@
 /* Firmware flags */
 #define FW_ENABLE_DPE_WARNING_BITMASK BIT(0)
 #define FW_REPORT_DPE_WARNING_BITMASK BIT(1)
+#define FW_USE_DDR_POLL_ADDR_FP	      BIT(2)
 
 static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_FW_ENABLE_DPE_WARNINGS,
@@ -49,6 +52,7 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
+					 CN10K_ML_FW_POLL_MEM,
 					 NULL};
 
 /* Dummy operations for ML device */
@@ -92,7 +96,9 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
 	char *ocm_alloc_mode = NULL;
+	bool poll_mem_set = false;
 	bool fw_path_set = false;
+	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 
@@ -174,6 +180,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
+					 &poll_mem);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
+			ret = -EINVAL;
+			goto exit;
+		}
+		poll_mem_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -243,6 +260,18 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
+	if (!poll_mem_set) {
+		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
+	} else {
+		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
+			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
+			ret = -EINVAL;
+			goto exit;
+		}
+		mldev->fw.poll_mem = poll_mem;
+	}
+	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -376,6 +405,11 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
+	if (strcmp(fw->poll_mem, "ddr") == 0)
+		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
+	else if (strcmp(fw->poll_mem, "register") == 0)
+		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+
 	return flags;
 }
 
@@ -780,9 +814,10 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
-			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK "=<0|1>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
+			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 49676ac9e7..966d92e027 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -43,6 +43,18 @@
 #define ML_CN10K_POLL_JOB_START	 0
 #define ML_CN10K_POLL_JOB_FINISH 1
 
+/* Memory barrier macros */
+#if defined(RTE_ARCH_ARM)
+#define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
+#define dsb_st ({ asm volatile("dsb st" : : : "memory"); })
+#else
+#define dmb_st
+#define dsb_st
+#endif
+
+struct cn10k_ml_req;
+struct cn10k_ml_qp;
+
 /* Job types */
 enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
@@ -358,6 +370,9 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
+	/* Memory to be used for polling in fast-path requests */
+	const char *poll_mem;
+
 	/* Data buffer */
 	uint8_t *data;
 
@@ -393,6 +408,15 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
+
+	/* Poll handling function pointers */
+	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
+	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
+
+	/* Memory barrier function pointers to handle synchronization */
+	void (*set_enq_barrier)(void);
+	void (*set_deq_barrier)(void);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f92f778e23..61e6d023c5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,6 +23,11 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
+/* Scratch register range for poll mode requests */
+#define ML_POLL_REGISTER_SYNC  1023
+#define ML_POLL_REGISTER_START 1024
+#define ML_POLL_REGISTER_END   2047
+
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -76,6 +81,80 @@ print_line(FILE *fp, int len)
 	fprintf(fp, "\n");
 }
 
+static inline void
+cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	PLT_SET_USED(qp);
+	PLT_SET_USED(idx);
+
+	req->compl_W1 = PLT_U64_CAST(&req->status);
+}
+
+static inline void
+cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+{
+	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	PLT_SET_USED(roc_ml);
+
+	return plt_read64(req->compl_W1);
+}
+
+static inline uint64_t
+cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+{
+	return roc_ml_reg_read64(roc_ml, req->compl_W1);
+}
+
+static inline void
+cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
+{
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		req->compl_W1 = PLT_U64_CAST(&req->status);
+	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
+}
+
+static inline void
+cn10k_ml_enq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_deq_barrier_ddr(void)
+{
+}
+
+static inline void
+cn10k_ml_enq_barrier_register(void)
+{
+	dmb_st;
+}
+
+static inline void
+cn10k_ml_deq_barrier_register(void)
+{
+	dsb_st;
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -163,6 +242,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
+	qp->block_size =
+		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
+	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -341,7 +423,7 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req
 	mldev = dev->data->dev_private;
 
 	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
+	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
@@ -549,7 +631,11 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+	if (strcmp(mldev->fw.poll_mem, "register") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
+	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
+		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
+
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->min_align_size = ML_CN10K_ALIGN_SIZE;
 
@@ -717,6 +803,26 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	else
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
+	/* Set polling function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
+		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
+		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
+	}
+
+	/* Set barrier function pointers */
+	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
+	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
+		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
+		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
+	}
+
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
@@ -2000,13 +2106,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
+	mldev->set_poll_addr(qp, req, head);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
+	mldev->set_enq_barrier();
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2032,6 +2140,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		       uint16_t nb_ops)
 {
 	struct cn10k_ml_queue *queue;
+	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2039,6 +2148,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
+	mldev = dev->data->dev_private;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2051,7 +2161,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = plt_read64(&req->status);
+	status = mldev->get_poll_ptr(&mldev->roc, req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2059,6 +2169,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
+	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2116,13 +2227,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
+	cn10k_ml_set_sync_addr(mldev, req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	mldev->set_poll_ptr(&mldev->roc, req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2142,7 +2254,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 65ae8b44f3..58c992720a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -26,6 +26,9 @@ struct cn10k_ml_req {
 	/* Job command */
 	struct ml_job_cmd_s jcmd;
 
+	/* Job completion W1 */
+	uint64_t compl_W1;
+
 	/* Timeout cycle */
 	uint64_t timeout;
 
@@ -61,6 +64,12 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
+
+	/* Register block start for polling */
+	uint32_t block_start;
+
+	/* Register block size for polling */
+	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.17.1
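
Note on the register poll layout: the arithmetic added to cn10k_ml_qp_create()
and cn10k_ml_set_poll_addr_reg() can be sketched in isolation as below. This is
a minimal illustration, not driver code. struct poll_block and both helper names
are hypothetical; only the ML_POLL_REGISTER_* values and the formulas come from
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scratch register range reserved for poll mode requests (values from the patch). */
#define ML_POLL_REGISTER_START 1024
#define ML_POLL_REGISTER_END   2047

/* Hypothetical stand-in for the block_start / block_size fields of cn10k_ml_qp. */
struct poll_block {
	uint32_t start; /* first scratch register owned by the queue-pair */
	uint32_t size;  /* number of scratch registers owned by the queue-pair */
};

/* Split the scratch range evenly across queue-pairs, as cn10k_ml_qp_create() does. */
static struct poll_block
poll_block_for_qp(uint16_t qp_id, uint16_t nb_queue_pairs)
{
	struct poll_block blk;

	assert(nb_queue_pairs > 0 && qp_id < nb_queue_pairs);
	blk.size = (ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / nb_queue_pairs;
	blk.start = ML_POLL_REGISTER_START + qp_id * blk.size;

	return blk;
}

/* Map a descriptor index to a scratch register index inside the block. */
static uint32_t
poll_register_index(struct poll_block blk, uint64_t idx)
{
	return blk.start + (uint32_t)(idx % blk.size);
}
```

With 4 queue-pairs each block holds 256 registers, and descriptor indices wrap
inside the block of their queue-pair. That wrap is presumably why the patch also
lowers max_desc when poll_mem is "register", although the patch does not state
the reason.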



* [PATCH v6 38/39] ml/cnxk: add user guide for marvell cnxk ml driver
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (36 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  8:20   ` [PATCH v6 39/39] ml/cnxk: add support for configurable ocm page Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Thomas Monjalon, Srikanth Yalavarthi
  Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added a user guide for the Marvell cnxk ML driver for the Marvell
Octeon cnxk SoC family. Added details about device initialization,
debug options and runtime device args supported by the driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 MAINTAINERS                 |   1 +
 doc/guides/index.rst        |   1 +
 doc/guides/mldevs/cnxk.rst  | 238 ++++++++++++++++++++++++++++++++++++
 doc/guides/mldevs/index.rst |  14 +++
 4 files changed, 254 insertions(+)
 create mode 100644 doc/guides/mldevs/cnxk.rst
 create mode 100644 doc/guides/mldevs/index.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 8f695516c7..ae6c4decbe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1450,6 +1450,7 @@ M: Srikanth Yalavarthi <syalavarthi@marvell.com>
 F: drivers/common/cnxk/hw/ml.h
 F: drivers/common/cnxk/roc_ml*
 F: drivers/ml/cnxk/
+F: doc/guides/mldevs/cnxk.rst
 
 
 Packet processing
diff --git a/doc/guides/index.rst b/doc/guides/index.rst
index 5eb5bd9c9a..0bd729530a 100644
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -26,6 +26,7 @@ DPDK documentation
    eventdevs/index
    rawdevs/index
    mempool/index
+   mldevs/index
    platform/index
    contributing/index
    rel_notes/index
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
new file mode 100644
index 0000000000..da40336299
--- /dev/null
+++ b/doc/guides/mldevs/cnxk.rst
@@ -0,0 +1,238 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Marvell cnxk Machine Learning Poll Mode Driver
+==============================================
+
+The cnxk ML poll mode driver provides support for offloading Machine
+Learning inference operations to Machine Learning accelerator units
+on the **Marvell OCTEON cnxk** SoC family.
+
+The cnxk ML PMD code is organized into multiple files with all file names
+starting with cn10k, providing support for CN106XX and CN106XXS.
+
+More information about OCTEON cnxk SoCs may be obtained from `<https://www.marvell.com>`_
+
+Supported OCTEON cnxk SoCs
+--------------------------
+
+- CN106XX
+- CN106XXS
+
+Features
+--------
+
+The OCTEON cnxk ML PMD provides support for the following set of operations:
+
+Slow-path device and ML model handling:
+
+* ``Device probing, configuration and close``
+* ``Device start / stop``
+* ``Model loading and unloading``
+* ``Model start / stop``
+* ``Data quantization and dequantization``
+
+Fast-path Inference:
+
+* ``Inference execution``
+* ``Error handling``
+
+
+Installation
+------------
+
+The OCTEON cnxk ML PMD may be compiled natively on an OCTEON cnxk platform
+or cross-compiled on an x86 platform.
+
+Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
+application.
+
+
+Initialization
+--------------
+
+``CN10K Initialization``
+
+List the ML PF devices available on cn10k platform:
+
+.. code-block:: console
+
+    lspci -d:a092
+
+``a092`` is the ML device PF id. You should see output similar to:
+
+.. code-block:: console
+
+    0000:00:10.0 System peripheral: Cavium, Inc. Device a092
+
+Bind the ML PF device to the vfio-pci driver:
+
+.. code-block:: console
+
+    cd <dpdk directory>
+    ./usertools/dpdk-devbind.py -u 0000:00:10.0
+    ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
+
+Runtime Config Options
+----------------------
+
+- ``Firmware file path`` (default ``/lib/firmware/mlip-fw.bin``)
+
+   Path to the firmware binary to be loaded during device configuration.
+   The ``fw_path`` ``devargs`` parameter can be used by the user to load
+   ML firmware from a custom path.
+
+   For example::
+
+      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
+
+   With the above configuration, the driver loads the firmware from the path
+   "/home/user/ml_fw.bin".
+
+- ``Enable DPE warnings`` (default ``1``)
+
+   ML firmware can be configured during load to handle the DPE errors reported
+   by the ML inference engine. When enabled, firmware masks the non-fatal DPE
+   hardware errors as warnings. The parameter ``enable_dpe_warnings`` ``devargs``
+   is used for this configuration.
+
+   For example::
+
+      -a 0000:00:10.0,enable_dpe_warnings=0
+
+   With the above configuration, DPE non-fatal errors reported by HW are
+   considered as errors.
+
+
+- ``Model data caching`` (default ``1``)
+
+   Enable caching model data on ML ACC cores. Enabling this option executes a
+   dummy inference request in synchronous mode during model start stage. Caching
+   of model data improves the inferencing throughput / latency for the model.
+   The parameter ``cache_model_data`` ``devargs`` is used to enable data caching.
+
+   For example::
+
+      -a 0000:00:10.0,cache_model_data=0
+
+   With the above configuration, model data caching is disabled.
+
+
+- ``OCM allocation mode`` (default ``lowest``)
+
+   Option to specify the method to be used while allocating OCM memory for a
+   model during model start. Two modes are supported by the driver. The
+   parameter ``ocm_alloc_mode`` ``devargs`` is used to select the OCM
+   allocation mode.
+
+   ``lowest`` - Allocate OCM for the model from the first available free slot,
+   searching from the lowest tile ID and lowest page ID.
+
+   ``largest`` - Allocate OCM from the slot with the largest amount of free space.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_alloc_mode=lowest
+
+   With the above configuration, OCM allocation for the model would be done from
+   the first available free slot / from the lowest possible tile ID.
+
+
+- ``Enable hardware queue lock`` (default ``0``)
+
+   Option to select the job request enqueue function to be used to queue the requests
+   to the hardware queue. The parameter ``hw_queue_lock`` ``devargs`` is used to select
+   the enqueue function.
+
+   ``0`` - Disable (default), use the lock-free version of the hardware enqueue
+   function for job queuing in the enqueue burst operation. To avoid race conditions
+   when queuing requests to hardware, disabling hw_queue_lock restricts the number of
+   queue-pairs supported by the cnxk driver to 1.
+
+   ``1`` - Enable, use the spinlock version of the hardware enqueue function for job
+   queuing. This removes the restriction on the number of queue-pairs.
+
+   For example::
+
+      -a 0000:00:10.0,hw_queue_lock=1
+
+   With the above configuration, spinlock version of hardware enqueue function is used
+   in the fast path enqueue burst operation.
+
+
+- ``Polling memory location`` (default ``ddr``)
+
+   ML cnxk driver provides the option to select the memory location to be used
+   for polling to check the inference request completion. The driver supports
+   using either the DDR address space (``ddr``) or ML registers (``register``) as
+   polling locations. The parameter ``poll_mem`` ``devargs`` is used to specify
+   the poll location.
+
+   For example::
+
+      -a 0000:00:10.0,poll_mem="register"
+
+   With the above configuration, ML cnxk driver is configured to use ML registers
+   for polling in fastpath requests.
+
+
+Debugging Options
+-----------------
+
+.. _table_octeon_cnxk_ml_debug_options:
+
+.. table:: OCTEON cnxk ML PMD debug options
+
+    +---+------------+-------------------------------------------------------+
+    | # | Component  | EAL log command                                       |
+    +===+============+=======================================================+
+    | 1 | ML         | --log-level='pmd\.ml\.cnxk,8'                         |
+    +---+------------+-------------------------------------------------------+
+
+
+Extended stats
+--------------
+
+Marvell cnxk ML PMD supports reporting the inference latencies through extended
+stats. The PMD supports the 6 extended stats types listed below for each model.
+The total number of extended stats is equal to 6 x the number of models loaded.
+
+.. _table_octeon_cnxk_ml_xstats_names:
+
+.. table:: OCTEON cnxk ML PMD xstats names
+
+    +---+---------------------+----------------------------------------------+
+    | # | Type                | Description                                  |
+    +===+=====================+==============================================+
+    | 1 | Avg-HW-Latency      | Average hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 2 | Min-HW-Latency      | Minimum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 3 | Max-HW-Latency      | Maximum hardware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 4 | Avg-FW-Latency      | Average firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 5 | Min-FW-Latency      | Minimum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+    | 6 | Max-FW-Latency      | Maximum firmware latency                     |
+    +---+---------------------+----------------------------------------------+
+
+Latency values reported by the PMD through xstats are in units of either
+cycles or nanoseconds. The units are determined during DPDK
+initialization and depend on the availability of SCLK. Latencies are
+reported in nanoseconds when the SCLK is available and in cycles otherwise.
+The application needs to initialize at least one RVU for the clock to be available.
+
+xstats names are dynamically generated by the PMD and have the format
+"Model-<model_id>-<type>-<units>", where ``<type>`` is one of the types
+listed in the table above. For example::
+
+   Model-1-Avg-FW-Latency-ns
+
+The above xstat name reports the average firmware latency, in nanoseconds,
+for the model with model ID 1.
+
+The number of xstats made available by the PMD changes dynamically. It
+increases when a model is loaded and decreases when a model is unloaded.
+The application needs to update the xstats map after a model is either loaded
+or unloaded.
diff --git a/doc/guides/mldevs/index.rst b/doc/guides/mldevs/index.rst
new file mode 100644
index 0000000000..f201e54175
--- /dev/null
+++ b/doc/guides/mldevs/index.rst
@@ -0,0 +1,14 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright (c) 2022 Marvell.
+
+Machine Learning Device Driver
+==============================
+
+The following is a list of ML device PMDs, which can be used from an
+application through the ML device API.
+
+.. toctree::
+    :maxdepth: 2
+    :numbered:
+
+    cnxk
-- 
2.17.1
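
Note on the xstats naming described in the guide: the name format and the
per-model count can be sketched as below. This is an illustration only. The
helper names xstat_name_format() and xstats_count() are hypothetical and are not
part of the PMD; only the "Model-<model_id>-<type>-<units>" format and the
figure of 6 stats per model come from the guide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Number of xstats types reported per model, as stated in the guide. */
#define XSTATS_PER_MODEL 6

/* Hypothetical helper: total xstats exposed for a given number of loaded models. */
static unsigned int
xstats_count(unsigned int nb_models)
{
	return XSTATS_PER_MODEL * nb_models;
}

/*
 * Hypothetical helper: build "Model-<model_id>-<type>-<units>" into buf.
 * Returns 0 on success, -1 on bad arguments or if the name does not fit.
 */
static int
xstat_name_format(char *buf, size_t len, uint16_t model_id, const char *type,
		  const char *units)
{
	int n;

	assert(buf != NULL && len > 0);
	if (type == NULL || units == NULL || strlen(type) == 0 || strlen(units) == 0)
		return -1;

	n = snprintf(buf, len, "Model-%u-%s-%s", (unsigned int)model_id, type, units);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

Because the count depends on the number of loaded models, an application would
refresh any cached name-to-ID map after each model load or unload, as the guide
recommends.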



* [PATCH v6 39/39] ml/cnxk: add support for configurable ocm page
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (37 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
@ 2023-03-10  8:20   ` Srikanth Yalavarthi
  2023-03-10  9:31   ` [PATCH v6 00/39] Implementation of ML CNXK driver Thomas Monjalon
  2023-03-10 15:24   ` Thomas Monjalon
  40 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

Added support for configurable OCM page size. A new device
argument "ocm_page_size" is added to specify the page size
for OCM management. Supported page sizes are 1KB, 2KB, 4KB,
8KB and 16KB. Default page size is 16KB.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
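
Note on how the page size affects the OCM bookkeeping: the sketch below derives
the per-tile page count and mask size from a page size. It assumes the runtime
values keep the relations of the macros removed by this patch (pages = tile size
/ page size, mask words = pages / 8), which this part of the patch does not
show. The helper names are hypothetical. The tile size and the list of valid
page sizes are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* OCM in bytes, per tile (from cn10k_ml_ocm.h). */
#define ML_CN10K_OCM_TILESIZE 0x100000

/* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB (from the patch). */
static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};

/* Hypothetical helper: check a requested page size against the supported list. */
static bool
ocm_page_size_is_valid(int page_size)
{
	size_t i;

	for (i = 0; i < sizeof(valid_ocm_page_size) / sizeof(valid_ocm_page_size[0]); i++) {
		if (page_size == valid_ocm_page_size[i])
			return true;
	}

	return false;
}

/* Hypothetical helper: OCM pages per tile for a given page size. */
static uint32_t
ocm_num_pages(int page_size)
{
	assert(ocm_page_size_is_valid(page_size));
	return ML_CN10K_OCM_TILESIZE / (uint32_t)page_size;
}

/* Hypothetical helper: 8-bit mask words needed to track one tile's pages. */
static uint32_t
ocm_mask_words(int page_size)
{
	return ocm_num_pages(page_size) / 8;
}
```

Under these assumptions the default 16 KB page gives 64 pages and 8 mask words
per tile, while a 1 KB page gives 1024 pages and 128 mask words, which is
consistent with the patch moving the masks from fixed-size arrays to allocated
memory.
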
 doc/guides/mldevs/cnxk.rst       | 16 +++++++++
 drivers/ml/cnxk/cn10k_ml_dev.c   | 61 ++++++++++++++++++++++++++++----
 drivers/ml/cnxk/cn10k_ml_dev.h   |  3 ++
 drivers/ml/cnxk/cn10k_ml_model.c |  6 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.c   | 18 +++++++---
 drivers/ml/cnxk/cn10k_ml_ocm.h   | 14 +++-----
 drivers/ml/cnxk/cn10k_ml_ops.c   | 17 ++++++---
 7 files changed, 107 insertions(+), 28 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index da40336299..f7f61e8bfa 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -175,6 +175,22 @@ Runtime Config Options
    With the above configuration, ML cnxk driver is configured to use ML registers
    for polling in fastpath requests.
 
+- ``OCM page size`` (default ``16384``)
+
+   Option to specify the page size in bytes to be used for OCM management. Available
+   OCM is split into multiple pages of specified sizes and the pages are allocated to
+   the models. The parameter ``ocm_page_size`` ``devargs`` is used to specify the page
+   size to be used.
+
+   Page sizes supported by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB. The
+   default page size is 16 KB.
+
+   For example::
+
+      -a 0000:00:10.0,ocm_page_size=8192
+
+   With the above configuration, page size of OCM is set to 8192 bytes / 8 KB.
+
 
 Debugging Options
 -----------------
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index a746a66849..6f9a1015a6 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -24,6 +24,7 @@
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
 #define CN10K_ML_FW_POLL_MEM		"poll_mem"
+#define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT 1
@@ -32,6 +33,7 @@
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
 #define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
+#define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
 #define FW_MEMZONE_NAME		 "ml_cn10k_fw_mz"
@@ -53,8 +55,12 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
 					 CN10K_ML_FW_POLL_MEM,
+					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
+/* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
+static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
@@ -95,12 +101,15 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	struct rte_kvargs *kvlist = NULL;
 	bool ocm_alloc_mode_set = false;
 	bool hw_queue_lock_set = false;
+	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
 	bool poll_mem_set = false;
 	bool fw_path_set = false;
 	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
+	bool found;
+	uint8_t i;
 
 	if (devargs == NULL)
 		goto check_args;
@@ -191,6 +200,17 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		poll_mem_set = true;
 	}
 
+	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
+		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
+					 &mldev->ocm_page_size);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
+			ret = -EINVAL;
+			goto exit;
+		}
+		ocm_page_size_set = true;
+	}
+
 check_args:
 	if (!fw_path_set)
 		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
@@ -272,6 +292,32 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
 
+	if (!ocm_page_size_set) {
+		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+	} else {
+		if (mldev->ocm_page_size < 0) {
+			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
+				mldev->ocm_page_size);
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		found = false;
+		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
+			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+				found = true;
+				break;
+			}
+		}
+
+		if (!found) {
+			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+
 exit:
 	if (kvlist)
 		rte_kvargs_free(kvlist);
@@ -814,10 +860,11 @@ RTE_PMD_REGISTER_PCI(MLDEV_NAME_CN10K_PMD, cn10k_mldev_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(MLDEV_NAME_CN10K_PMD, pci_id_ml_table);
 RTE_PMD_REGISTER_KMOD_DEP(MLDEV_NAME_CN10K_PMD, "vfio-pci");
 
-RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD,
-			      CN10K_ML_FW_PATH "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
-					       "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
-					       "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
-					       "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-					       "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>");
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
+			      "=<path>" CN10K_ML_FW_ENABLE_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_FW_REPORT_DPE_WARNINGS
+			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
+			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
+			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
+			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
+			      "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 966d92e027..b4e46899c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -406,6 +406,9 @@ struct cn10k_ml_dev {
 	/* Use spinlock version of ROC enqueue */
 	int hw_queue_lock;
 
+	/* OCM page size */
+	int ocm_page_size;
+
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 0ded355d81..ceffde8459 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -339,11 +339,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > ML_CN10K_OCM_NUMPAGES) {
+	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			ML_CN10K_OCM_NUMPAGES);
+			mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -352,7 +352,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 */
 	if (!metadata->model.ocm_relocatable)
 		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ML_CN10K_OCM_NUMPAGES));
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 551faef7eb..d8d2c71a3c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -220,13 +220,13 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
-	uint8_t local_ocm_mask[ML_CN10K_OCM_MASKWORDS] = {0};
 	uint16_t used_scratch_pages_max;
 	uint16_t scratch_page_start;
 	int used_last_wb_page_max;
 	uint16_t scratch_page_end;
 	uint8_t search_start_tile;
 	uint8_t search_end_tile;
+	uint8_t *local_ocm_mask;
 	int wb_page_start_curr;
 	int max_slot_sz_curr;
 	uint8_t tile_start;
@@ -268,6 +268,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 		search_end_tile = start_tile;
 	}
 
+	/* Local copy of the OCM mask, mask_words 8-bit words */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+
 	tile_start = search_start_tile;
 start_search:
 	used_scratch_pages_max = 0;
@@ -279,7 +282,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, sizeof(local_ocm_mask));
+	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -332,6 +335,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	if (wb_page_start != -1)
 		*tilemask = GENMASK_ULL(tile_idx + num_tiles - 1, tile_idx);
 
+	rte_free(local_ocm_mask);
+
 	return wb_page_start;
 }
 
@@ -480,7 +485,7 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char str[ML_CN10K_OCM_NUMPAGES / 4 + 2]; /* nibbles + prefix '0x' */
+	char *str;
 	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
@@ -490,12 +495,15 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 	mldev = dev->data->dev_private;
 	ocm = &mldev->ocm;
 
+	/* nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+
 	fprintf(fp, "OCM State:\n");
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < ML_CN10K_OCM_MASKWORDS; word_id++)
+		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
 			wb_pages +=
 				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
@@ -506,4 +514,6 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 			tile_id, ocm->tile_ocm_info[tile_id].scratch_pages, wb_pages,
 			ocm->tile_ocm_info[tile_id].last_wb_page, str);
 	}
+
+	rte_free(str);
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 5f018b410a..3404e7fd65 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,25 +8,16 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
-/* Page size in bytes. */
-#define ML_CN10K_OCM_PAGESIZE 0x4000
-
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
 /* OCM in bytes, per tile. */
 #define ML_CN10K_OCM_TILESIZE 0x100000
 
-/* OCM pages, per tile. */
-#define ML_CN10K_OCM_NUMPAGES (ML_CN10K_OCM_TILESIZE / ML_CN10K_OCM_PAGESIZE)
-
-/* Maximum OCM mask words, per tile, 8 bit words. */
-#define ML_CN10K_OCM_MASKWORDS (ML_CN10K_OCM_NUMPAGES / 8)
-
 /* OCM and Tile information structure */
 struct cn10k_ml_ocm_tile_info {
 	/* Mask of used / allotted pages on tile's OCM */
-	uint8_t ocm_mask[ML_CN10K_OCM_MASKWORDS];
+	uint8_t *ocm_mask;
 
 	/* Last pages in the tile's OCM used for weights and bias, default = -1 */
 	int last_wb_page;
@@ -78,6 +69,9 @@ struct cn10k_ml_ocm {
 
 	/* OCM memory info and status*/
 	struct cn10k_ml_ocm_tile_info tile_ocm_info[ML_CN10K_OCM_NUMTILES];
+
+	/* Memory for ocm_mask */
+	uint8_t *ocm_mask;
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 61e6d023c5..5b77e47322 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -311,8 +311,8 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
-		fprintf(fp, "%*s : 0x%x\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * ML_CN10K_OCM_PAGESIZE);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -781,12 +781,18 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	ocm = &mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = ML_CN10K_OCM_PAGESIZE;
+	ocm->page_size = mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
-	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++)
+	/* Allocate memory for ocm_mask */
+	ocm->ocm_mask =
+		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
+
+	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
+		ocm->tile_ocm_info[tile_id].ocm_mask = ocm->ocm_mask + tile_id * ocm->mask_words;
 		ocm->tile_ocm_info[tile_id].last_wb_page = -1;
+	}
 
 	rte_spinlock_init(&ocm->lock);
 
@@ -856,6 +862,9 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 
 	mldev = dev->data->dev_private;
 
+	/* Release ocm_mask memory */
+	rte_free(mldev->ocm.ocm_mask);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
-- 
2.17.1
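
For readers following the patch above, the per-tile mask layout set up in cn10k_ml_dev_configure() and the page counting done in cn10k_ml_ocm_print() can be sketched as standalone C. This is only an illustration under stated assumptions, not the driver code: calloc()/free() stand in for rte_zmalloc()/rte_free(), the names ocm_sketch, tile_sketch and the ocm_sketch_*() helpers are hypothetical, and __builtin_popcount assumes GCC or Clang. Unlike the hunks above, the sketch checks the allocation result.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_TILES 8         /* mirrors ML_CN10K_OCM_NUMTILES */
#define TILE_SIZE 0x100000u /* mirrors ML_CN10K_OCM_TILESIZE, bytes per tile */

/* Hypothetical stand-in for struct cn10k_ml_ocm_tile_info. */
struct tile_sketch {
	uint8_t *ocm_mask; /* slice of the shared buffer, one bit per page */
	int last_wb_page;
};

/* Hypothetical stand-in for struct cn10k_ml_ocm. */
struct ocm_sketch {
	uint32_t page_size;
	uint32_t num_pages;
	uint32_t mask_words;
	struct tile_sketch tile[NUM_TILES];
	uint8_t *ocm_mask; /* backing memory for all tile masks */
};

/* Derive the page geometry from a runtime page size and carve one
 * zeroed allocation into per-tile mask slices.
 */
static int
ocm_sketch_init(struct ocm_sketch *ocm, uint32_t page_size)
{
	uint32_t tile_id;

	if (page_size == 0 || TILE_SIZE % page_size != 0)
		return -1;

	ocm->page_size = page_size;
	ocm->num_pages = TILE_SIZE / page_size;
	if (ocm->num_pages % 8 != 0)
		return -1; /* the mask uses whole 8-bit words */
	ocm->mask_words = ocm->num_pages / 8;

	/* calloc() zeroes the memory, as rte_zmalloc() does in the patch. */
	ocm->ocm_mask = calloc((size_t)NUM_TILES * ocm->mask_words, 1);
	if (ocm->ocm_mask == NULL)
		return -1;

	for (tile_id = 0; tile_id < NUM_TILES; tile_id++) {
		ocm->tile[tile_id].ocm_mask = ocm->ocm_mask + tile_id * ocm->mask_words;
		ocm->tile[tile_id].last_wb_page = -1;
	}

	return 0;
}

/* Count the pages marked as used on one tile. */
static int
ocm_sketch_used_pages(const struct ocm_sketch *ocm, uint32_t tile_id)
{
	uint32_t word_id;
	int used = 0;

	for (word_id = 0; word_id < ocm->mask_words; word_id++)
		used += __builtin_popcount(ocm->tile[tile_id].ocm_mask[word_id]);

	return used;
}

/* Release the shared buffer and drop the per-tile pointers into it. */
static void
ocm_sketch_fini(struct ocm_sketch *ocm)
{
	uint32_t tile_id;

	free(ocm->ocm_mask);
	ocm->ocm_mask = NULL;
	for (tile_id = 0; tile_id < NUM_TILES; tile_id++)
		ocm->tile[tile_id].ocm_mask = NULL;
}
```

With the previous fixed page size of 0x4000 this gives 64 pages and 8 mask words per tile. Because every tile's ocm_mask points into the single buffer, one free at close time releases all of them, which matches the single rte_free() added to cn10k_ml_dev_close().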


^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [EXT] Re: [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver
  2023-03-09 22:06     ` Thomas Monjalon
@ 2023-03-10  8:25       ` Srikanth Yalavarthi
  2023-03-10  9:28         ` Thomas Monjalon
  0 siblings, 1 reply; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  8:25 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Prince Takkar,
	Parijat Shukla, Srikanth Yalavarthi

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: 10 March 2023 03:36
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; Prince Takkar
> <ptakkar@marvell.com>; Parijat Shukla <pshukla@marvell.com>; Srikanth
> Yalavarthi <syalavarthi@marvell.com>
> Subject: [EXT] Re: [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver
> 
> 07/02/2023 17:06, Srikanth Yalavarthi:
> > --- a/doc/guides/rel_notes/release_23_03.rst
> > +++ b/doc/guides/rel_notes/release_23_03.rst
> > +* **Implementation of Marvell CNXK machine learning driver for .**
> 
> It seems a word is missing.
> It looks like you did a lot of work on the mldev series, so some details are
> missing.
> 


Done. Updated the release notes correctly in version 6 patch series.


> > +
> > +  * Added ml/cnxk driver which provides support for machine learning inference
> > +    operations on Marvell's CN10K series of SoC's.
> > +  * Added ML ROC code for ml/cnxk driver to common/cnxk.
> > +  * Added implementation with support for all rte_ml APIs.
> 
> 
> 


^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [EXT] Re: [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver
  2023-03-10  8:25       ` [EXT] " Srikanth Yalavarthi
@ 2023-03-10  9:28         ` Thomas Monjalon
  2023-03-10  9:31           ` Srikanth Yalavarthi
  0 siblings, 1 reply; 253+ messages in thread
From: Thomas Monjalon @ 2023-03-10  9:28 UTC (permalink / raw)
  To: Srikanth Yalavarthi
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Prince Takkar,
	Parijat Shukla

10/03/2023 09:25, Srikanth Yalavarthi:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 07/02/2023 17:06, Srikanth Yalavarthi:
> > > --- a/doc/guides/rel_notes/release_23_03.rst
> > > +++ b/doc/guides/rel_notes/release_23_03.rst
> > > +* **Implementation of Marvell CNXK machine learning driver for .**
> > 
> > It seems a word is missing.
> > It looks like you did a lot of work on the mldev series, so some details are
> > missing.
> 
> Done. Updated the release notes correctly in version 6 patch series.

I am close to merging v5 already.
I don't want to restart all the testing process.
Did you do other changes in v6?



^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH v6 00/39] Implementation of ML CNXK driver
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (38 preceding siblings ...)
  2023-03-10  8:20   ` [PATCH v6 39/39] ml/cnxk: add support for configurable ocm page Srikanth Yalavarthi
@ 2023-03-10  9:31   ` Thomas Monjalon
  2023-03-10 10:30     ` [EXT] " Srikanth Yalavarthi
  2023-03-10 15:24   ` Thomas Monjalon
  40 siblings, 1 reply; 253+ messages in thread
From: Thomas Monjalon @ 2023-03-10  9:31 UTC (permalink / raw)
  To: dev; +Cc: syalavarthi, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

10/03/2023 09:19, Srikanth Yalavarthi:
> Marvell ML CNXK Driver
> ----------------------
> 
> This patch series implements common Machine Learning (ML) ROC code
> and driver for Marvell Octeon 10 (cnxk) platform. ML inferencing is
> supported on cnxk platform through an integrated ML inferencing
> processor. The current driver supports programming the ML hardware
> engine through offload mode.
> 
> All APIs proposed in the DPDK ML device specification are supported on
> the cnxk platform.
> 
> v6:
> * Fixed release notes content
> * Rebased the patch series
> 
> v5:
> * Updated model_id to uint16_t
> * Updated release notes for 23.03

I will use v5 as it is ready to push.

Note: you never integrate acks in your new versions,
so I have to look for them and include them for you.

Note 2: we are fixing more build issues in the main repo
because of the mldev integration.



^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [EXT] Re: [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver
  2023-03-10  9:28         ` Thomas Monjalon
@ 2023-03-10  9:31           ` Srikanth Yalavarthi
  0 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10  9:31 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Shivah Shankar Shankar Narayan Rao,
	Jerin Jacob Kollanukkaran, Anup Prabhu, Prince Takkar,
	Parijat Shukla, Srikanth Yalavarthi

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: 10 March 2023 14:59
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; Prince Takkar
> <ptakkar@marvell.com>; Parijat Shukla <pshukla@marvell.com>
> Subject: Re: [EXT] Re: [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk
> driver
> 
> 10/03/2023 09:25, Srikanth Yalavarthi:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 07/02/2023 17:06, Srikanth Yalavarthi:
> > > > --- a/doc/guides/rel_notes/release_23_03.rst
> > > > +++ b/doc/guides/rel_notes/release_23_03.rst
> > > > +* **Implementation of Marvell CNXK machine learning driver for
> > > > +.**
> > >
> > > It seems a word is missing.
> > > It looks like you did a lot of work on the mldev series, so some
> > > details are missing.
> >
> > Done. Updated the release notes correctly in version 6 patch series.
> 
> I am close to merging v5 already.
> I don't want to restart all the testing process.
> Did you do other changes in v6?

No additional changes other than the release notes fix.

> 


^ permalink raw reply	[flat|nested] 253+ messages in thread

* RE: [EXT] Re: [PATCH v6 00/39] Implementation of ML CNXK driver
  2023-03-10  9:31   ` [PATCH v6 00/39] Implementation of ML CNXK driver Thomas Monjalon
@ 2023-03-10 10:30     ` Srikanth Yalavarthi
  0 siblings, 0 replies; 253+ messages in thread
From: Srikanth Yalavarthi @ 2023-03-10 10:30 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	Anup Prabhu, Prince Takkar, Parijat Shukla, Srikanth Yalavarthi

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: 10 March 2023 15:01
> To: dev@dpdk.org
> Cc: Srikanth Yalavarthi <syalavarthi@marvell.com>; Shivah Shankar Shankar
> Narayan Rao <sshankarnara@marvell.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; Prince Takkar
> <ptakkar@marvell.com>; Parijat Shukla <pshukla@marvell.com>
> Subject: [EXT] Re: [PATCH v6 00/39] Implementation of ML CNXK driver
> 
> 10/03/2023 09:19, Srikanth Yalavarthi:
> > Marvell ML CNXK Driver
> > ----------------------
> >
> > This patch series implements common Machine Learning (ML) ROC code and
> > driver for Marvell Octeon 10 (cnxk) platform. ML inferencing is
> > supported on cnxk platform through an integrated ML inferencing
> > processor. The current driver supports programming the ML hardware
> > engine through offload mode.
> >
> > All APIs proposed in the DPDK ML device specification are supported on
> > the cnxk platform.
> >
> > v6:
> > * Fixed release notes content
> > * Rebased the patch series
> >
> > v5:
> > * Updated model_id to uint16_t
> > * Updated release notes for 23.03
> 
> I will use v5 as it is ready to push.
> 
> Note: you never integrate acks in your new versions, so I have to look for them
> and include them for you.

Noted. I will take care of integrating acks in all future patches.

> 
> Note 2: we are fixing more build issues in the main repo because of the
> mldev integration.
> 

^ permalink raw reply	[flat|nested] 253+ messages in thread

* Re: [PATCH v6 00/39] Implementation of ML CNXK driver
  2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
                     ` (39 preceding siblings ...)
  2023-03-10  9:31   ` [PATCH v6 00/39] Implementation of ML CNXK driver Thomas Monjalon
@ 2023-03-10 15:24   ` Thomas Monjalon
  40 siblings, 0 replies; 253+ messages in thread
From: Thomas Monjalon @ 2023-03-10 15:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi
  Cc: dev, syalavarthi, sshankarnara, jerinj, aprabhu, ptakkar, pshukla

10/03/2023 09:19, Srikanth Yalavarthi:
> Marvell ML CNXK Driver
> ----------------------
> 
> This patch series implements common Machine Learning (ML) ROC code
> and driver for Marvell Octeon 10 (cnxk) platform. ML inferencing is
> supported on cnxk platform through an integrated ML inferencing
> processor. The current driver supports programming the ML hardware
> engine through offload mode.
> 
> All APIs proposed in the DPDK ML device specification are supported on
> the cnxk platform.

I've reworked the documentation a bit and squashed it into the relevant commits.

Applied, thanks.




^ permalink raw reply	[flat|nested] 253+ messages in thread

end of thread, other threads:[~2023-03-10 15:24 UTC | newest]

Thread overview: 253+ messages
2022-12-08 20:01 [PATCH v1 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 01/37] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 02/37] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 03/37] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 04/37] ml/cnxk: add support for configure and close Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 05/37] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 06/37] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 07/37] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 08/37] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 09/37] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 10/37] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 11/37] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 12/37] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 13/37] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 14/37] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 15/37] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
2022-12-08 20:01 ` [PATCH v1 16/37] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 17/37] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 18/37] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 19/37] ml/cnxk: enable support to get model information Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 20/37] ml/cnxk: enable support to update model params Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 21/37] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 22/37] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 23/37] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 24/37] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 25/37] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 26/37] ml/cnxk: dequeue " Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 27/37] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 28/37] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 29/37] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 30/37] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 31/37] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 32/37] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 33/37] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 34/37] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 35/37] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 36/37] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
2022-12-08 20:02 ` [PATCH v1 37/37] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
2022-12-08 20:17 ` [PATCH v2 00/37] Implementation of ML CNXK driver Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 01/37] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 02/37] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 03/37] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 04/37] ml/cnxk: add support for configure and close Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 05/37] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 06/37] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 07/37] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 08/37] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 09/37] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 10/37] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 11/37] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 12/37] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 13/37] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 14/37] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 15/37] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 16/37] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 17/37] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 18/37] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 19/37] ml/cnxk: enable support to get model information Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 20/37] ml/cnxk: enable support to update model params Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 21/37] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 22/37] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 23/37] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 24/37] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 25/37] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 26/37] ml/cnxk: dequeue " Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 27/37] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 28/37] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 29/37] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 30/37] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
2022-12-08 20:17   ` [PATCH v2 31/37] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
2022-12-08 20:18   ` [PATCH v2 32/37] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
2022-12-08 20:18   ` [PATCH v2 33/37] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
2022-12-08 20:18   ` [PATCH v2 34/37] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
2022-12-08 20:18   ` [PATCH v2 35/37] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
2022-12-08 20:18   ` [PATCH v2 36/37] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
2022-12-08 20:18   ` [PATCH v2 37/37] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
2022-12-20 19:26   ` [PATCH v3 00/38] Implementation of ML CNXK driver Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 01/38] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 02/38] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 03/38] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 04/38] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 05/38] ml/cnxk: add support for configure and close Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 06/38] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 07/38] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 08/38] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 09/38] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 10/38] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 11/38] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 12/38] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 13/38] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 14/38] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 15/38] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 16/38] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 17/38] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 18/38] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 19/38] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 20/38] ml/cnxk: enable support to get model information Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 21/38] ml/cnxk: enable support to update model params Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 22/38] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 23/38] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 24/38] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 25/38] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 26/38] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 27/38] ml/cnxk: dequeue " Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 28/38] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 29/38] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 30/38] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 31/38] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 32/38] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 33/38] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 34/38] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 35/38] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 36/38] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 37/38] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
2022-12-20 19:26     ` [PATCH v3 38/38] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
2022-12-20 21:23     ` [PATCH v3 00/38] Implementation of ML CNXK driver Stephen Hemminger
2022-12-21  4:44       ` Jerin Jacob
2023-02-01  9:22 ` [PATCH v4 00/39] " Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 19/39] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
2023-02-01  9:22   ` [PATCH v4 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
2023-02-01  9:23   ` [PATCH v4 39/39] ml/cnxk: enable support for configurable ocm page Srikanth Yalavarthi
2023-02-07 16:06 ` [PATCH v5 00/39] Implementation of ML CNXK driver Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
2023-03-09 22:06     ` Thomas Monjalon
2023-03-10  8:25       ` [EXT] " Srikanth Yalavarthi
2023-03-10  9:28         ` Thomas Monjalon
2023-03-10  9:31           ` Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
2023-02-07 16:06   ` [PATCH v5 19/39] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
2023-02-27 10:42     ` Prince Takkar
2023-02-07 16:07   ` [PATCH v5 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
2023-02-15 12:33     ` Shivah Shankar Shankar Narayan Rao
2023-02-16  4:40     ` Prince Takkar
2023-02-07 16:07   ` [PATCH v5 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
2023-03-01  9:01     ` Prince Takkar
2023-02-07 16:07   ` [PATCH v5 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
2023-02-07 16:07   ` [PATCH v5 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
2023-02-15 12:34     ` Shivah Shankar Shankar Narayan Rao
2023-02-16  4:41     ` Prince Takkar
2023-02-07 16:07   ` [PATCH v5 39/39] ml/cnxk: enable support for configurable ocm page Srikanth Yalavarthi
2023-02-15 12:33     ` Shivah Shankar Shankar Narayan Rao
2023-02-16  4:37     ` Prince Takkar
2023-03-02  6:08   ` [PATCH v5 00/39] Implementation of ML CNXK driver Prince Takkar
2023-03-10  8:19 ` [PATCH v6 " Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 01/39] common/cnxk: add ML headers and ROC code for cnxk Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 02/39] ml/cnxk: add skeleton for ML cnxk driver Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 03/39] ml/cnxk: enable probe and remove of ML device Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 04/39] ml/cnxk: add driver support to get device info Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 05/39] ml/cnxk: add support for configure and close Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 06/39] ml/cnxk: parse ML firmware path from device args Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 07/39] ml/cnxk: enable firmware load and device reset Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 08/39] ml/cnxk: enable support for simulator environment Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 09/39] ml/cnxk: enable support for device start and stop Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 10/39] ml/cnxk: add support to create device queue-pairs Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 11/39] ml/cnxk: add functions to load and unload models Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 12/39] ml/cnxk: enable validity checks for model metadata Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 13/39] ml/cnxk: add internal structures for derived info Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 14/39] ml/cnxk: add internal structures for tiles and OCM Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 15/39] ml/cnxk: add structures for slow and fast path JDs Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 16/39] ml/cnxk: find OCM mask and page slots for a model Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 17/39] ml/cnxk: add support to reserve and free OCM pages Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 18/39] ml/cnxk: enable support to start an ML model Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 19/39] ml/cnxk: enable support to stop an ML models Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 20/39] ml/cnxk: enable support to get model information Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 21/39] ml/cnxk: enable support to update model params Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 22/39] ml/cnxk: add support to get IO buffer sizes Srikanth Yalavarthi
2023-03-10  8:19   ` [PATCH v6 23/39] ml/cnxk: enable quantization and dequantization Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 24/39] ml/cnxk: enable support to dump device debug info Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 25/39] ml/cnxk: add driver support for device selftest Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 26/39] ml/cnxk: enqueue a burst of inference requests Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 27/39] ml/cnxk: dequeue " Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 28/39] ml/cnxk: add internal function for sync mode run Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 29/39] ml/cnxk: enable support for firmware error codes Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 30/39] ml/cnxk: add support to get and reset device stats Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 31/39] ml/cnxk: add support to handle extended dev stats Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 32/39] ml/cnxk: enable support to get xstats in cycles Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 33/39] ml/cnxk: add support to report DPE FW warnings Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 34/39] ml/cnxk: add support to enable model data caching Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 35/39] ml/cnxk: add support to select OCM allocation mode Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 36/39] ml/cnxk: add support to use lock during jcmd enq Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 37/39] ml/cnxk: add support to select poll memory region Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 38/39] ml/cnxk: add user guide for marvell cnxk ml driver Srikanth Yalavarthi
2023-03-10  8:20   ` [PATCH v6 39/39] ml/cnxk: add support for configurable ocm page Srikanth Yalavarthi
2023-03-10  9:31   ` [PATCH v6 00/39] Implementation of ML CNXK driver Thomas Monjalon
2023-03-10 10:30     ` [EXT] " Srikanth Yalavarthi
2023-03-10 15:24   ` Thomas Monjalon
