DPDK patches and discussions
* [PATCH v1 00/34] Implementation of revised ml/cnxk driver
@ 2023-08-30 15:58 Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                   ` (41 more replies)
  0 siblings, 42 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series implements the revised ml/cnxk driver, adding support
for models compiled with the TVM compiler framework. TVM models use a
hybrid execution mode, with some regions of the model executing on the
ML accelerator and the remaining regions executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions within a TVM model.
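
The following is only an illustrative sketch of this hybrid execution
flow; the names used (layer_target, run_layer_on_mlip(),
run_layer_on_cpu()) are hypothetical and do not correspond to the
driver API added by this series:

    /* Each region (layer) of a TVM model is tagged with a target and
     * dispatched accordingly. Hypothetical names, for illustration only.
     */
    enum layer_target { LAYER_TARGET_MLIP, LAYER_TARGET_CPU };

    struct model_layer {
            enum layer_target target; /* where this region of the model runs */
            void *args;               /* layer inputs / outputs */
    };

    static void run_layer_on_mlip(struct model_layer *l) { (void)l; /* enqueue to the ML accelerator */ }
    static void run_layer_on_cpu(struct model_layer *l) { (void)l; /* invoke the TVM-generated CPU kernel */ }

    static void
    run_model(struct model_layer *layers, int nb_layers)
    {
            int i;

            /* Regions execute in order, each on its assigned target */
            for (i = 0; i < nb_layers; i++) {
                    if (layers[i].target == LAYER_TARGET_MLIP)
                            run_layer_on_mlip(&layers[i]);
                    else
                            run_layer_on_cpu(&layers[i]);
            }
    }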

Anup Prabhu (1):
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (31):
  ml/cnxk: drop support for register polling
  ml/cnxk: drop use of RTE API for firmware read
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support to identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 doc/guides/mldevs/cnxk.rst       |   16 -
 drivers/ml/cnxk/cn10k_ml_dev.c   |  477 ++---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  457 +----
 drivers/ml/cnxk/cn10k_ml_model.c |  383 ++--
 drivers/ml/cnxk/cn10k_ml_model.h |  148 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  106 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 2915 ++++++++++--------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  351 +++-
 drivers/ml/cnxk/cnxk_ml_dev.c    |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  120 ++
 drivers/ml/cnxk/cnxk_ml_io.c     |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h     |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c  |  143 ++
 drivers/ml/cnxk/cnxk_ml_model.h  |  187 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 1771 ++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h    |   85 +
 drivers/ml/cnxk/cnxk_ml_utils.c  |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h  |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |  152 ++
 drivers/ml/cnxk/meson.build      |   70 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   |  198 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c |  322 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   88 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  583 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   72 +
 27 files changed, 5945 insertions(+), 2991 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h

-- 
2.41.0



* [PATCH v1 01/34] ml/cnxk: drop support for register polling
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read Srikanth Yalavarthi
                   ` (40 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for the device argument "poll_mem" in the cnxk
ML driver. Support for using registers for polling is removed;
DDR addresses are now used for polling.
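
For reference, a condensed sketch of the DDR polling path that remains
after this change (taken from the enqueue/dequeue code updated below;
error handling and surrounding logic are omitted):

    /* The completion word lives in DDR, inside the request itself */
    req->compl_W1 = PLT_U64_CAST(&req->status);

    /* Enqueue side: mark the job as started, then submit it */
    plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
    mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);

    /* Dequeue side: poll the same DDR location, no ML register access */
    while (plt_read64(req->compl_W1) != ML_CN10K_POLL_JOB_FINISH)
            ; /* job still in flight; real code retries until req->timeout */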

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: series-29375 ("Spec changes to support multi I/O models")

 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d95..1834b1f905a 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f23..e3c2badcef5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001a..4aaeecff03d 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d39..11531afd8c1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6c..005b093e45d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.41.0



* [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-09-21 12:08   ` Jerin Jacob
  2023-08-30 15:58 ` [PATCH v1 03/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                   ` (39 subsequent siblings)
  41 siblings, 1 reply; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Prince Takkar; +Cc: dev, sshankarnara, aprabhu

Dropped use of the rte_firmware_read API to read the ML firmware
binary. When DPDK is built with libarchive support, the RTE API
treats the binary file as a compressed archive. This causes the
ML firmware binary to be parsed incorrectly.
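
The firmware image is now read with a plain file read into an
rte_malloc'd buffer, roughly as below (condensed from the
ml_read_file() helper added by this patch; error handling omitted):

    fd = open(fw->path, O_RDONLY);
    fstat(fd, &file_stat);
    file_buffer = rte_malloc("ml_firmware", file_stat.st_size, PLT_CACHE_LINE_SIZE);
    file_map = mmap(0, file_stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    rte_memcpy(file_buffer, file_map, file_stat.st_size);
    munmap(file_map, file_stat.st_size);
    close(fd);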

Fixes: c29da752ffa8 ("ml/cnxk: support firmware load and device reset")
Cc: syalavarthi@marvell.com

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 64 +++++++++++++++++++++++++++++++---
 1 file changed, 60 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef5..b7e6ed9a00e 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -2,6 +2,11 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include <rte_common.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
@@ -61,6 +66,57 @@ static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
+static int
+ml_read_file(const char *file, size_t *size, char **buffer)
+{
+	char *file_buffer = NULL;
+	struct stat file_stat;
+	char *file_map;
+	int ret;
+	int fd;
+
+	fd = open(file, O_RDONLY);
+	if (fd == -1) {
+		plt_err("Failed to open file: %s\n", file);
+		return -errno;
+	}
+
+	if (fstat(fd, &file_stat) != 0) {
+		plt_err("fstat failed for file: %s\n", file);
+		close(fd);
+		return -errno;
+	}
+
+	file_buffer = rte_malloc("ml_firmware", file_stat.st_size, PLT_CACHE_LINE_SIZE);
+	if (file_buffer == NULL) {
+		plt_err("Failed to allocate memory: %s\n", file);
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	file_map = mmap(0, file_stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (file_map == MAP_FAILED) {
+		plt_err("Failed to map file: %s\n", file);
+		ret = -errno;
+		goto error;
+	}
+
+	rte_memcpy(file_buffer, file_map, file_stat.st_size);
+	munmap(file_map, file_stat.st_size);
+	close(fd);
+
+	*size = file_stat.st_size;
+	*buffer = file_buffer;
+
+	return 0;
+
+error:
+	free(file_buffer);
+	close(fd);
+
+	return ret;
+}
+
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -736,7 +792,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 {
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
-	void *fw_buffer = NULL;
+	char *fw_buffer = NULL;
 	uint64_t mz_size = 0;
 	uint64_t fw_size = 0;
 	int ret = 0;
@@ -746,7 +802,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
-		ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+		ret = ml_read_file(fw->path, &fw_size, &fw_buffer);
 		if ((ret < 0) || (fw_buffer == NULL)) {
 			plt_err("Unable to read firmware data: %s\n", fw->path);
 			return ret;
@@ -763,7 +819,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
-		free(fw_buffer);
+		rte_free(fw_buffer);
 		return -ENOMEM;
 	}
 	fw->req = mz->addr;
@@ -780,7 +836,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
-		free(fw_buffer);
+		rte_free(fw_buffer);
 	} else if (roc_env_is_asim()) {
 		fw->data = NULL;
 		ret = cn10k_ml_fw_load_asim(fw);
-- 
2.41.0



* [PATCH v1 03/34] ml/cnxk: add generic cnxk device structure
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 04/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This structure is the
top-level device structure for the driver and encapsulates the
target / platform-specific device structure.
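
The rough shape of the new top-level structure is shown below (see
cnxk_ml_dev.h added by this patch; the field list and type names here
are illustrative, not exhaustive):

    struct cnxk_ml_dev {
            /* RTE ML device handle */
            struct rte_ml_dev *mldev;

            /* Device configuration state (ML_CNXK_DEV_STATE_*) */
            enum cnxk_ml_dev_state state;

            /* Platform-specific (CN10K) ML device */
            struct cn10k_ml_dev cn10k_mldev;
    };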

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 315 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  14 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  56 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 494 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 562 insertions(+), 443 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index b7e6ed9a00e..367fb7014c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -15,13 +15,15 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -63,9 +65,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 ml_read_file(const char *file, size_t *size, char **buffer)
 {
@@ -146,7 +145,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -183,7 +182,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -195,7 +194,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -207,7 +206,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -230,7 +229,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -242,7 +241,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -253,49 +252,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -304,47 +307,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -356,7 +359,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -364,7 +368,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -380,18 +384,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -407,7 +413,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -424,7 +430,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -439,8 +445,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -486,45 +492,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -536,11 +542,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -554,14 +560,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -571,7 +577,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -580,24 +586,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -605,9 +611,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -615,9 +621,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -626,39 +632,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -669,53 +676,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -727,11 +738,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -747,49 +758,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	char *fw_buffer = NULL;
@@ -797,8 +810,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -829,8 +843,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -843,22 +857,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03d..f9da1548c4a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
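
Note: the prototype change above means cn10k_ml_fw_load()/cn10k_ml_fw_unload()
now take the generic cnxk handle and resolve the CN10K private data internally.
A minimal caller sketch, assuming the dev_private layout introduced by this
series (illustrative only, not part of the diff):

/* Illustrative sketch: firmware is loaded from the configure path and
 * unloaded from the close path through the generic cnxk handle. Note that
 * cn10k_ml_fw_load() already unloads on its own failure path, so the caller
 * only needs to check the return value.
 */
static int
example_fw_setup(struct rte_ml_dev *dev)
{
	struct cnxk_ml_dev *cnxk_mldev = dev->data->dev_private;

	return cn10k_ml_fw_load(cnxk_mldev);
}
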
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8ef..d146535866a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,8 @@
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +463,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +472,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +496,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +508,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891b..3128b28db73 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 93505c9c09b..d0f716bccea 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,13 @@
 
 #include <rte_mldev_pmd.h>
 
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +219,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +239,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +258,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +275,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +337,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +350,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +397,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +411,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +461,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,7 +502,7 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			wb_pages +=
 				__builtin_popcount(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c1..3385bf50c0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -11,6 +11,8 @@
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +87,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +177,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +201,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +252,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +328,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +343,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +353,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +375,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +386,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +395,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +435,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
@@ -503,28 +505,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +542,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +553,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +657,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +677,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +748,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +775,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +791,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +865,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +894,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +909,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +923,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1028,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1059,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 
@@ -1077,8 +1092,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1102,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1142,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1165,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1185,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1280,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1306,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1328,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1370,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1397,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1446,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1461,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1481,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1507,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1529,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1551,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1552,8 +1588,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1610,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1627,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1660,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1717,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1732,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1748,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1757,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1773,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1785,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1854,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
@@ -1843,7 +1882,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1906,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1916,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1927,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1939,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1982,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2252,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2300,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2326,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2337,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2353,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2385,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2395,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2409,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2468,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2507,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 00000000000..2a5c17c973b
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 00000000000..51315de6227
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 94fa4283b13..03a2d4ecf2f 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ driver_sdk_headers = files(
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
+        'cnxk_ml_dev.h',
 )
 
 sources = files(
@@ -19,6 +20,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 04/34] ml/cnxk: add generic model and layer structures
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (2 preceding siblings ...)
  2023-08-30 15:58 ` [PATCH v1 03/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 05/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures enable support for models with multiple layers.
A model is a collection of multiple independent layers with
flow dependencies between the layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  49 +++-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 487 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   3 +
 10 files changed, 653 insertions(+), 467 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4a..99ff0a344a2 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d146535866a..0ea6520bf78 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -312,19 +313,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -362,102 +361,136 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -515,23 +548,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -543,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -551,56 +585,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db73..206a369ca75 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d0f716bccea..639f329f8aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -11,6 +11,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -334,12 +335,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -354,6 +357,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -383,8 +387,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -394,12 +398,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -410,16 +416,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -433,11 +442,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd651..720f8caf766 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3385bf50c0d..a52509630fe 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -12,6 +12,7 @@
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -203,7 +204,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -216,77 +217,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -296,29 +300,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -328,14 +334,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -346,7 +352,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -386,7 +392,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -446,7 +452,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -473,7 +479,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -522,7 +528,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -544,7 +550,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -577,9 +583,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -589,9 +595,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -601,9 +608,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -612,7 +620,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -693,28 +701,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -750,7 +758,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -759,7 +767,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -804,7 +812,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -855,7 +863,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -876,7 +884,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -896,7 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1002,11 +1010,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1094,7 +1102,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1112,11 +1120,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1295,7 +1303,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1387,7 +1395,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1448,7 +1456,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1589,7 +1597,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1644,9 +1652,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1660,62 +1668,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* The number of layers handled by the driver for glow models is always 1,
+	 * so consider the entire model as a model with a single layer. This
+	 * ignores num_layers from the metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1731,7 +1762,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1742,7 +1773,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1759,7 +1790,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1784,7 +1815,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1792,63 +1823,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1881,10 +1915,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1892,12 +1926,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1918,7 +1952,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1938,7 +1972,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1949,31 +1983,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2009,7 +2043,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2022,7 +2056,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2041,7 +2075,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2051,19 +2085,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2072,7 +2110,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2092,57 +2130,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2152,7 +2191,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2172,58 +2211,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2251,10 +2292,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2264,9 +2305,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2470,7 +2511,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2478,7 +2519,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 00000000000..29ec7ec5112
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 32
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* name */
+	char name[RTE_ML_STR_MAX];
+
+	/* dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape of the I/O tensor */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized data size */
+	uint32_t sz_d;
+
+	/* Quantized data size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 00000000000..3d735ced3e1
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 00000000000..a2994dbb71e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures are arranged in the buffer in the order listed above.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 03a2d4ecf2f..72e03b15b5b 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,8 @@ driver_sdk_headers = files(
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
         'cnxk_ml_dev.h',
+        'cnxk_ml_io.h',
+        'cnxk_ml_model.h',
 )
 
 sources = files(
@@ -21,6 +23,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 05/34] ml/cnxk: add generic cnxk request structure
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (3 preceding siblings ...)
  2023-08-30 15:58 ` [PATCH v1 04/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 06/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved the common fields
from the cn10k structures to the cnxk structure, and moved the
job-related structures and enumerations to the ops headers.
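
A minimal sketch of the layering this introduces, inferred from the
usage in this patch; the exact field types and their placement are
assumptions, only the cn10k_req and status members and the poll helper
shown below appear in the diff:

    /* Generic request: embeds the HW-specific cn10k request and keeps a
     * generic status pointer that the poll helpers read and write.
     */
    struct cnxk_ml_req {
            struct cn10k_ml_req cn10k_req; /* jd, jcmd, result, status, extended_args */
            volatile uint64_t *status;     /* assumed type; points to cn10k_req.status */
    };

    static inline void
    cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
    {
            req->status = &req->cn10k_req.status;
    }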

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  70 ++++---
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 329 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 558 insertions(+), 488 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 367fb7014c4..f6e05cfc472 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,6 +23,7 @@
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -457,20 +458,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -515,29 +519,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -711,29 +716,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -823,11 +829,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -839,8 +845,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -848,7 +854,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		rte_free(fw_buffer);
 	} else if (roc_env_is_asim()) {
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a2..1852d4f6c9a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 0ea6520bf78..2a0ae44cfd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -12,6 +12,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -551,7 +552,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -560,7 +560,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -577,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca75..74ada1531a8 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a52509630fe..2b1fa08154d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -80,31 +81,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -124,14 +125,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -142,18 +143,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -161,7 +162,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -187,8 +188,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -335,7 +337,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -343,79 +345,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -863,7 +874,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -906,7 +917,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1103,7 +1114,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1138,7 +1149,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1215,7 +1226,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1241,7 +1252,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1254,7 +1265,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1271,7 +1282,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1487,20 +1498,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1513,17 +1526,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1540,14 +1555,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1556,23 +1571,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1583,7 +1599,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1656,7 +1672,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1728,7 +1744,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1792,7 +1808,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1817,10 +1833,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1880,8 +1896,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1889,19 +1905,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1954,7 +1972,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1974,10 +1992,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2017,19 +2035,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2289,18 +2309,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2331,7 +2356,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2340,7 +2365,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2348,15 +2374,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2367,11 +2393,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2397,12 +2424,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2426,11 +2454,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2452,13 +2481,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2509,10 +2540,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2524,17 +2556,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2557,7 +2590,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45d..fd5992e1925 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries 0 means linear input mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model  */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 00000000000..f1872dcf7c6
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 00000000000..b953fb0f5fc
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 72e03b15b5b..73db458fcd9 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -15,6 +15,7 @@ driver_sdk_headers = files(
         'cnxk_ml_dev.h',
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
+        'cnxk_ml_ops.h',
 )
 
 sources = files(
@@ -24,6 +25,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
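
As an illustrative aside: the new cnxk_ml_req introduced above keeps the
device-specific job state in an anonymous union while the generic fields
(status address, timeout, op) stay common, so fast-path polling code does
not need device knowledge. The stand-alone sketch below only illustrates
that layering; the demo_* names are simplified stand-ins and are not the
driver's actual definitions.

/* Simplified illustration of a generic request wrapping a device-specific
 * one; the generic layer polls completion through a status pointer.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct demo_cn10k_req {
	uint64_t jd[8];           /* placeholder for a job descriptor */
	volatile uint64_t status; /* device-specific completion word */
};

struct demo_cnxk_req {
	union {
		struct demo_cn10k_req cn10k_req; /* per-device request state */
	};
	volatile uint64_t *status; /* generic code polls through this address */
	uint64_t timeout;          /* timeout in TSC cycles */
};

int
main(void)
{
	struct demo_cnxk_req req = { .timeout = 0 };

	/* Point the generic status address at the device-specific field so
	 * the polling loop stays device-agnostic.
	 */
	req.status = &req.cn10k_req.status;
	req.cn10k_req.status = 1; /* pretend the firmware marked the job done */

	printf("status = %" PRIu64 "\n", *req.status);
	return 0;
}
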
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 06/34] ml/cnxk: add generic cnxk xstats structures
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (4 preceding siblings ...)
  2023-08-30 15:58 ` [PATCH v1 05/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 07/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic xstats structures and renamed the cn10k xstats
enumerations to use the cnxk prefix.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
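As an illustrative aside: the generic xstats entries keep a name-ID map
plus a per-entry reset offset, so an xstats reset is emulated by recording
the current value and subtracting it on later reads. The stand-alone sketch
below only illustrates that idea; the demo_* names are simplified stand-ins,
not the driver's actual cnxk_ml_xstats definitions.

/* Illustrative sketch: xstats reset emulated via a saved offset. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct demo_xstat_entry {
	char name[32];         /* xstat name exposed to the application */
	uint64_t counter;      /* monotonically increasing raw counter */
	uint64_t reset_value;  /* offset recorded at reset time */
	uint8_t reset_allowed; /* device-level stats would keep this at 0 */
};

static uint64_t
demo_xstat_get(const struct demo_xstat_entry *e)
{
	/* Reported value is the raw counter minus the saved reset offset */
	return e->counter - e->reset_value;
}

static void
demo_xstat_reset(struct demo_xstat_entry *e)
{
	if (e->reset_allowed)
		e->reset_value = e->counter;
}

int
main(void)
{
	struct demo_xstat_entry e;

	memset(&e, 0, sizeof(e));
	strcpy(e.name, "demo_dequeued_count");
	e.reset_allowed = 1;

	e.counter = 5;
	printf("%s = %" PRIu64 "\n", e.name, demo_xstat_get(&e)); /* prints 5 */

	demo_xstat_reset(&e);
	e.counter += 2;
	printf("%s = %" PRIu64 "\n", e.name, demo_xstat_get(&e)); /* prints 2 */

	return 0;
}
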
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 drivers/ml/cnxk/meson.build      |   1 +
 5 files changed, 210 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9a..be989e0a207 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a8..5c32f48c68f 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2b1fa08154d..03a7447dc87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -14,6 +14,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -429,26 +430,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -463,10 +444,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -474,17 +455,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -493,24 +474,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -549,7 +530,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -558,17 +539,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -594,9 +575,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -607,9 +588,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -620,16 +602,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -675,8 +658,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -712,26 +695,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -766,8 +749,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1346,10 +1329,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1361,10 +1344,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1388,11 +1371,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1427,10 +1410,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1668,7 +1651,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1742,24 +1725,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2312,7 +2295,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2330,31 +2313,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 00000000000..0d405679caa
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 73db458fcd9..6385ac45481 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ driver_sdk_headers = files(
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
+        'cnxk_ml_xstats.h',
 )
 
 sources = files(
-- 
2.41.0
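
Note on the xstats bookkeeping added above: the count_per_model[] and
offset_for_model[] arrays (and their per-layer counterparts) address what
appears to be a contiguous slice of the entries array per object, and
reset_value is the offset "taken away" from the raw counter when a stat is
read so that resets can be emulated without clearing hardware counters. A
minimal lookup sketch under that assumption (the helper name is
illustrative and not part of the patch):

    #include "cnxk_ml_xstats.h"

    /* Illustrative only: find the xstats entry of a given type for a
     * model, assuming each model's entries are stored contiguously
     * starting at offset_for_model[model_id].
     */
    static struct cnxk_ml_xstats_entry *
    model_xstat_find(struct cnxk_ml_xstats *xstats, uint16_t model_id,
                     enum cnxk_ml_xstats_type type)
    {
            uint16_t start = xstats->offset_for_model[model_id];
            uint16_t i;

            for (i = 0; i < xstats->count_per_model[model_id]; i++) {
                    if (xstats->entries[start + i].type == type)
                            return &xstats->entries[start + i];
            }

            return NULL;
    }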


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 07/34] ml/cnxk: rename cnxk ops function pointers struct
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (5 preceding siblings ...)
  2023-08-30 15:58 ` [PATCH v1 06/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 08/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure to use the cnxk prefix.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 38 ++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 93 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index f6e05cfc472..20c114b8bf7 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -404,7 +404,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 03a7447dc87..e6383283d31 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -123,7 +123,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -864,7 +864,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -892,7 +892,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1091,7 +1091,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1164,7 +1164,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1184,7 +1184,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1204,7 +1204,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1245,7 +1245,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1262,7 +1262,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1277,7 +1277,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1325,7 +1325,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1367,7 +1367,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1431,7 +1431,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1445,7 +1445,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1532,7 +1532,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2055,7 +2055,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2075,7 +2075,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2109,7 +2109,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2190,7 +2190,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2578,38 +2578,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e1925..16480b9ad89 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c6..89e0d9d32c3 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,43 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5fc..a925c075809 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 08/34] ml/cnxk: update device handling functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (6 preceding siblings ...)
  2023-08-30 15:58 ` [PATCH v1 07/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-08-30 15:58 ` [PATCH v1 09/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement cnxk wrapper functions for dev_info_get,
dev_configure, dev_close, dev_start and dev_stop. The
wrapper functions allocate and release the common resources
for the ML driver and invoke the device-specific functions.
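
A condensed sketch of the wrapper pattern, taken from cnxk_ml_dev_start()
as added to cnxk_ml_ops.c in the diff below (error logging trimmed):

    static int
    cnxk_ml_dev_start(struct rte_ml_dev *dev)
    {
            struct cnxk_ml_dev *cnxk_mldev;
            int ret;

            if (dev == NULL)
                    return -EINVAL;

            cnxk_mldev = dev->data->dev_private;

            /* Engine-specific start is delegated to the cn10k layer */
            ret = cn10k_ml_dev_start(cnxk_mldev);
            if (ret != 0)
                    return ret;

            /* Common device state tracking stays in the cnxk wrapper */
            cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;

            return 0;
    }

The same split applies to configure, close and stop: argument checks,
queue-pair and model array management and device state live in the cnxk
layer, while OCM setup, firmware load/unload and ML register programming
are done by the cn10k_* functions.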

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e6383283d31..0f32f3b2bbe 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -105,7 +105,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -865,20 +865,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -893,143 +885,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1042,8 +908,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1054,10 +919,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1071,77 +936,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1158,20 +971,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1179,19 +987,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1199,8 +1003,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1221,7 +1023,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad89..d50b5bede71 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de6227..02605fa28fc 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 89e0d9d32c3..83d5cbae58b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -7,15 +7,291 @@
 
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c075809..2996928d7d0 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 09/34] ml/cnxk: update queue-pair handling functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (7 preceding siblings ...)
  2023-08-30 15:58 ` [PATCH v1 08/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-08-30 15:58 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 10/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:58 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device queue-pairs.
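
One detail worth noting in the new cnxk_ml_dev_queue_pair_setup(): the
number of usable descriptors is one less than the queue size created, so
the wrapper creates the queue one entry larger than requested, except when
the request already equals the device maximum. For example (max_desc value
illustrative), a request for 512 descriptors with max_desc = 1024 creates a
513-entry queue, while a request for 1024 stays at 1024:

    /* From the setup wrapper below: grow the queue by one descriptor
     * unless the requested size already equals the maximum.
     */
    nb_desc = (qp_conf->nb_desc == dev_info.max_desc) ?
                      dev_info.max_desc : qp_conf->nb_desc + 1;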

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0f32f3b2bbe..330cb050cbd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -99,93 +99,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -193,13 +112,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1006,47 +918,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede71..2d0a49d5cdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 83d5cbae58b..1767a8a3dbc 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -12,7 +12,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -95,7 +195,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -285,6 +385,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -296,8 +441,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d0..a925c075809 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 10/34] ml/cnxk: update model load and unload functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (8 preceding siblings ...)
  2023-08-30 15:58 ` [PATCH v1 09/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to load and unload
ML models. The wrapper functions invoke the cn10k
model load and unload functions.
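
A rough sketch of the delegation shape (illustrative only; the exact
wrapper signatures and error handling follow the diff below, not this
sketch):

    static int
    cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
    {
            struct cnxk_ml_model *model;

            if (dev == NULL)
                    return -EINVAL;

            model = dev->data->models[model_id];
            if (model == NULL) {
                    plt_err("Invalid model_id = %u", model_id);
                    return -EINVAL;
            }

            /* Engine-specific unload is handled by the cn10k layer */
            return cn10k_ml_model_unload(dev, model_id);
    }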

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 239 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  25 +--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 455 insertions(+), 278 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 2a0ae44cfd5..9a336cd18f9 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,7 +6,6 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
@@ -318,42 +317,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -365,140 +353,140 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_update(struct cnxk_ml_io_info *io_info,
+			      struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output1[i].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output2[j].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -506,7 +494,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -518,7 +506,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -526,15 +514,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -542,28 +530,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -572,39 +557,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68f..45290b84cef 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,12 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_update(struct cnxk_ml_io_info *io_info,
+				   struct cn10k_ml_model_metadata *metadata);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 330cb050cbd..3bfc63d9d40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -19,6 +19,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -277,7 +280,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1265,85 +1268,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_update(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1362,99 +1451,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1752,7 +1804,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1766,19 +1817,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cdf..677219dfdf7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28fc..1590249abd1 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1767a8a3dbc..3d9d5f9d78c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -12,6 +12,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -139,6 +142,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -242,7 +246,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -273,6 +277,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -305,6 +326,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -314,7 +338,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -430,6 +454,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -453,8 +589,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c075809..bc14f6e5b9e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 11/34] ml/cnxk: update model start and stop functions
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (9 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 10/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrapper functions invoke the cn10k
model start and stop functions.
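
A rough sketch of the wrapper shape is shown below for reference.
It is illustrative only: the argument checks and the exact wrapper
name are assumptions modelled on the load/unload wrappers earlier
in this series, not a copy of the hunk itself. The cn10k entry point
it dispatches to matches the signature introduced in this patch.

	/* Illustrative sketch, not the actual cnxk_ml_ops.c hunk. */
	static int
	cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
	{
		struct cnxk_ml_dev *cnxk_mldev;
		struct cnxk_ml_model *model;

		if (dev == NULL)
			return -EINVAL;

		cnxk_mldev = dev->data->dev_private;

		model = dev->data->models[model_id];
		if (model == NULL) {
			plt_err("Invalid model_id = %u", model_id);
			return -EINVAL;
		}

		/* Dispatch to the cn10k backend, which starts layer 0. */
		return cn10k_ml_model_start(cnxk_mldev, model);
	}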

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 639f329f8aa..6a8400b7763 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -217,11 +217,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -240,7 +239,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -335,12 +333,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -353,10 +349,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -398,12 +392,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -418,10 +410,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -440,8 +430,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf766..97b723a56a5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3bfc63d9d40..e5b9837ed73 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -252,26 +252,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -295,7 +297,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -327,9 +329,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -718,10 +724,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -734,22 +738,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -765,15 +767,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1510,14 +1512,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1528,85 +1532,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1640,66 +1648,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1709,31 +1745,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1770,8 +1806,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1780,6 +1819,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2007,30 +2065,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2058,14 +2121,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2120,7 +2182,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2187,7 +2249,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2236,23 +2298,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2288,7 +2354,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf7..a222a43d552 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 3d9d5f9d78c..915309168d8 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -242,7 +242,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -334,7 +334,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -566,6 +566,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -591,8 +631,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9e..d27ca0d0cb2 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 12/34] ml/cnxk: update model utility functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (10 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and
fetch model info.
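
The shape is the same dispatch pattern used by the other cnxk wrappers in
this series: the generic layer validates the rte_ml_dev arguments and the
model_id, then forwards to the cn10k backend. A minimal, self-contained
sketch of that shape follows; the toy_* names and types are invented for
illustration and are not the driver's real structures.

#include <stddef.h>
#include <stdint.h>

struct toy_model { int loaded; };
struct toy_dev { struct toy_model *models[4]; };

/* backend op: assumes the caller already validated the model */
static int
toy_cn10k_params_update(struct toy_model *model, void *buffer)
{
	(void)buffer;
	return model->loaded ? 0 : -1;
}

/* generic wrapper: checks arguments, looks up the model, then dispatches */
static int
toy_cnxk_params_update(struct toy_dev *dev, uint16_t model_id, void *buffer)
{
	if (dev == NULL || buffer == NULL)
		return -22; /* -EINVAL */
	if (model_id >= 4 || dev->models[model_id] == NULL)
		return -22;

	return toy_cn10k_params_update(dev->models[model_id], buffer);
}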

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e5b9837ed73..0eebefee5fc 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1839,45 +1839,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d552..ef12069f0df 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 915309168d8..5ad0ea8c3ce 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -606,6 +606,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -633,8 +677,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 13/34] ml/cnxk: update data quantization functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (11 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.
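
The wrappers walk each model input (and, for dequantize, each output of the
last layer): copy straight through when the user-facing and model-facing
types match, otherwise convert with the per-I/O scale, and advance both
buffers by the per-I/O sizes. A minimal sketch under assumed, simplified
types is below; the toy_* names are invented and only the float32 to int8
conversion is shown.

#include <stdint.h>
#include <string.h>

struct toy_io {
	int dtype;           /* user-facing (dequantized) type  */
	int qtype;           /* model-facing (quantized) type   */
	float scale;
	uint32_t nb_elements;
	uint32_t sz_d;       /* dequantized size in bytes */
	uint32_t sz_q;       /* quantized size in bytes   */
};

enum { TOY_FP32 = 0, TOY_INT8 = 1 };

/* quantize one input: memcpy when types match, else scale and convert */
static int
toy_quantize_single(const struct toy_io *io, const uint8_t *dbuf, uint8_t *qbuf)
{
	uint32_t i;

	if (io->dtype == io->qtype) {
		memcpy(qbuf, dbuf, io->sz_d);
		return 0;
	}

	if (io->qtype != TOY_INT8)
		return -1; /* other conversions omitted from the sketch */

	for (i = 0; i < io->nb_elements; i++)
		((int8_t *)qbuf)[i] =
			(int8_t)(((const float *)(const void *)dbuf)[i] * io->scale);

	return 0;
}

/* quantize all inputs, advancing the source and destination buffers */
static int
toy_quantize(const struct toy_io *inputs, uint32_t nb_inputs,
	     const uint8_t *dbuf, uint8_t *qbuf)
{
	uint32_t i;

	for (i = 0; i < nb_inputs; i++) {
		if (toy_quantize_single(&inputs[i], dbuf, qbuf) != 0)
			return -1;

		dbuf += inputs[i].sz_d;
		qbuf += inputs[i].sz_q;
	}

	return 0;
}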

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0eebefee5fc..1e6aee818c7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1860,170 +1860,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0df..780e2a9f9c3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 00000000000..c78009ab0cd
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 29ec7ec5112..5de166c2520 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 5ad0ea8c3ce..ed71a551327 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
@@ -650,6 +652,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -681,6 +755,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 6385ac45481..9cc4ddec702 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -25,6 +25,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 14/34] ml/cnxk: update device debug functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (12 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions for the device dump and selftest
debug operations.
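
For the dump path, the generic cnxk handler now walks all loaded models and
prints each one before handing over to the cn10k backend for OCM and
firmware debug state; selftest is forwarded in the same way. A hedged,
self-contained sketch of the dump flow is below, with invented toy_* names
standing in for the driver structures.

#include <stdint.h>
#include <stdio.h>

struct toy_model { uint16_t id; const char *name; };
struct toy_dev { struct toy_model *models[8]; uint16_t nb_models; };

static void
toy_print_line(FILE *fp, int len)
{
	while (len-- > 0)
		fputc('-', fp);
	fputc('\n', fp);
}

/* per-model dump: separator lines plus a few fixed-width fields */
static void
toy_model_dump(const struct toy_model *m, FILE *fp)
{
	toy_print_line(fp, 72);
	fprintf(fp, "%16s : %u\n", "model_id", m->id);
	fprintf(fp, "%16s : %s\n", "name", m->name);
	toy_print_line(fp, 72);
}

/* backend dump: OCM map, firmware debug buffers, ... */
static int
toy_backend_dump(FILE *fp)
{
	fprintf(fp, "backend debug state\n");
	return 0;
}

/* generic dump: validate, dump every loaded model, then the backend */
static int
toy_dev_dump(const struct toy_dev *dev, FILE *fp)
{
	uint16_t i;

	if (dev == NULL || fp == NULL)
		return -22; /* -EINVAL */

	for (i = 0; i < dev->nb_models; i++) {
		if (dev->models[i] != NULL)
			toy_model_dump(dev->models[i], fp);
	}

	return toy_backend_dump(fp);
}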

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   2 +
 12 files changed, 236 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 9a336cd18f9..9e92d4acf36 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -12,6 +12,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -591,3 +592,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45290b84cef..8717e7de3ec 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -459,5 +459,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6a8400b7763..7d4b1efad13 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -483,19 +483,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a5..bf8944f8eee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 1e6aee818c7..c3608eec99e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,11 +22,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -74,16 +69,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -117,140 +102,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1124,38 +975,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1211,17 +1049,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c3..5fda98ae88f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e1..b069d4e3a5a 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71e..66d979dd3ca 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ed71a551327..b49ab597984 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -411,6 +411,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -731,8 +766,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 00000000000..ca3670a9e83
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 00000000000..ed2ab213469
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 9cc4ddec702..575f08f9c09 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,6 +17,7 @@ driver_sdk_headers = files(
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
         'cnxk_ml_xstats.h',
+        'cnxk_ml_utils.h',
 )
 
 sources = files(
@@ -28,6 +29,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 15/34] ml/cnxk: update device stats functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (13 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device stats get and reset.
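
As a rough usage sketch (illustrative only, not part of this patch), an
application could read and clear the aggregated queue-pair counters
through the mldev API; dev_id is assumed to refer to a configured ML
device:

#include <stdio.h>
#include <inttypes.h>

#include <rte_mldev.h>

/* Illustrative only: read and clear the aggregate queue-pair counters
 * filled in by the new dev_stats_get/dev_stats_reset wrappers.
 */
static void
dump_ml_dev_stats(int16_t dev_id)
{
    struct rte_ml_dev_stats stats;

    if (rte_ml_dev_stats_get(dev_id, &stats) == 0)
        printf("enq %" PRIu64 ", deq %" PRIu64 ", enq_err %" PRIu64 ", deq_err %" PRIu64 "\n",
               stats.enqueued_count, stats.dequeued_count,
               stats.enqueue_err_count, stats.dequeue_err_count);

    rte_ml_dev_stats_reset(dev_id);
}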

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c3608eec99e..59cd3bb9b34 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -774,38 +774,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88f..47e7cb12af4 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b49ab597984..ffeb3f44523 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -491,6 +491,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -774,8 +806,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 16/34] ml/cnxk: update device and model xstats functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (14 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resource handling for the xstats is now done
in the cnxk layer. Introduced an internal xstats group.
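
For context, a minimal application-side sketch (illustrative only, not
part of this patch) of how the per-model extended stats exposed here can
be enumerated and read; dev_id and model_id are assumed to refer to a
configured device and a loaded model, and the array sizes are arbitrary:

#include <stdio.h>
#include <inttypes.h>

#include <rte_common.h>
#include <rte_mldev.h>

/* Illustrative only: list the model-mode xstats names, then fetch their
 * values by id. Ordering of names and values matches because both calls
 * walk the same model-mode entries.
 */
static void
dump_model_xstats(int16_t dev_id, int32_t model_id)
{
    struct rte_ml_dev_xstats_map map[64];
    uint16_t ids[64];
    uint64_t values[64];
    int nb, i;

    nb = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_MODEL, model_id,
                                     map, RTE_DIM(map));
    if (nb <= 0 || nb > (int)RTE_DIM(map))
        return;

    for (i = 0; i < nb; i++)
        ids[i] = map[i].id;

    nb = rte_ml_dev_xstats_get(dev_id, RTE_ML_DEV_XSTATS_MODEL, model_id, ids, values, nb);
    for (i = 0; i < nb; i++)
        printf("%s: %" PRIu64 "\n", map[i].name, values[i]);
}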

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 542 +------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  11 -
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 540 +++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 571 insertions(+), 552 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a207..bde9d089015 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 59cd3bb9b34..f1431b89a2d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -202,107 +202,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -310,270 +224,15 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
-	do {                                                                                       \
-		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
-		}                                                                                  \
-		if (count != 0)                                                                    \
-			value = value / count;                                                     \
-	} while (0)
-
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
-	do {                                                                                       \
-		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
-		}                                                                                  \
-		if (count == 0)                                                                    \
-			value = 0;                                                                 \
-	} while (0)
-
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
-	do {                                                                                       \
-		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
-		}                                                                                  \
-		if (count == 0)                                                                    \
-			value = 0;                                                                 \
-	} while (0)
-
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
-	uint64_t count = 0;
-	uint64_t value;
-	uint32_t qp_id;
-
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
-		break;
-	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
-		break;
-	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
-		break;
-	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
-		break;
-	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
-		break;
-	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
-		break;
-	default:
-		value = 0;
-	}
-
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
-	return value;
-}
-
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -658,7 +317,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -686,13 +344,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -721,9 +372,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -774,174 +422,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1215,7 +695,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af4..8a090a31592 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,17 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd1..3ce9338f1f1 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ffeb3f44523..96d40029b36 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,6 +117,344 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t count = 0;
+	uint64_t value;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		layer = NULL;
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
+		break;
+	case min_hw_latency:
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
+		break;
+	case max_hw_latency:
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
+		break;
+	case avg_fw_latency:
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
+		break;
+	case min_fw_latency:
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
+		break;
+	case max_fw_latency:
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -296,6 +634,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -325,6 +670,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -523,6 +871,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -808,10 +1340,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679caa..5e02bb876ca 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 17/34] ml/cnxk: update fast path functions
  2023-08-30 15:58 [PATCH v1 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (15 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support
for model-specific fast-path functions. The cnxk layer functions
invoke the model-specific fast-path functions.

Added support for model-specific poll handling functions and
updated the internal inference sync function. Dropped the use of
rte_ml_op as an argument and updated the function arguments so
that the function can be used as a callback by the TVM hardware
runtime.
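
As a rough sketch of the dispatch pattern introduced here (illustrative
only, simplified from the actual implementation; layer selection and
ring-full handling are omitted), the generic cnxk enqueue path resolves
the per-model callback installed at model load time:

/* Illustrative sketch, not the exact driver code: the generic cnxk burst
 * enqueue walks the descriptor ring and invokes the per-model
 * enqueue_single callback set during model load (Glow models point it at
 * cn10k_ml_enqueue_single; TVM models install their own handler).
 */
static uint16_t
enqueue_burst_sketch(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp,
                     struct rte_ml_op **ops, uint16_t nb_ops)
{
    struct cnxk_ml_model *model;
    uint64_t head = qp->queue.head;
    uint16_t count = 0;

    while (count < nb_ops) {
        model = cnxk_mldev->mldev->data->models[ops[count]->model_id];

        /* layer_id fixed to 0 here for simplicity */
        if (!model->enqueue_single(cnxk_mldev, ops[count], 0, qp, head))
            break;

        head = (head + 1) % qp->nb_desc;
        count++;
    }

    qp->queue.head = head;
    qp->stats.enqueued_count += count;

    return count;
}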

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d089015..94a94d996f8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f1431b89a2d..7d809d25ae0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -69,24 +69,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -181,7 +169,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -189,17 +177,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -236,30 +224,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -267,25 +240,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -350,13 +307,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -749,6 +701,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1144,26 +1102,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1171,6 +1111,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1178,9 +1119,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1247,119 +1188,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1396,41 +1266,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1443,7 +1320,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 8a090a31592..3e75cae65a3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -308,13 +309,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3ca..f618e5aa5fc 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 96d40029b36..4bf8bd25457 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -17,6 +17,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1323,6 +1335,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb2..d0c126f34b7 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 18/34] ml/cnxk: move error handling to cnxk layer
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (16 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Move the error type structures to the cnxk layer. The cn10k layer
now handles only the firmware and hardware error sub-types.
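
As a rough standalone sketch (not part of the patch; table contents
copied from the diff, everything else simplified), this is how an
etype/stype pair now resolves to a message string, with the generic
etype table owned by the cnxk layer and the driver sub-type table
staying in the cn10k layer:

    #include <stdio.h>
    #include <stdint.h>

    struct ml_error_db { uint64_t code; const char *str; };

    /* Generic error types, cnxk layer (mirrors ml_etype_db) */
    static const struct ml_error_db etype_db[] = {
            {0, "NO_ERROR"},     {1, "FW_NON_FATAL"}, {2, "HW_NON_FATAL"},
            {3, "HW_FATAL"},     {4, "HW_WARNING"},   {5, "DRIVER_ERROR"},
            {6, "UNKNOWN_ERROR"},
    };

    /* Driver error sub-types, cn10k layer (mirrors ml_stype_db_driver) */
    static const struct ml_error_db driver_stype_db[] = {
            {0, "NO ERROR"},     {1, "UNKNOWN ERROR"},
            {2, "FW EXCEPTION"}, {3, "UNKNOWN FIRMWARE ERROR"},
    };

    int
    main(void)
    {
            uint64_t etype = 5, stype = 2; /* DRIVER_ERROR : FW EXCEPTION */
            char msg[64];

            /* Same "%s : %s" formatting as cn10k_ml_op_error_get() */
            snprintf(msg, sizeof(msg), "%s : %s", etype_db[etype].str,
                     driver_stype_db[stype].str);
            printf("%s\n", msg);

            return 0;
    }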

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f8..2e7eb6c9ef9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7d809d25ae0..daeb3b712c5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -26,47 +26,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1166,19 +1146,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1219,7 +1199,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1236,30 +1216,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1297,7 +1276,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973b..63d1c9e417b 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f1..382fca64bea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 4bf8bd25457..b2eb4bd0d9a 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1433,7 +1433,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 19/34] ml/cnxk: support config and close of tvmdp library
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (17 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-09-21 12:32   ` Jerin Jacob
  2023-08-30 15:59 ` [PATCH v1 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                   ` (22 subsequent siblings)
  41 siblings, 1 reply; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based on the
ML device configuration options.

Updated the meson build to add Jansson, the TVM runtime and the
TVMDP library as build dependencies.
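
A minimal usage sketch of the two TVMDP entry points exercised here,
assuming the tvmdp.h prototypes implied by the diff
(tvmdp_configure(nb_models, clock_cb) and tvmdp_close()):

    #include <tvmdp.h>
    #include <rte_cycles.h>

    /* Hypothetical helper, not part of the patch */
    static int
    tvmdp_bringup(uint16_t nb_models)
    {
            int ret;

            /* TVMDP is told how many models it may manage and which
             * cycle-counter callback to use for time-stamping. */
            ret = tvmdp_configure(nb_models, rte_get_tsc_cycles);
            if (ret != 0)
                    return ret;

            /* ... model load / inference ... */

            /* Must be balanced with a close at device tear-down */
            return tvmdp_close();
    }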

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c  | 15 ++++++++++++
 drivers/ml/cnxk/meson.build    | 45 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c | 44 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h | 15 ++++++++++++
 4 files changed, 119 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b2eb4bd0d9a..454fec33234 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#endif
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -625,6 +629,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+#endif
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -685,6 +695,11 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+#endif
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 575f08f9c09..29dad0b0e33 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,27 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+tvmrt_lib = cc.find_library('tvm_runtime', required: false)
+if tvmrt_lib.found()
+        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
+else
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
@@ -34,6 +55,30 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', true)
+
+driver_sdk_headers += files(
+        'mvtvm_ml_ops.h',
+)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += tvmrt_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+ext_deps += jansson_dep
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 00000000000..0e1fc527daa
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <tvmdp.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "mvtvm_ml_ops.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 00000000000..988f3a1fd5e
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 20/34] ml/cnxk: add structures to support TVM model type
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (18 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.
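
For illustration, the sketch below (a hypothetical helper, not part
of the patch) condenses how the new type field picks the input-side
I/O info: Glow models keep it per layer (the output side uses the
last layer instead), while TVM models carry a model-level copy. It
mirrors the logic added to cnxk_ml_io_quantize()/cnxk_ml_io_dequantize():

    /* Assumes the driver structures introduced by this patch */
    static struct cnxk_ml_io_info *
    model_io_info(struct cnxk_ml_model *model)
    {
            if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
                    return &model->layer[0].info;
    #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
            return &model->mvtvm.info;
    #else
            return NULL;
    #endif
    }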

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 63 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 54 ++++++++++++++++++++++-----
 drivers/ml/cnxk/meson.build      |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 +++++++++++++++++++++++
 6 files changed, 160 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 7d4b1efad13..c665e2cf661 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -437,6 +437,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index daeb3b712c5..db18f320527 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -650,6 +650,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -671,6 +674,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -894,7 +898,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5fc..b5d6ab2b1e2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,45 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions*/
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* MRVL layer, for MLIP target*/
+	/* Unknown layer type */
+
+	/* MRVL layer, for MLIP target*/
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target*/
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +96,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +129,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 454fec33234..4a5b054975d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1286,6 +1286,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1298,17 +1300,32 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = &model->layer[0].info;
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		info = &model->mvtvm.info;
+#endif
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1322,6 +1339,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1334,17 +1353,32 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = &model->layer[model->nb_layers - 1].info;
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		info = &model->mvtvm.info;
+#endif
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 29dad0b0e33..9579cdf7867 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -61,6 +61,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', true)
 
 driver_sdk_headers += files(
         'mvtvm_ml_ops.h',
+        'mvtvm_ml_model.h',
 )
 
 sources += files(
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 00000000000..1f6b435be02
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 21/34] ml/cnxk: add support for identify model type
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (19 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to parse the model buffer and identify the model
type and sub-type. Added basic validity checks for Glow model
buffers.
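
The Glow branch of the new type detection boils down to a magic
string check plus two CRC32-C checks. A simplified standalone sketch
of the CRC part is below; struct glow_hdr is a stand-in with an
assumed layout (the patch implies header_crc32c is the last header
field, since the header CRC spans everything before it):

    #include <stdint.h>
    #include <stddef.h>
    #include <rte_hash_crc.h>

    /* Stand-in for cn10k_ml_model_metadata_header (fields assumed) */
    struct glow_hdr {
            uint8_t  magic[4];
            uint32_t payload_crc32c;
            uint32_t header_crc32c;     /* assumed to be the last field */
    };

    static int
    glow_crc_ok(const void *addr, size_t size)
    {
            const struct glow_hdr *hdr = addr;
            uint32_t crc;

            if (hdr->header_crc32c != 0) {
                    crc = rte_hash_crc(addr,
                                       sizeof(*hdr) - sizeof(uint32_t), 0);
                    if (crc != hdr->header_crc32c)
                            return 0;
            }

            if (hdr->payload_crc32c != 0) {
                    crc = rte_hash_crc((const uint8_t *)addr + sizeof(*hdr),
                                       size - sizeof(*hdr), 0);
                    if (crc != hdr->payload_crc32c)
                            return 0;
            }

            return 1; /* CRCs match or are not populated */
    }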

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 96 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  1 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  9 +++
 drivers/ml/cnxk/meson.build      |  6 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 11 ++++
 5 files changed, 123 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5a..746d3ca5a95 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,107 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include <archive.h>
+#include <archive_entry.h>
+#endif
+
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
+#include "cn10k_ml_model.h"
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret == ARCHIVE_OK)
+		goto check_tvm;
+	else
+		goto check_glow;
+
+check_tvm:
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+
+check_glow:
+#endif
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index b5d6ab2b1e2..577a96dc265 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -181,6 +181,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 4a5b054975d..cbb701f20bb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ops.h"
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
 #include "mvtvm_ml_ops.h"
 #endif
 
@@ -1087,6 +1088,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1102,6 +1104,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1135,6 +1143,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 9579cdf7867..db175b0834d 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -66,6 +71,7 @@ driver_sdk_headers += files(
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += tvmrt_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 00000000000..64622675345
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 22/34] ml/cnxk: add support to parse TVM model objects
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (20 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model archive
buffer, verify that all expected objects are present and copy them
to internal buffers.
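
The object extraction follows the standard libarchive in-memory read
pattern. A self-contained sketch (hypothetical helper; buffer
handling simplified to plain malloc) that pulls one named member out
of the archive blob:

    #include <stdlib.h>
    #include <string.h>
    #include <archive.h>
    #include <archive_entry.h>

    static void *
    read_member(const void *blob, size_t blob_size, const char *name,
                int64_t *size)
    {
            struct archive_entry *entry;
            struct archive *a;
            void *buf = NULL;

            a = archive_read_new();
            archive_read_support_filter_all(a);
            archive_read_support_format_all(a);
            if (archive_read_open_memory(a, blob, blob_size) != ARCHIVE_OK)
                    goto done;

            /* Walk the archive until the requested member is found */
            while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
                    if (strcmp(archive_entry_pathname(entry), name) != 0) {
                            archive_read_data_skip(a);
                            continue;
                    }
                    *size = archive_entry_size(entry);
                    buf = malloc(*size);
                    if (buf != NULL &&
                        archive_read_data(a, buf, *size) != *size) {
                            free(buf);
                            buf = NULL;
                    }
                    break;
            }
    done:
            archive_read_free(a);
            return buf;
    }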

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++--
 drivers/ml/cnxk/mvtvm_ml_model.c | 62 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 63 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 5 files changed, 142 insertions(+), 3 deletions(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index cbb701f20bb..a99367089b4 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1149,9 +1149,17 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
-	if (ret != 0)
-		goto error;
+	if (type == ML_CNXK_MODEL_TYPE_GLOW) {
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+		if (ret != 0)
+			goto error;
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	} else {
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
+		if (ret != 0)
+			goto error;
+#endif
+	}
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 64622675345..425a682209f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -2,10 +2,72 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <archive.h>
+#include <archive_entry.h>
+
 #include <rte_mldev.h>
 
+#include <roc_api.h>
+
 #include "mvtvm_ml_model.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
 								     "mod.params"};
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be02..73a45a91d66 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,7 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 0e1fc527daa..1bdd4515771 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -9,9 +9,14 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "mvtvm_ml_model.h"
 #include "mvtvm_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
@@ -42,3 +47,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 988f3a1fd5e..ca8f57992da 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -8,8 +8,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 23/34] ml/cnxk: fetch layer info and load TVM model
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (21 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and
update internal structures based on the layer information.
Set callback functions for layer load and unload and
enabled model loading using the TVMDP library. Added support
to fetch full metadata after model load.
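
For reference, the layer load path resolves a TVM layer name to an internal
layer index before reusing the existing Glow layer load. A minimal standalone
sketch of that lookup follows; the struct and names are hypothetical stand-ins
for the driver's cnxk_ml_layer, not the driver code itself:

#include <errno.h>
#include <stdint.h>
#include <string.h>

enum demo_layer_type { DEMO_LAYER_TYPE_MRVL, DEMO_LAYER_TYPE_LLVM };

struct demo_layer {
	char name[64];
	enum demo_layer_type type;
};

/* Return the index of the named layer, or -EINVAL if it is unknown or is
 * not a Marvell (accelerator) layer. */
static int
demo_layer_id_get(const struct demo_layer *layers, uint16_t nb_layers,
		  const char *layer_name)
{
	uint16_t layer_id;

	for (layer_id = 0; layer_id < nb_layers; layer_id++) {
		if (strcmp(layers[layer_id].name, layer_name) == 0)
			break;
	}

	if (layer_id == nb_layers)
		return -EINVAL; /* unknown layer name */

	if (layers[layer_id].type != DEMO_LAYER_TYPE_MRVL)
		return -EINVAL; /* only Marvell layers are loaded on the accelerator */

	return layer_id;
}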

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 22 ++++++++-
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 83 ++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index db18f320527..79217165cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -508,8 +508,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int qp_id;
 	int ret;
 
-	PLT_SET_USED(size);
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
 	PLT_SET_USED(layer_name);
+#endif
+	PLT_SET_USED(size);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -523,6 +525,24 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 73a45a91d66..6c38217c158 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1bdd4515771..5c30bbf6b89 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -9,6 +9,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cn10k_ml_ops.h"
+
 #include "mvtvm_ml_model.h"
 #include "mvtvm_ml_ops.h"
 
@@ -53,9 +55,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -103,5 +109,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		strncpy(model->layer[layer_id].name,
+			model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 24/34] ml/cnxk: update internal info for TVM model
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (22 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating internal I/O info structures for TVM models.
Computed static fields related to the model I/O.
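
The per-tensor fields are derived from the TVM metadata: the element count is
the product of the shape dimensions, and the dequantized/quantized byte sizes
scale that count by the respective type sizes. A minimal standalone sketch of
that computation, where type_size_d/type_size_q stand in for
rte_ml_io_type_size_get() results on the two types:

#include <stdint.h>

/* Derive element count and dequantized/quantized byte sizes for one tensor. */
static void
demo_tensor_sizes(const int64_t *shape, uint32_t ndim, uint32_t type_size_d,
		  uint32_t type_size_q, uint64_t *sz_d, uint64_t *sz_q)
{
	uint64_t nb_elements = 1;
	uint32_t i;

	for (i = 0; i < ndim; i++)
		nb_elements *= shape[i]; /* product of all dimensions */

	*sz_d = nb_elements * type_size_d; /* dequantized (user visible) size */
	*sz_q = nb_elements * type_size_q; /* quantized (model) size */
}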

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 105 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   1 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 3 files changed, 109 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 425a682209f..86f465a645f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,10 +7,14 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "mvtvm_ml_model.h"
 
+#include "cnxk_ml_model.h"
+
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
 								     "mod.params"};
@@ -71,3 +75,104 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case kDLInt:
+		return RTE_ML_IO_TYPE_INT32;
+	case kDLUInt:
+		return RTE_ML_IO_TYPE_UINT32;
+	case kDLFloat:
+		return RTE_ML_IO_TYPE_FP32;
+	case kDLBfloat:
+		return RTE_ML_IO_TYPE_BFLOAT16;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		strncpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		strncpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_update(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6c38217c158..2b25a7b568e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -47,5 +47,6 @@ struct mvtvm_ml_model_data {
 
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+void mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 5c30bbf6b89..a783e16e6eb 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -181,6 +181,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_update(model);
+
 	return 0;
 
 error:
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 25/34] ml/cnxk: enable model unload in tvmdp library
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (23 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled model unload using the external TVMDP library. Updated
the layer unload callback to support multiple layers.
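
Unload first tears the model down in TVMDP and only then releases the memzone
reserved at load time, which is found again by rebuilding the same name. A
minimal sketch of that name reconstruction; DEMO_MZ_NAME is a stand-in for the
driver's MVTVM_ML_MODEL_MEMZONE_NAME prefix:

#include <stdint.h>
#include <stdio.h>

#define DEMO_MZ_NAME "ml_mvtvm_model_mz"

/* Rebuild the memzone name used when the model objects were reserved, so
 * the same zone can be looked up and freed at unload. */
static void
demo_model_mz_name(uint16_t model_id, char *str, size_t len)
{
	snprintf(str, len, "%s_%u", DEMO_MZ_NAME, model_id);
}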

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 20 ++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  9 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |  1 +
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 79217165cd5..85d0a9e18bb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,7 +725,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	uint16_t layer_id = 0;
 	int ret;
 
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
 	PLT_SET_USED(layer_name);
+#endif
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -739,6 +741,24 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index a99367089b4..d8eadcb8121 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1182,7 +1182,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1200,7 +1200,12 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
+#endif
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index a783e16e6eb..1edbfb0dcc3 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -191,3 +191,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Unload model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index ca8f57992da..8b4db20fe94 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -14,5 +14,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 26/34] ml/cnxk: support start and stop for TVM models
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (24 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. TVM model
start invokes layer start for all Glow layers that are part
of the model. Similarly, TVM model stop invokes layer stop
for all Glow layers that are part of the model.
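
Both paths walk the layer list and invoke the Glow layer start/stop only for
layers of type MRVL; LLVM layers execute on the CPU cores and are skipped. A
minimal sketch of the start loop, using hypothetical stand-in types and a
callback in place of cn10k_ml_layer_start():

#include <stdint.h>

enum demo_layer_type { DEMO_LAYER_TYPE_MRVL, DEMO_LAYER_TYPE_LLVM };

struct demo_layer {
	enum demo_layer_type type;
};

typedef int (*demo_layer_start_fn)(uint16_t layer_id);

/* Start every Marvell layer of the model; stop at the first failure. */
static int
demo_model_start(const struct demo_layer *layers, uint16_t nb_layers,
		 demo_layer_start_fn start)
{
	uint16_t layer_id;
	int ret;

	for (layer_id = 0; layer_id < nb_layers; layer_id++) {
		if (layers[layer_id].type != DEMO_LAYER_TYPE_MRVL)
			continue; /* LLVM layers run on CPU cores, nothing to start */

		ret = start(layer_id);
		if (ret != 0)
			return ret; /* propagate the first failure */
	}

	return 0;
}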

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 42 +++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  | 18 ++++++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c | 52 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |  2 ++
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 85d0a9e18bb..f70383b1281 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -798,7 +798,9 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
 	PLT_SET_USED(layer_name);
+#endif
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -812,6 +814,25 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -981,7 +1002,9 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
 	PLT_SET_USED(layer_name);
+#endif
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -995,6 +1018,25 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index d8eadcb8121..45c3d61a9b6 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1233,7 +1233,14 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+#endif
+
+	return 0;
 }
 
 int
@@ -1253,7 +1260,14 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+#endif
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1edbfb0dcc3..8d25b5d4b87 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -219,3 +219,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 8b4db20fe94..f6ede6229f4 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -15,5 +15,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 27/34] ml/cnxk: update internal TVM model info structure
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (25 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update the internal model info structure
for TVM models.
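
The info structure, the input descriptor array and the output descriptor array
live in one contiguous buffer, and the arrays are located by offsetting from
the base pointer. A minimal sketch of that layout computation; the struct names
are hypothetical stand-ins for rte_ml_model_info / rte_ml_io_info and
DEMO_MAX_IO stands in for ML_CNXK_MODEL_MAX_INPUT_OUTPUT:

#include <stdint.h>

#define DEMO_MAX_IO 32 /* stand-in for ML_CNXK_MODEL_MAX_INPUT_OUTPUT */

struct demo_io_info { uint64_t nb_elements; };
struct demo_model_info { uint8_t nb_inputs; uint8_t nb_outputs; };

/* Locate the info header and the input/output descriptor arrays inside one
 * contiguous buffer: header first, then the input array, then the outputs. */
static void
demo_info_layout(void *base, struct demo_model_info **info,
		 struct demo_io_info **input, struct demo_io_info **output)
{
	*info = (struct demo_model_info *)base;
	*input = (struct demo_io_info *)((uint8_t *)base + sizeof(**info));
	*output = *input + DEMO_MAX_IO; /* input array holds DEMO_MAX_IO entries */
}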

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 86f465a645f..6d72a5255e2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "mvtvm_ml_model.h"
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -176,3 +177,67 @@ mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model)
 tvm_mrvl_model:
 	cn10k_ml_layer_io_info_update(&model->mvtvm.info, &model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, MRVL_ML_MODEL_NAME_LEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 2b25a7b568e..eef424b5c2a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -48,5 +49,6 @@ struct mvtvm_ml_model_data {
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 void mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 8d25b5d4b87..3fae25f6d2d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -184,6 +184,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_update(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 28/34] ml/cnxk: support device dump for TVM models
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (26 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled printing of TVM model layer info as part of the device dump.
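
The model dump now dispatches per layer: Marvell layers keep the existing Glow
printer, while other layers use the new mvtvm printer when TVM support is
compiled in. A minimal sketch of that dispatch; the function names below are
hypothetical stand-ins for the driver routines:

#include <stdio.h>

enum demo_layer_type { DEMO_LAYER_TYPE_MRVL, DEMO_LAYER_TYPE_LLVM };

static void demo_glow_layer_print(FILE *fp) { fprintf(fp, "glow layer\n"); }
static void demo_mvtvm_layer_print(FILE *fp) { fprintf(fp, "tvm layer\n"); }

/* Dump dispatch: Marvell layers use the Glow printer, other layers use the
 * TVM printer (built only with RTE_MLDEV_CNXK_ENABLE_MVTVM). */
static void
demo_layer_print(enum demo_layer_type type, FILE *fp)
{
	if (type == DEMO_LAYER_TYPE_MRVL)
		demo_glow_layer_print(fp);
	else
		demo_mvtvm_layer_print(fp);
}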

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  9 ++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 4 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 746d3ca5a95..e63ee58ab27 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -115,6 +115,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -131,6 +133,11 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
+#endif
 	}
 }
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 45c3d61a9b6..f933a2b846f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -18,6 +18,7 @@
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 6d72a5255e2..24dc862d685 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -15,6 +15,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -241,3 +242,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
 				&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index eef424b5c2a..fa7735cfaa0 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -50,5 +51,6 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 void mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 29/34] ml/cnxk: enable reporting model runtime as xstats
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (27 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.
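
The runtime latency xstats are accumulated per queue pair; the reported average
is the sum of per-inference runtime latencies divided by the number of jobs
dequeued since the last reset, across all queue pairs. A minimal standalone
sketch of that computation; the struct is a stand-in for mvtvm_ml_model_xstats:

#include <stdint.h>

/* Stand-in for the per-queue-pair mvtvm_ml_model_xstats counters. */
struct demo_rt_xstats {
	uint64_t tvm_rt_latency_tot; /* sum of runtime latencies */
	uint64_t dequeued_count;     /* total jobs dequeued */
	uint64_t tvm_rt_reset_count; /* dequeued_count snapshot at last reset */
};

/* Average runtime latency across all queue pairs since the last reset. */
static uint64_t
demo_avg_rt_latency(const struct demo_rt_xstats *qp_stats, uint16_t nb_qps)
{
	uint64_t value = 0;
	uint64_t count = 0;
	uint16_t qp;

	for (qp = 0; qp < nb_qps; qp++) {
		value += qp_stats[qp].tvm_rt_latency_tot;
		count += qp_stats[qp].dequeued_count - qp_stats[qp].tvm_rt_reset_count;
	}

	return (count != 0) ? value / count : 0;
}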

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    | 182 ++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  24 +++-
 5 files changed, 223 insertions(+), 15 deletions(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f933a2b846f..ff9ecd3c941 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -146,7 +146,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -177,6 +178,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -203,7 +223,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -212,6 +233,42 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		strcpy(suffix, "cycles");
+	else
+		strcpy(suffix, "ns");
+
+	/* Update xstat name based on layer name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+				 model->glow.metadata.model.name, model_xstats[i].name, suffix);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+				 model->mvtvm.metadata.model.name, model_xstats[i].name, suffix);
+#endif
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -249,6 +306,9 @@ cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unu
 			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
 				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
@@ -261,6 +321,9 @@ cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unu
 			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
 				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
@@ -273,10 +336,53 @@ cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unu
 			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
 				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+#endif
+
 static uint64_t
 cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
 			enum cnxk_ml_xstats_type type)
@@ -317,6 +423,17 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	case max_fw_latency:
 		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+#endif
 	default:
 		value = 0;
 	}
@@ -907,8 +1024,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -925,7 +1043,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -939,9 +1067,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -1002,9 +1141,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1022,7 +1162,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -1034,11 +1181,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b7..2575f4c6e10 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -64,6 +64,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876ca..a2c9adfe4ab 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index fa7735cfaa0..d71df36f5a5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 3fae25f6d2d..c251579668c 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -16,6 +16,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
@@ -59,6 +60,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -74,7 +76,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -187,6 +193,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v1 30/34] ml/cnxk: implement I/O alloc and free callbacks
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (28 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for I/O buffer allocation and
free for Glow layers.
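
The I/O alloc callback reserves a single aligned region holding the quantized
input buffer followed by the quantized output buffer, each rounded up to the
hardware alignment. A minimal sketch of the size/offset computation;
DEMO_ALIGN is a stand-in for ML_CN10K_ALIGN_SIZE, and the actual reservation
is done by the driver with plt_memzone_reserve_aligned():

#include <stdint.h>

#define DEMO_ALIGN 128 /* stand-in for the hardware alignment */

static uint64_t
demo_align_ceil(uint64_t v, uint64_t align)
{
	return (v + align - 1) / align * align;
}

/* Compute the size of the single reservation that holds both quantized
 * buffers, and the offset at which the output buffer starts. */
static void
demo_io_layout(uint64_t total_input_sz_q, uint64_t total_output_sz_q,
	       uint64_t *alloc_size, uint64_t *output_offset)
{
	uint64_t input_size = demo_align_ceil(total_input_sz_q, DEMO_ALIGN);
	uint64_t output_size = demo_align_ceil(total_output_sz_q, DEMO_ALIGN);

	*output_offset = input_size;            /* output follows the input buffer */
	*alloc_size = input_size + output_size; /* one reservation for both */
}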

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 123 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |   3 +
 drivers/ml/cnxk/mvtvm_ml_ops.c |   2 +
 3 files changed, 128 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f70383b1281..23e98b96c59 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1399,3 +1399,126 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id = 0;
+	uint64_t output_size;
+	uint64_t input_size;
+
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	PLT_SET_USED(layer_name);
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", cnxk_mldev);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id = 0;
+
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	PLT_SET_USED(layer_name);
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", cnxk_mldev);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3e75cae65a3..055651eaa24 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -328,5 +328,8 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index c251579668c..821b3e8f3c9 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -166,6 +166,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.41.0


* [PATCH v1 31/34] ml/cnxk: add generic ML malloc and free callback
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (29 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks.
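
A minimal usage sketch (illustrative only; the memzone name and size
below are hypothetical). The callbacks back an allocation with a named
memzone and free it by looking the name up again:

    #include "cn10k_ml_ops.h"

    /* Hypothetical helper: allocate 4 KB, cache-line aligned, under the
     * name "ml_scratch" and free it once the work is done.
     */
    static int
    example_ml_scratch(void)
    {
            void *addr = NULL;
            int ret;

            ret = cn10k_ml_malloc("ml_scratch", 4096, RTE_CACHE_LINE_SIZE, &addr);
            if (ret != 0)
                    return ret;

            /* ... use addr ... */

            return cn10k_ml_free("ml_scratch");
    }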

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 23e98b96c59..140f7a343f9 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1522,3 +1522,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 055651eaa24..d7df1d003aa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -332,4 +332,7 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 821b3e8f3c9..36616ece964 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -168,6 +168,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.41.0


* [PATCH v1 32/34] ml/cnxk: support quantize and dequantize callback
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (30 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
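
A rough sketch of how the quantize callback is expected to be invoked
(illustrative only; the DLTensor array and the quantized buffer are
assumed to be prepared by the TVM runtime, and the layer name is
hypothetical):

    const DLTensor *deq_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
    void *qbuffer;

    /* deq_tensor[i] points at the dequantized data of input i and
     * qbuffer is the MRVL layer's contiguous quantized input buffer.
     */
    if (mvtvm_ml_io_quantize(device, model_id, "mrvl_layer_0",
                             deq_tensor, qbuffer) != 0)
            return -1;

    /* mvtvm_ml_io_dequantize() mirrors this for the outputs, converting
     * the quantized output buffer back into DLTensor data.
     */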

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/meson.build      |   5 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 127 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   6 ++
 4 files changed, 140 insertions(+)

diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index db175b0834d..09a62b5c55a 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -19,6 +19,11 @@ if not jansson_dep.found()
         enable_mvtvm = false
 endif
 
+if not cc.check_header('dlpack/dlpack.h')
+        message('drivers/ml/cnxk: dlpack.h not found')
+        enable_mvtvm = false
+endif
+
 tvmrt_lib = cc.find_library('tvm_runtime', required: false)
 if tvmrt_lib.found()
         tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index d71df36f5a5..57a6ce0bb1a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -5,6 +5,8 @@
 #ifndef _MVTVM_ML_MODEL_H_
 #define _MVTVM_ML_MODEL_H_
 
+#include <dlpack/dlpack.h>
+
 #include <tvmdp.h>
 
 #include <rte_mldev.h>
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 36616ece964..0bee5884640 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -9,6 +9,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cn10k_ml_ops.h"
 
 #include "mvtvm_ml_model.h"
@@ -170,6 +172,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -300,3 +304,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index f6ede6229f4..3a1e97a7a08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -5,6 +5,8 @@
 #ifndef _MVTVM_ML_OPS_H_
 #define _MVTVM_ML_OPS_H_
 
+#include <tvmdp.h>
+
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
@@ -17,5 +19,9 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


* [PATCH v1 33/34] ml/cnxk: enable fast-path ops for TVM models
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (31 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-08-30 15:59 ` [PATCH v1 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. Hybrid and LLVM
model sub-types use TVMDP library function calls to execute
inference operations.

For TVM MRVL model sub-types that have a single MRVL layer,
the driver enqueues inference requests directly to hardware.
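
With the per-model hooks installed at load time, the common enqueue path
reduces to a dispatch through the model structure. A rough, hypothetical
sketch (the surrounding burst loop and the jcmdq_full label are
abbreviated placeholders, not the exact driver code):

    model = cnxk_mldev->mldev->data->models[op->model_id];
    if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
            goto jcmdq_full;

This way MRVL-only models keep the cn10k_ml_enqueue_single path, while
Hybrid and LLVM models go through mvtvm_ml_enqueue_single and
tvmdp_model_run().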

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h     |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |   9 +++
 drivers/ml/cnxk/mvtvm_ml_model.c |  20 +++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 124 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  43 +++++++++++
 8 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 140f7a343f9..c1353fb0c81 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -287,10 +287,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c2520..6d5d25a7c9c 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ff9ecd3c941..c8491646da9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -758,6 +758,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2575f4c6e10..62e2b17e35b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,12 +12,21 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 24dc862d685..4ac053408e2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -136,6 +136,16 @@ mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -169,6 +179,16 @@ mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 57a6ce0bb1a..08e101bbe74 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -71,6 +71,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 0bee5884640..e8484b3bd92 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -23,6 +23,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -174,6 +180,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -217,6 +224,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -427,3 +447,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 3a1e97a7a08..dba055c22e7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -11,6 +11,44 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* End ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -24,4 +62,9 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


* [PATCH v1 34/34] ml/cnxk: enable creation of mvtvm virtual device
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (32 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-08-30 15:59 ` Srikanth Yalavarthi
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-08-30 15:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create an mvtvm virtual device on systems
without a PCI-based ML HW accelerator.
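
A minimal sketch of creating the virtual device from an application,
equivalent to passing --vdev on the EAL command line. The devargs keys
match those registered by the driver below; the values are illustrative:

    #include <rte_bus_vdev.h>

    if (rte_vdev_init("ml_mvtvm", "max_qps=8,cache_model_data=1") != 0)
            rte_exit(EXIT_FAILURE, "Failed to create ml_mvtvm vdev\n");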

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c  |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h  |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  86 ++++++++++----
 drivers/ml/cnxk/meson.build    |   2 +
 drivers/ml/cnxk/mvtvm_ml_dev.c | 198 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  34 +++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h |   2 +
 10 files changed, 372 insertions(+), 25 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 20c114b8bf7..e6dc87e3530 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -368,6 +368,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -414,6 +420,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef9..cee405f3f5b 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417b..dc4512223ca 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64bea..491c4c4aea5 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c8491646da9..3525215e716 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -125,7 +125,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -604,7 +605,14 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+#endif
+
+	return 0;
 }
 
 static int
@@ -642,9 +650,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -742,10 +752,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
@@ -755,12 +767,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 #endif
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -823,8 +840,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		plt_err("Failed to close MVTVM ML Device");
 #endif
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -876,10 +895,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -898,10 +919,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -928,7 +951,14 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+#endif
+
+	return 0;
 }
 
 static int
@@ -941,6 +971,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1269,6 +1302,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1463,6 +1501,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 09a62b5c55a..f5989c5cafe 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -70,11 +70,13 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', true)
 
 driver_sdk_headers += files(
+        'mvtvm_ml_dev.h',
         'mvtvm_ml_ops.h',
         'mvtvm_ml_model.h',
 )
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 00000000000..8ca0e959e35
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "mvtvm_ml_dev.h"
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize MVTVM vdev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 00000000000..6922c193372
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e8484b3bd92..dc9beb86e2e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -11,8 +11,7 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
+#include "mvtvm_ml_dev.h"
 #include "mvtvm_ml_model.h"
 #include "mvtvm_ml_ops.h"
 
@@ -29,6 +28,22 @@ mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
 	req->status = &req->mvtvm_req.status;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -59,6 +74,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -169,6 +193,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index dba055c22e7..6cb8db92030 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -50,8 +50,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.41.0


* [PATCH v2 00/34] Implemenation of revised ml/cnxk driver
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (33 preceding siblings ...)
  2023-08-30 15:59 ` [PATCH v1 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
@ 2023-09-20  7:24 ` Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                     ` (34 more replies)
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                   ` (6 subsequent siblings)
  41 siblings, 35 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series is an implementation of revised ml/cnxk driver
to support models compiled with TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions with-in a TVM model.

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (1):
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (31):
  ml/cnxk: drop support for register polling
  ml/cnxk: drop use of RTE API for firmware read
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 doc/guides/mldevs/cnxk.rst       |   16 -
 drivers/ml/cnxk/cn10k_ml_dev.c   |  477 ++---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  457 +----
 drivers/ml/cnxk/cn10k_ml_model.c |  383 ++--
 drivers/ml/cnxk/cn10k_ml_model.h |  148 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  109 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 2915 ++++++++++--------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  351 +++-
 drivers/ml/cnxk/cnxk_ml_dev.c    |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  120 ++
 drivers/ml/cnxk/cnxk_ml_io.c     |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h     |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c  |  143 ++
 drivers/ml/cnxk/cnxk_ml_model.h  |  187 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 1789 ++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h    |   85 +
 drivers/ml/cnxk/cnxk_ml_utils.c  |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h  |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |  152 ++
 drivers/ml/cnxk/meson.build      |   70 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   |  198 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c |  322 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   88 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  581 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   74 +
 27 files changed, 5964 insertions(+), 2993 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h

-- 
2.41.0


* [PATCH v2 01/34] ml/cnxk: drop support for register polling
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-09-20  7:24   ` Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 02/34] ml/cnxk: drop use of RTE API for firmware read Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for the device argument "poll_mem" in the cnxk
ML driver. Support for polling via registers is removed and
DDR addresses are now used for polling.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: series-29565 ("Spec changes to support multi I/O models")

 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001..4aaeecff03 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d3..11531afd8c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6..005b093e45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-09-20  7:24   ` Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 03/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Prince Takkar; +Cc: dev, sshankarnara, aprabhu

Dropped use of the rte_firmware_read API to read the ML firmware
binary. When DPDK is built with libarchive support, the RTE API
assumes the binary file is a compressed archive. This causes the
ML firmware binary to be parsed incorrectly.
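
A condensed, self-contained sketch of the plain file-read approach
this patch switches to is below. It is simplified for illustration;
the function added by the patch uses rte_malloc/rte_memcpy and fuller
error reporting:

  /* Read a firmware binary verbatim via mmap(), with no archive
   * interpretation of the contents. Illustrative only.
   */
  #include <fcntl.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int read_file(const char *path, size_t *size, char **buffer)
  {
      struct stat st;
      char *map, *buf;
      int fd;

      fd = open(path, O_RDONLY);
      if (fd < 0)
          return -1;

      if (fstat(fd, &st) != 0) {
          close(fd);
          return -1;
      }

      map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
      if (map == MAP_FAILED) {
          close(fd);
          return -1;
      }

      buf = malloc(st.st_size);
      if (buf != NULL)
          memcpy(buf, map, st.st_size); /* copy bytes as-is */

      munmap(map, st.st_size);
      close(fd);

      if (buf == NULL)
          return -1;

      *size = st.st_size;
      *buffer = buf;
      return 0;
  }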

Fixes: c29da752ffa8 ("ml/cnxk: support firmware load and device reset")
Cc: syalavarthi@marvell.com

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c | 64 +++++++++++++++++++++++++++++++---
 1 file changed, 60 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..b7e6ed9a00 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -2,6 +2,11 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <fcntl.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
 #include <rte_common.h>
 #include <rte_dev.h>
 #include <rte_devargs.h>
@@ -61,6 +66,57 @@ static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
+static int
+ml_read_file(const char *file, size_t *size, char **buffer)
+{
+	char *file_buffer = NULL;
+	struct stat file_stat;
+	char *file_map;
+	int ret;
+	int fd;
+
+	fd = open(file, O_RDONLY);
+	if (fd == -1) {
+		plt_err("Failed to open file: %s\n", file);
+		return -errno;
+	}
+
+	if (fstat(fd, &file_stat) != 0) {
+		plt_err("fstat failed for file: %s\n", file);
+		close(fd);
+		return -errno;
+	}
+
+	file_buffer = rte_malloc("ml_firmware", file_stat.st_size, PLT_CACHE_LINE_SIZE);
+	if (file_buffer == NULL) {
+		plt_err("Failed to allocate memory: %s\n", file);
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	file_map = mmap(0, file_stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (file_map == MAP_FAILED) {
+		plt_err("Failed to map file: %s\n", file);
+		ret = -errno;
+		goto error;
+	}
+
+	rte_memcpy(file_buffer, file_map, file_stat.st_size);
+	munmap(file_map, file_stat.st_size);
+	close(fd);
+
+	*size = file_stat.st_size;
+	*buffer = file_buffer;
+
+	return 0;
+
+error:
+	rte_free(file_buffer);
+	close(fd);
+
+	return ret;
+}
+
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -736,7 +792,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 {
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
-	void *fw_buffer = NULL;
+	char *fw_buffer = NULL;
 	uint64_t mz_size = 0;
 	uint64_t fw_size = 0;
 	int ret = 0;
@@ -746,7 +802,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
-		ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
+		ret = ml_read_file(fw->path, &fw_size, &fw_buffer);
 		if ((ret < 0) || (fw_buffer == NULL)) {
 			plt_err("Unable to read firmware data: %s\n", fw->path);
 			return ret;
@@ -763,7 +819,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
-		free(fw_buffer);
+		rte_free(fw_buffer);
 		return -ENOMEM;
 	}
 	fw->req = mz->addr;
@@ -780,7 +836,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
-		free(fw_buffer);
+		rte_free(fw_buffer);
 	} else if (roc_env_is_asim()) {
 		fw->data = NULL;
 		ret = cn10k_ml_fw_load_asim(fw);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 03/34] ml/cnxk: add generic cnxk device structure
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 02/34] ml/cnxk: drop use of RTE API for firmware read Srikanth Yalavarthi
@ 2023-09-20  7:24   ` Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 04/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This structure is
the top-level device structure for the driver and encapsulates
the target / platform-specific device structure.
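
A minimal, self-contained sketch of the resulting layering is shown
below. Placeholder types stand in for roc_ml and rte_ml_dev; field
names follow the usage in the diff, and the real definition is in the
new cnxk_ml_dev.h added by this patch:

  #include <stdio.h>

  struct roc_ml { int placeholder; };       /* stand-in for the ROC handle */
  struct rte_ml_dev { void *dev_private; }; /* stand-in for rte_ml_dev */

  /* Platform-specific device: ROC handle, firmware, OCM, ... */
  struct cn10k_ml_dev {
      struct roc_ml roc;
  };

  /* Generic configuration state, moved up from cn10k_ml_dev */
  enum cnxk_ml_dev_state {
      ML_CNXK_DEV_STATE_PROBED = 0,
      ML_CNXK_DEV_STATE_CONFIGURED,
      ML_CNXK_DEV_STATE_STARTED,
      ML_CNXK_DEV_STATE_CLOSED,
  };

  /* Top-level device structure stored in dev->data->dev_private */
  struct cnxk_ml_dev {
      struct rte_ml_dev *mldev;        /* back-pointer to the mldev */
      enum cnxk_ml_dev_state state;    /* generic device state */
      struct cn10k_ml_dev cn10k_mldev; /* embedded platform device */
  };

  int main(void)
  {
      struct rte_ml_dev dev;
      struct cnxk_ml_dev cnxk_mldev = {
          .mldev = &dev,
          .state = ML_CNXK_DEV_STATE_PROBED,
      };
      struct cn10k_ml_dev *cn10k_mldev;

      dev.dev_private = &cnxk_mldev;

      /* Platform code reaches its own state via the embedded member */
      cn10k_mldev = &cnxk_mldev.cn10k_mldev;
      (void)cn10k_mldev;

      printf("state = %d\n", cnxk_mldev.state);
      return 0;
  }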

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 315 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  14 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  56 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 494 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 562 insertions(+), 443 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index b7e6ed9a00..367fb7014c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -15,13 +15,15 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -63,9 +65,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 ml_read_file(const char *file, size_t *size, char **buffer)
 {
@@ -146,7 +145,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -183,7 +182,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -195,7 +194,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -207,7 +206,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -230,7 +229,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -242,7 +241,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -253,49 +252,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -304,47 +307,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -356,7 +359,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -364,7 +368,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -380,18 +384,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -407,7 +413,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -424,7 +430,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -439,8 +445,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -486,45 +492,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -536,11 +542,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -554,14 +560,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -571,7 +577,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -580,24 +586,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -605,9 +611,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -615,9 +621,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -626,39 +632,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -669,53 +676,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -727,11 +738,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -747,49 +758,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	char *fw_buffer = NULL;
@@ -797,8 +810,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -829,8 +843,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -843,22 +857,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03..f9da1548c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8e..d146535866 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,8 @@
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +463,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +472,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +496,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +508,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891..3128b28db7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6fb0bb620e..aa376284d5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,13 @@
 
 #include <rte_mldev_pmd.h>
 
+#include <roc_api.h>
+
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +219,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +239,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +258,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +275,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +337,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +350,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +397,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +411,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +461,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,7 +502,7 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			wb_pages +=
 				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c..3385bf50c0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -11,6 +11,8 @@
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +87,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +177,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +201,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +252,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +328,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +343,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +353,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +375,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +386,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +395,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +435,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
@@ -503,28 +505,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +542,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +553,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +657,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +677,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +748,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +775,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +791,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +865,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +894,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +909,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +923,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1028,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1059,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 
@@ -1077,8 +1092,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1102,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1142,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1165,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1185,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1280,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1306,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1328,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1370,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1397,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1446,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1461,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1481,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1507,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1529,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1551,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1552,8 +1588,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1610,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1627,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1660,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1717,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1732,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1748,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1757,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1773,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1785,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1854,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
@@ -1843,7 +1882,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1906,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1916,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1927,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1939,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1982,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2252,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2300,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2326,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2337,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2353,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2385,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2395,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2409,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2468,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2507,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 0000000000..2a5c17c973
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 0000000000..51315de622
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 94fa4283b1..03a2d4ecf2 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ driver_sdk_headers = files(
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
+        'cnxk_ml_dev.h',
 )
 
 sources = files(
@@ -19,6 +20,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0



* [PATCH v2 04/34] ml/cnxk: add generic model and layer structures
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-09-20  7:24   ` [PATCH v2 03/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-09-20  7:24   ` Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 05/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures enable support for models with multiple layers.
A model is a collection of multiple independent layers with
flow dependencies between the layers.
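A minimal sketch of such a layout is shown below for illustration only;
the names and fixed-size arrays here (example_ml_model, example_ml_layer,
deps) are assumptions and are not the driver's actual definitions, which
are introduced in cnxk_ml_model.h by this patch.

/*
 * Illustrative sketch: a model holding multiple layers, each layer
 * recording the upstream layers whose outputs it consumes. All names
 * and sizes are assumptions, not the driver's real layout.
 */
#include <stdint.h>

#define EXAMPLE_MAX_LAYERS 8

struct example_ml_layer {
	uint16_t index;                    /* position of the layer in the model */
	uint16_t nb_deps;                  /* number of upstream layers */
	uint16_t deps[EXAMPLE_MAX_LAYERS]; /* indices of layers feeding this one */
};

struct example_ml_model {
	uint16_t model_id;                 /* device-wide model identifier */
	uint16_t nb_layers;                /* number of layers in the model */
	struct example_ml_layer layer[EXAMPLE_MAX_LAYERS];
};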

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  49 +++-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 487 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   3 +
 10 files changed, 653 insertions(+), 467 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d146535866..0ea6520bf7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -312,19 +313,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -362,102 +361,136 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -515,23 +548,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -543,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -551,56 +585,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db7..206a369ca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
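
The generic layer and model objects referenced above (layer->glow, layer->info, model->layer[],
model->state, ML_CNXK_MAX_MODELS and the ML_CNXK_MODEL_STATE_* values) come from the new
cnxk_ml_io.h and cnxk_ml_model.h added by this series but not shown in this hunk. Below is a
minimal sketch reconstructed only from the accesses in this patch; the ML_CNXK_MODEL_MAX_LAYERS
and ML_CNXK_MODEL_MAX_IOS bounds, the placeholder values and the exact field types are
assumptions, not the actual definitions.

#include <rte_mldev.h>

#include <roc_api.h>

#include "cn10k_ml_model.h"

#define ML_CNXK_MAX_MODELS	 16	/* assumed here; may live in cnxk_ml_dev.h */
#define ML_CNXK_MODEL_MAX_LAYERS  1	/* placeholder bound */
#define ML_CNXK_MODEL_MAX_IOS	 32	/* placeholder bound */

/* Model state, moved from the cn10k layer to the generic cnxk layer */
enum cnxk_ml_model_state {
	ML_CNXK_MODEL_STATE_LOADED,
	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
	ML_CNXK_MODEL_STATE_STARTED,
	ML_CNXK_MODEL_STATE_UNKNOWN,
};

/* Per-I/O description, filled by cn10k_ml_layer_info_update() */
struct cnxk_ml_io {
	char name[RTE_ML_STR_MAX];
	enum rte_ml_io_type dtype;	/* dequantized type */
	enum rte_ml_io_type qtype;	/* quantized type */
	uint32_t nb_dims;
	uint32_t shape[4];
	uint32_t nb_elements;
	uint32_t sz_d;			/* dequantized size */
	uint32_t sz_q;			/* quantized size */
	float scale;
};

struct cnxk_ml_io_info {
	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_IOS];
	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_IOS];
	uint32_t nb_inputs;
	uint32_t nb_outputs;
	uint32_t total_input_sz_d;
	uint32_t total_input_sz_q;
	uint32_t total_output_sz_d;
	uint32_t total_output_sz_q;
};

/* One hardware-executable layer; glow models always have exactly one */
struct cnxk_ml_layer {
	uint16_t index;			/* layer index within the model */
	struct cnxk_ml_model *model;	/* back reference */
	struct cnxk_ml_io_info info;	/* parsed I/O information */
	struct cn10k_ml_layer_data glow;	/* metadata, addr, ocm_map, req, stats */
};

struct cnxk_ml_model {
	struct cnxk_ml_dev *cnxk_mldev;
	uint16_t model_id;
	char name[RTE_ML_STR_MAX];
	struct cn10k_ml_model_data glow;	/* model-level glow metadata */
	uint32_t batch_size;
	uint16_t nb_layers;		/* set to 1 for glow models */
	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
	enum cnxk_ml_model_state state;
	uint8_t *info;			/* rte_ml_model_info + io_info buffer */
	plt_spinlock_t lock;
};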
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index aa376284d5..5682778e87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -11,6 +11,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -334,12 +335,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -354,6 +357,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -383,8 +387,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -394,12 +398,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -410,16 +416,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -433,11 +442,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd65..720f8caf76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
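
The reworked prototypes above key OCM reservation by (model_id, layer_id) instead of model_id
alone. A minimal usage sketch follows, assuming layer 0 is the single glow layer and that
ocm_map.wb_pages / scratch_pages were filled at load time; ocm->lock handling and the retry
logic of cn10k_ml_model_start() are omitted, and example_layer_ocm_cycle() is an illustrative
name, not a driver function.

#include <errno.h>

#include <rte_mldev.h>

#include "cn10k_ml_ocm.h"
#include "cnxk_ml_model.h"

static int
example_layer_ocm_cycle(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
{
	struct cnxk_ml_layer *layer = &model->layer[0];
	uint64_t tilemask;
	int wb_page_start;
	uint8_t num_tiles;

	num_tiles = layer->glow.metadata.model.tile_end -
		    layer->glow.metadata.model.tile_start + 1;

	/* Find a set of tiles with enough free WB and scratch pages */
	wb_page_start = cn10k_ml_ocm_tilemask_find(dev, num_tiles,
						   layer->glow.ocm_map.wb_pages,
						   layer->glow.ocm_map.scratch_pages, &tilemask);
	if (wb_page_start == -1)
		return -ENOMEM;

	/* Reservation is now keyed by (model_id, layer_id) */
	cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0, tilemask, wb_page_start,
				   layer->glow.ocm_map.wb_pages,
				   layer->glow.ocm_map.scratch_pages);
	layer->glow.ocm_map.ocm_reserved = true;
	layer->glow.ocm_map.tilemask = tilemask;
	layer->glow.ocm_map.wb_page_start = wb_page_start;

	/* On model stop, or on start failure, release the same layer's pages */
	cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
	layer->glow.ocm_map.ocm_reserved = false;
	layer->glow.ocm_map.tilemask = 0x0;

	return 0;
}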
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3385bf50c0..a52509630f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -12,6 +12,7 @@
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -203,7 +204,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -216,77 +217,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -296,29 +300,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -328,14 +334,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -346,7 +352,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -386,7 +392,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -446,7 +452,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -473,7 +479,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -522,7 +528,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -544,7 +550,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -577,9 +583,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -589,9 +595,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -601,9 +608,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -612,7 +620,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -693,28 +701,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -750,7 +758,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -759,7 +767,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -804,7 +812,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -855,7 +863,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -876,7 +884,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -896,7 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1002,11 +1010,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1094,7 +1102,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1112,11 +1120,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1295,7 +1303,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1387,7 +1395,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1448,7 +1456,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1589,7 +1597,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1644,9 +1652,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1660,62 +1668,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* The driver always handles a single layer for glow models, so consider the
+	 * entire model as a model with a single layer. This ignores num_layers from
+	 * the metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1731,7 +1762,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1742,7 +1773,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1759,7 +1790,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1784,7 +1815,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1792,63 +1823,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1881,10 +1915,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1892,12 +1926,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1918,7 +1952,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1938,7 +1972,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1949,31 +1983,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2009,7 +2043,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2022,7 +2056,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2041,7 +2075,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2051,19 +2085,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2072,7 +2110,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2092,57 +2130,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2152,7 +2191,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2172,58 +2211,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2251,10 +2292,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2264,9 +2305,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2470,7 +2511,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2478,7 +2519,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 0000000000..29ec7ec511
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 32
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* name */
+	char name[RTE_ML_STR_MAX];
+
+	/* dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape of input */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized input size */
+	uint32_t sz_d;
+
+	/* Quantized input size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 0000000000..3d735ced3e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 0000000000..a2994dbb71
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name*/
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 03a2d4ecf2..72e03b15b5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,8 @@ driver_sdk_headers = files(
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
         'cnxk_ml_dev.h',
+        'cnxk_ml_io.h',
+        'cnxk_ml_model.h',
 )
 
 sources = files(
@@ -21,6 +23,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 05/34] ml/cnxk: add generic cnxk request structure
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-09-20  7:24   ` [PATCH v2 04/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-09-20  7:24   ` Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 06/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved the common fields
from the cn10k structures to the cnxk structure. Moved job-related
structures and enumerations to the ops headers.
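
A minimal standalone sketch of the layering this change introduces
(the *_sketch types, stand-in fields and sketch_set_poll_addr() helper
are illustrative assumptions, not the driver's actual definitions):
common fields such as the poll address and timeout live in the generic
request, while the cn10k job descriptor, status word and result are
nested inside it, matching the req->cn10k_req accesses in the diff.

    #include <stdint.h>

    /* Stand-ins for the hardware job descriptor and result block */
    struct cn10k_ml_jd_sketch { uint64_t words[16]; };
    struct cn10k_ml_result_sketch { uint64_t error_code; };

    /* Generation-specific part of a request */
    struct cn10k_ml_req_sketch {
            struct cn10k_ml_jd_sketch jd;     /* job descriptor handed to the MLIP */
            volatile uint64_t status;         /* completion word polled by the driver */
            struct cn10k_ml_result_sketch result;
    };

    /* Generic request: common fields here, HW-specific state nested */
    struct cnxk_ml_req_sketch {
            struct cn10k_ml_req_sketch cn10k_req; /* cn10k-specific job state */
            volatile uint64_t *status;            /* common poll address */
            uint64_t timeout;                     /* enqueue timeout in TSC cycles */
    };

    /* Mirrors what cn10k_ml_set_poll_addr() does after this patch: the
     * generic layer polls req->status, which the cn10k backend points
     * at its own status word. */
    static inline void
    sketch_set_poll_addr(struct cnxk_ml_req_sketch *req)
    {
            req->status = &req->cn10k_req.status;
    }

With this split, queue-pair and fast-path code can poll and time out on
the generic request without knowing which hardware generation filled in
the nested job state.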

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  70 ++++---
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 329 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 558 insertions(+), 488 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 367fb7014c..f6e05cfc47 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,6 +23,7 @@
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -457,20 +458,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -515,29 +519,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -711,29 +716,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -823,11 +829,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -839,8 +845,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -848,7 +854,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		rte_free(fw_buffer);
 	} else if (roc_env_is_asim()) {
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a..1852d4f6c9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 0ea6520bf7..2a0ae44cfd 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -12,6 +12,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -551,7 +552,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -560,7 +560,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -577,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca7..74ada1531a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a52509630f..2b1fa08154 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -80,31 +81,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -124,14 +125,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -142,18 +143,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -161,7 +162,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -187,8 +188,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -335,7 +337,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -343,79 +345,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -863,7 +874,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -906,7 +917,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1103,7 +1114,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1138,7 +1149,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1215,7 +1226,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1241,7 +1252,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1254,7 +1265,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1271,7 +1282,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1487,20 +1498,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1513,17 +1526,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1540,14 +1555,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1556,23 +1571,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1583,7 +1599,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1656,7 +1672,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1728,7 +1744,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1792,7 +1808,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1817,10 +1833,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1880,8 +1896,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1889,19 +1905,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1954,7 +1972,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1974,10 +1992,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2017,19 +2035,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2289,18 +2309,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2331,7 +2356,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2340,7 +2365,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2348,15 +2374,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2367,11 +2393,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2397,12 +2424,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2426,11 +2454,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2452,13 +2481,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2509,10 +2540,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2524,17 +2556,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2557,7 +2590,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45..fd5992e192 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries 0 means linear input mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model  */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 0000000000..f1872dcf7c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 0000000000..b953fb0f5f
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 72e03b15b5..73db458fcd 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -15,6 +15,7 @@ driver_sdk_headers = files(
         'cnxk_ml_dev.h',
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
+        'cnxk_ml_ops.h',
 )
 
 sources = files(
@@ -24,6 +25,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0



* [PATCH v2 06/34] ml/cnxk: add generic cnxk xstats structures
  2023-09-20  7:24 ` [PATCH v2 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-09-20  7:24   ` [PATCH v2 05/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-09-20  7:24   ` Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 07/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic xstats structures and renamed the cn10k xstats
enumerations to use the cnxk prefix.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
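Note (not part of the commit): a minimal, untested sketch of how an
application could read back the per-model xstats exposed here
(Avg-HW-Latency, Min-FW-Latency, etc.) through the public rte_mldev API
once the device and model are started. It assumes the mode-aware
rte_ml_dev_xstats_names_get()/rte_ml_dev_xstats_get() prototypes that the
driver ops below mirror; the device ID, model ID and the 64-entry caps are
placeholder assumptions.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#include <rte_common.h>
#include <rte_mldev.h>

/* Dump model-mode xstats for one model. */
static void
dump_model_xstats(int16_t dev_id, int32_t model_id)
{
	struct rte_ml_dev_xstats_map map[64];
	uint64_t values[64];
	uint16_t ids[64];
	int nb_stats;
	int nb_vals;
	int i;

	/* Fetch the name <-> id map for model-mode xstats */
	nb_stats = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_MODEL,
					       model_id, map, RTE_DIM(map));
	if (nb_stats <= 0)
		return;
	if (nb_stats > (int)RTE_DIM(map))
		nb_stats = RTE_DIM(map);

	for (i = 0; i < nb_stats; i++)
		ids[i] = map[i].id;

	/* Read the current values for the collected stat IDs */
	nb_vals = rte_ml_dev_xstats_get(dev_id, RTE_ML_DEV_XSTATS_MODEL,
					model_id, ids, values, nb_stats);

	for (i = 0; i < nb_vals && i < nb_stats; i++)
		printf("%s = %" PRIu64 "\n", map[i].name, values[i]);
}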
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 drivers/ml/cnxk/meson.build      |   1 +
 5 files changed, 210 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2b1fa08154..03a7447dc8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -14,6 +14,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -429,26 +430,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -463,10 +444,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -474,17 +455,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -493,24 +474,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -549,7 +530,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -558,17 +539,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -594,9 +575,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -607,9 +588,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -620,16 +602,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -675,8 +658,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -712,26 +695,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -766,8 +749,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1346,10 +1329,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1361,10 +1344,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1388,11 +1371,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1427,10 +1410,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1668,7 +1651,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1742,24 +1725,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2312,7 +2295,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2330,31 +2313,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 0000000000..0d405679ca
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 73db458fcd..6385ac4548 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ driver_sdk_headers = files(
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
+        'cnxk_ml_xstats.h',
 )
 
 sources = files(
-- 
2.41.0



* [PATCH v2 07/34] ml/cnxk: rename cnxk ops function pointers struct
  2023-09-20  7:24 ` [PATCH v2 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-09-20  7:24   ` [PATCH v2 06/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-09-20  7:24   ` Srikanth Yalavarthi
  2023-09-20  7:24   ` [PATCH v2 08/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure to use the cnxk prefix.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
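Note (not part of the commit): a condensed sketch of the shape the generic
ops table in cnxk_ml_ops.c takes after this change, wiring the CN10K
handlers now exported from cn10k_ml_ops.h. Only the device-control and
queue-pair callbacks are shown; the remaining stats, model and I/O
callbacks are expected to be wired to their cn10k_ml_* counterparts in the
same way, matching the table removed from cn10k_ml_ops.c below.

#include "cnxk_ml_ops.h"

/* Generic cnxk ops table registered in dev->dev_ops at probe time;
 * subset shown for illustration.
 */
struct rte_ml_dev_ops cnxk_ml_ops = {
	/* Device control ops */
	.dev_info_get = cn10k_ml_dev_info_get,
	.dev_configure = cn10k_ml_dev_configure,
	.dev_close = cn10k_ml_dev_close,
	.dev_start = cn10k_ml_dev_start,
	.dev_stop = cn10k_ml_dev_stop,
	.dev_dump = cn10k_ml_dev_dump,
	.dev_selftest = cn10k_ml_dev_selftest,

	/* Queue-pair handling ops */
	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
};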
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 38 ++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 93 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index f6e05cfc47..20c114b8bf 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -404,7 +404,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 03a7447dc8..e6383283d3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -123,7 +123,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -864,7 +864,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -892,7 +892,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1091,7 +1091,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1164,7 +1164,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1184,7 +1184,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1204,7 +1204,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1245,7 +1245,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1262,7 +1262,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1277,7 +1277,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1325,7 +1325,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1367,7 +1367,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1431,7 +1431,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1445,7 +1445,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1532,7 +1532,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2055,7 +2055,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2075,7 +2075,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2109,7 +2109,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2190,7 +2190,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2578,38 +2578,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e192..16480b9ad8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c..89e0d9d32c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,43 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5f..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 08/34] ml/cnxk: update device handling functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-09-20  7:24   ` [PATCH v2 07/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-09-20  7:24   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 09/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement CNXK wrapper functions for dev_info_get,
dev_configure, dev_close, dev_start and dev_stop. The
wrapper functions allocate and release the common resources
for the ML driver and invoke the device-specific functions.
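
The split is easiest to see in the dev_start path. The sketch below condenses
the wrapper pattern from the diff that follows: the cnxk layer validates the
rte_ml_dev handle and tracks the device state, while the cn10k call performs
the hardware-specific start. It is an illustration of the pattern only, not an
additional hunk of this patch.

static int
cnxk_ml_dev_start(struct rte_ml_dev *dev)
{
	struct cnxk_ml_dev *cnxk_mldev;
	int ret;

	if (dev == NULL)
		return -EINVAL;

	cnxk_mldev = dev->data->dev_private;

	/* Hardware-specific start is delegated to the cn10k layer */
	ret = cn10k_ml_dev_start(cnxk_mldev);
	if (ret != 0)
		return ret;

	/* Common state tracking stays in the cnxk layer */
	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;

	return 0;
}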

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e6383283d3..0f32f3b2bb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -105,7 +105,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -865,20 +865,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -893,143 +885,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1042,8 +908,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1054,10 +919,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1071,77 +936,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1158,20 +971,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1179,19 +987,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1199,8 +1003,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1221,7 +1023,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad8..d50b5bede7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de622..02605fa28f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 89e0d9d32c..83d5cbae58 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -7,15 +7,291 @@
 
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..2996928d7d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 09/34] ml/cnxk: update queue-pair handling functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-09-20  7:24   ` [PATCH v2 08/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 10/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device queue-pair setup and
release. The generic queue-pair create and destroy logic moves to the cnxk
layer, which calls into the cn10k layer for device-specific queue
initialization.
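
A condensed view of the new split, using the names from the diff below; it
omits the memset of the request queue and the stats and cursor field
initialization that the full hunk performs, and is not itself part of the
patch.

static struct cnxk_ml_qp *
cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
{
	const struct rte_memzone *qp_mem;
	char name[RTE_MEMZONE_NAMESIZE];
	struct cnxk_ml_dev *cnxk_mldev = dev->data->dev_private;
	struct cnxk_ml_qp *qp;
	uint32_t len;

	/* Generic allocation is handled by the cnxk layer */
	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(*qp), ROC_ALIGN, socket_id);
	if (qp == NULL)
		return NULL;

	/* Reserve the request-queue memzone */
	len = nb_desc * sizeof(struct cnxk_ml_req);
	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
	qp_mem = rte_memzone_reserve_aligned(
		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
	if (qp_mem == NULL) {
		rte_free(qp);
		return NULL;
	}

	qp->id = qp_id;
	qp->queue.reqs = qp_mem->addr;
	qp->nb_desc = nb_desc;

	/* Device-specific job-command setup stays in the cn10k layer */
	cn10k_ml_qp_initialize(cnxk_mldev, qp);

	return qp;
}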

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0f32f3b2bb..330cb050cb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -99,93 +99,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -193,13 +112,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1006,47 +918,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede7..2d0a49d5cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 83d5cbae58..1767a8a3db 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -12,7 +12,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -95,7 +195,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -285,6 +385,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -296,8 +441,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 10/34] ml/cnxk: update model load and unload functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 09/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to load and unload
ML models. The wrapper functions invoke the cn10k
model load and unload functions.
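
A hypothetical sketch of the unload wrapper, following the same pattern as the
earlier device and queue-pair wrappers; the exact function bodies and argument
lists are defined in the diff below, and the checks shown here are assumptions
for illustration, not part of the patch.

static int
cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
{
	struct cnxk_ml_model *model;

	if (dev == NULL)
		return -EINVAL;

	model = dev->data->models[model_id];
	if (model == NULL) {
		plt_err("Invalid model_id = %u", model_id);
		return -EINVAL;
	}

	/* The cn10k layer releases the device-specific resources */
	return cn10k_ml_model_unload(dev, model_id);
}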

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 239 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  25 +--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 455 insertions(+), 278 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 2a0ae44cfd..9a336cd18f 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,7 +6,6 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
@@ -318,42 +317,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -365,140 +353,140 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_update(struct cnxk_ml_io_info *io_info,
+			      struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output1[i].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output2[j].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -506,7 +494,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -518,7 +506,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -526,15 +514,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -542,28 +530,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -572,39 +557,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68..45290b84ce 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,12 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_update(struct cnxk_ml_io_info *io_info,
+				   struct cn10k_ml_model_metadata *metadata);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 330cb050cb..3bfc63d9d4 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -19,6 +19,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -277,7 +280,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1265,85 +1268,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_update(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1362,99 +1451,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1752,7 +1804,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1766,19 +1817,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cd..677219dfdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28f..1590249abd 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1767a8a3db..3d9d5f9d78 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -12,6 +12,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -139,6 +142,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -242,7 +246,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -273,6 +277,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -305,6 +326,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -314,7 +338,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -430,6 +454,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -453,8 +589,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..bc14f6e5b9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0
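
A side note on the index_map introduced in this patch: the index reserved by
cn10k_ml_layer_load() is what the hardware-facing paths later use to get back
to the owning model and layer. Below is a minimal illustrative helper for that
lookup, assuming the cnxk_ml_index_map and cnxk_ml_dev layouts shown above;
the helper name is hypothetical and not part of the patch:

    static inline struct cnxk_ml_layer *
    cnxk_ml_index_to_layer(struct cnxk_ml_dev *cnxk_mldev, uint16_t index)
    {
        struct cnxk_ml_model *model;

        /* Only indices marked active by cn10k_ml_layer_load() are valid. */
        if (index >= cnxk_mldev->max_nb_layers || !cnxk_mldev->index_map[index].active)
            return NULL;

        model = cnxk_mldev->mldev->data->models[cnxk_mldev->index_map[index].model_id];
        if (model == NULL)
            return NULL;

        return &model->layer[cnxk_mldev->index_map[index].layer_id];
    }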


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 11/34] ml/cnxk: update model start and stop functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 10/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrapper functions invoke the cn10k
model start and stop functions.
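
For reference, a minimal sketch of the wrapper shape this patch adds in
cnxk_ml_ops.c, assuming the driver-internal types from this series; it is
illustrative only (the actual hunk continues beyond this excerpt, and
cnxk_ml_model_stop() mirrors the same pattern via cn10k_ml_model_stop()):

    /* cnxk-level wrapper: resolve the model from the generic device data
     * and delegate the start of layer 0 to the cn10k backend. */
    static int
    cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
    {
        struct cnxk_ml_dev *cnxk_mldev;
        struct cnxk_ml_model *model;

        if (dev == NULL)
            return -EINVAL;

        cnxk_mldev = dev->data->dev_private;
        model = dev->data->models[model_id];
        if (model == NULL) {
            plt_err("Invalid model_id = %u", model_id);
            return -EINVAL;
        }

        /* cn10k_ml_model_start() starts layer 0 via cn10k_ml_layer_start()
         * and moves the model to ML_CNXK_MODEL_STATE_STARTED on success. */
        return cn10k_ml_model_start(cnxk_mldev, model);
    }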

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 5682778e87..2d900dbc78 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -217,11 +217,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -240,7 +239,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -335,12 +333,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -353,10 +349,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -398,12 +392,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -418,10 +410,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -440,8 +430,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3bfc63d9d4..e5b9837ed7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -252,26 +252,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -295,7 +297,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -327,9 +329,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -718,10 +724,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -734,22 +738,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -765,15 +767,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1510,14 +1512,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1528,85 +1532,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1640,66 +1648,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1709,31 +1745,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1770,8 +1806,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1780,6 +1819,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2007,30 +2065,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2058,14 +2121,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2120,7 +2182,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2187,7 +2249,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2236,23 +2298,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2288,7 +2354,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf..a222a43d55 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 3d9d5f9d78..915309168d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -242,7 +242,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -334,7 +334,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -566,6 +566,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -591,8 +631,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9..d27ca0d0cb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 12/34] ml/cnxk: update model utility functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and to
fetch model info.
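
Below is a minimal usage sketch, not part of the patch, showing how an
application would exercise the two wrapped ops through the public mldev
API. The weights-and-bias buffer layout is model specific and assumed to
be prepared by the caller.

#include <rte_mldev.h>

static int
app_update_params(int16_t dev_id, uint16_t model_id, void *wb_buffer)
{
	struct rte_ml_model_info info;
	int ret;

	/* model_info_get is now served by the cnxk wrapper */
	ret = rte_ml_model_info_get(dev_id, model_id, &info);
	if (ret != 0)
		return ret;

	/* params update is accepted only while the model is loaded (not
	 * started); the wrapper forwards to cn10k_ml_model_params_update()
	 */
	return rte_ml_model_params_update(dev_id, model_id, wb_buffer);
}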

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e5b9837ed7..0eebefee5f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1839,45 +1839,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 915309168d..5ad0ea8c3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -606,6 +606,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -633,8 +677,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 13/34] ml/cnxk: update data quantization functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.
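
Below is a minimal usage sketch, not part of the patch, assuming
single-segment input and output buffers prepared by the application and
the buff_seg based rte_ml_io_* API used by this series. Float32 data is
converted to the model's quantized input type before enqueue and back to
float32 after dequeue.

#include <rte_mldev.h>

static int
app_quantize_dequantize(int16_t dev_id, uint16_t model_id,
			struct rte_ml_buff_seg *dseg, struct rte_ml_buff_seg *qseg)
{
	struct rte_ml_buff_seg *dbuffer[] = {dseg};
	struct rte_ml_buff_seg *qbuffer[] = {qseg};
	int ret;

	/* per-input type conversion is done by cnxk_ml_io_quantize_single() */
	ret = rte_ml_io_quantize(dev_id, model_id, dbuffer, qbuffer);
	if (ret != 0)
		return ret;

	/* ... enqueue the op and wait for completion ... */

	/* outputs of the last layer are converted back to float32 */
	return rte_ml_io_dequantize(dev_id, model_id, qbuffer, dbuffer);
}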

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0eebefee5f..1e6aee818c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1860,170 +1860,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0d..780e2a9f9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 0000000000..c78009ab0c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 29ec7ec511..5de166c252 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 5ad0ea8c3c..ed71a55132 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
@@ -650,6 +652,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -681,6 +755,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 6385ac4548..9cc4ddec70 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -25,6 +25,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 14/34] ml/cnxk: update device debug functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrappers for the device dump and selftest debug
functions.
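
Below is a minimal usage sketch, not part of the patch, showing the two
debug entry points from an application.

#include <stdio.h>

#include <rte_mldev.h>

static int
app_debug(int16_t dev_id, FILE *fp)
{
	int ret;

	/* dev_dump prints model and layer info via cnxk_ml_model_dump(),
	 * then the cn10k OCM state and firmware debug buffers
	 */
	ret = rte_ml_dev_dump(dev_id, fp);
	if (ret != 0)
		return ret;

	/* selftest launches a firmware self-test job via the cnxk wrapper */
	return rte_ml_dev_selftest(dev_id);
}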

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  11 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   2 +
 12 files changed, 237 insertions(+), 183 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 9a336cd18f..9e92d4acf3 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -12,6 +12,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -591,3 +592,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45290b84ce..8717e7de3e 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -459,5 +459,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2d900dbc78..70d207e646 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -483,19 +483,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
@@ -510,8 +506,7 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
-			wb_pages +=
-				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+			wb_pages += rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
 		fprintf(fp,
 			"tile = %2u, scratch_pages = %4u,"
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a..bf8944f8ee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 1e6aee818c..c3608eec99 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,11 +22,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -74,16 +69,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -117,140 +102,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1124,38 +975,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1211,17 +1049,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c..5fda98ae88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e..b069d4e3a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71..66d979dd3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ed71a55132..b49ab59798 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -411,6 +411,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -731,8 +766,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 0000000000..ca3670a9e8
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 0000000000..ed2ab21346
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 9cc4ddec70..575f08f9c0 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,6 +17,7 @@ driver_sdk_headers = files(
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
         'cnxk_ml_xstats.h',
+        'cnxk_ml_utils.h',
 )
 
 sources = files(
@@ -28,6 +29,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 15/34] ml/cnxk: update device stats functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to get and reset ML device stats.
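
Below is a minimal usage sketch, not part of the patch, showing how the
aggregated counters are read and cleared from an application.

#include <inttypes.h>
#include <stdio.h>

#include <rte_mldev.h>

static void
app_stats(int16_t dev_id)
{
	struct rte_ml_dev_stats stats = {0};

	/* counters are aggregated over all configured queue pairs */
	if (rte_ml_dev_stats_get(dev_id, &stats) == 0)
		printf("enq %" PRIu64 ", deq %" PRIu64 ", enq_err %" PRIu64
		       ", deq_err %" PRIu64 "\n",
		       stats.enqueued_count, stats.dequeued_count,
		       stats.enqueue_err_count, stats.dequeue_err_count);

	rte_ml_dev_stats_reset(dev_id);
}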

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c3608eec99..59cd3bb9b3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -774,38 +774,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b49ab59798..ffeb3f4452 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -491,6 +491,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -774,8 +806,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 16/34] ml/cnxk: update device and model xstats functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resource handling for the xstats is moved to
the cnxk layer. Introduced an internal xstats group.
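
Below is a minimal usage sketch, not part of the patch, enumerating the
device-mode xstats kept by the cnxk layer; it assumes the mode-based
rte_ml_dev_xstats_* API used by this series. Model-mode stats are read
the same way with RTE_ML_DEV_XSTATS_MODEL and a valid model_id.

#include <inttypes.h>
#include <stdio.h>

#include <rte_mldev.h>

#define APP_MAX_XSTATS 64

static int
app_device_xstats(int16_t dev_id)
{
	struct rte_ml_dev_xstats_map map[APP_MAX_XSTATS];
	uint64_t value;
	int n, i;

	/* fetch id/name pairs for device-mode xstats */
	n = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1,
					map, APP_MAX_XSTATS);
	if (n < 0)
		return n;

	for (i = 0; i < n && i < APP_MAX_XSTATS; i++) {
		/* read one counter by its id */
		if (rte_ml_dev_xstats_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1,
					  &map[i].id, &value, 1) == 1)
			printf("%s = %" PRIu64 "\n", map[i].name, value);
	}

	return 0;
}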

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 542 +------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  11 -
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 540 +++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 571 insertions(+), 552 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 59cd3bb9b3..f1431b89a2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -202,107 +202,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -310,270 +224,15 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
-	do {                                                                                       \
-		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
-		}                                                                                  \
-		if (count != 0)                                                                    \
-			value = value / count;                                                     \
-	} while (0)
-
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
-	do {                                                                                       \
-		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
-		}                                                                                  \
-		if (count == 0)                                                                    \
-			value = 0;                                                                 \
-	} while (0)
-
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
-	do {                                                                                       \
-		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
-		}                                                                                  \
-		if (count == 0)                                                                    \
-			value = 0;                                                                 \
-	} while (0)
-
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
-	uint64_t count = 0;
-	uint64_t value;
-	uint32_t qp_id;
-
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
-		break;
-	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
-		break;
-	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
-		break;
-	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
-		break;
-	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
-		break;
-	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
-		break;
-	default:
-		value = 0;
-	}
-
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
-	return value;
-}
-
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -658,7 +317,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -686,13 +344,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -721,9 +372,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -774,174 +422,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1215,7 +695,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af..8a090a3159 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,17 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd..3ce9338f1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ffeb3f4452..3719331951 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,6 +117,344 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t count = 0;
+	uint64_t value = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		return 0;
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
+		break;
+	case min_hw_latency:
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
+		break;
+	case max_hw_latency:
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
+		break;
+	case avg_fw_latency:
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
+		break;
+	case min_fw_latency:
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
+		break;
+	case max_fw_latency:
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -296,6 +634,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -325,6 +670,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -523,6 +871,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -808,10 +1340,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679ca..5e02bb876c 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 17/34] ml/cnxk: update fast path functions
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support
for model-specific fast-path functions. The cnxk layer functions
invoke the model-specific fast-path functions.

Added support for model-specific poll handling functions and
updated the internal inference sync function. Dropped use of
rte_ml_op as an argument and updated the function arguments so
that the function can be used as a callback by the TVM HW runtime.
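
A minimal usage sketch (not part of the patch), assuming the revised
cn10k_ml_inference_sync() prototype from the diff below; the callback
typedef and run_layer() wrapper are hypothetical, added only to show how
the op-less signature can be handed to an external runtime such as TVM:

typedef int (*ml_run_cb_t)(void *device, uint16_t index, void *input,
			   void *output, uint16_t nb_batches);

static int
run_layer(void *device, uint16_t layer_index, void *input, void *output)
{
	/* Hypothetical wrapper: execute one batch synchronously on the
	 * ML accelerator through the driver's sync-inference entry point.
	 */
	ml_run_cb_t run = cn10k_ml_inference_sync;

	return run(device, layer_index, input, output, 1);
}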

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f1431b89a2..7d809d25ae 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -69,24 +69,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -181,7 +169,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -189,17 +177,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -236,30 +224,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -267,25 +240,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -350,13 +307,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -749,6 +701,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1144,26 +1102,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1171,6 +1111,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1178,9 +1119,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1247,119 +1188,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1396,41 +1266,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1443,7 +1320,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 8a090a3159..3e75cae65a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -308,13 +309,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3c..f618e5aa5f 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 3719331951..923e603e8e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -17,6 +17,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1323,6 +1335,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb..d0c126f34b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 18/34] ml/cnxk: move error handling to cnxk layer
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Move error type structures to the cnxk layer. The cn10k layer
now handles only firmware and hardware error sub-types.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)
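
For readers following the refactor, the standalone sketch below shows how a
common error-type table plus a driver-owned sub-type table can be combined
into one message with snprintf(), which is the shape of the lookup
cn10k_ml_op_error_get() performs after this patch. The tables and values
here are simplified stand-ins, not the driver's data.

/* Minimal standalone sketch (not the driver code): compose an error
 * message from a common error-type table plus a driver sub-type table.
 */
#include <stdint.h>
#include <stdio.h>

#define STR_MAX 64

struct err_db { uint64_t code; char str[STR_MAX]; };

/* Common error types, owned by the cnxk layer */
static const struct err_db etype_db[] = {
	{0, "NO_ERROR"}, {1, "FW_NON_FATAL"}, {2, "HW_NON_FATAL"},
	{3, "HW_FATAL"}, {4, "HW_WARNING"}, {5, "DRIVER_ERROR"},
	{6, "UNKNOWN_ERROR"},
};

/* Driver-specific sub-types, owned by the cn10k layer */
static const struct err_db driver_stype_db[] = {
	{0, "NO ERROR"}, {1, "UNKNOWN ERROR"}, {2, "FW EXCEPTION"},
	{3, "UNKNOWN FIRMWARE ERROR"},
};

int
main(void)
{
	char message[2 * STR_MAX];
	uint64_t etype = 5;	/* DRIVER_ERROR */
	uint64_t stype = 2;	/* FW EXCEPTION */

	snprintf(message, sizeof(message), "%s : %s",
		 etype_db[etype].str, driver_stype_db[stype].str);
	printf("%s\n", message);	/* prints: DRIVER_ERROR : FW EXCEPTION */

	return 0;
}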

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7d809d25ae..daeb3b712c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -26,47 +26,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1166,19 +1146,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1219,7 +1199,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1236,30 +1216,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1297,7 +1276,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973..63d1c9e417 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f..382fca64be 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 923e603e8e..e6c67c71f5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1433,7 +1433,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 19/34] ml/cnxk: support config and close of tvmdp library
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based
on the ML device configuration options.

Updated the meson build to add Jansson, the TVM runtime and the
TVMDP library as build dependencies.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c  | 15 ++++++++++
 drivers/ml/cnxk/meson.build    | 50 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c | 42 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h | 19 +++++++++++++
 4 files changed, 126 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
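
As a rough illustration of the pattern this patch follows, the sketch below
shows an optional backend compiled in behind a build-time flag and hooked
into device configure/close. All names are placeholders; the TVMDP API
itself is not used here.

/* Illustrative sketch only (placeholder names, not the driver API). */
#include <stdio.h>

#ifdef ENABLE_OPTIONAL_BACKEND
static int
backend_configure(unsigned int nb_models)
{
	printf("optional backend: configured for %u models\n", nb_models);
	return 0;
}

static int
backend_close(void)
{
	printf("optional backend: closed\n");
	return 0;
}
#endif

static int
dev_configure(unsigned int nb_models)
{
#ifdef ENABLE_OPTIONAL_BACKEND
	if (backend_configure(nb_models) != 0)
		return -1;
#endif
	printf("device: configured for %u models\n", nb_models);
	return 0;
}

static int
dev_close(void)
{
	int ret = 0;

#ifdef ENABLE_OPTIONAL_BACKEND
	if (backend_close() != 0)
		ret = -1;	/* report, but keep closing the base device */
#endif
	printf("device: closed\n");
	return ret;
}

int
main(void)
{
	if (dev_configure(8) != 0)
		return 1;
	return dev_close();
}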

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index e6c67c71f5..358f16cead 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#endif
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -625,6 +629,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+#endif
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -685,6 +695,11 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+#endif
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 575f08f9c0..61f7fa32af 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,32 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+if not cc.check_header('dlpack/dlpack.h')
+        message('drivers/ml/cnxk: dlpack.h not found')
+        enable_mvtvm = false
+endif
+
+tvmrt_lib = cc.find_library('tvm_runtime', required: false)
+if tvmrt_lib.found()
+        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
+else
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
@@ -34,6 +60,30 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', true)
+
+driver_sdk_headers += files(
+        'mvtvm_ml_ops.h',
+)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += tvmrt_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+ext_deps += jansson_dep
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 0000000000..f2b9499cf4
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "mvtvm_ml_ops.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 0000000000..305b4681ed
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <dlpack/dlpack.h>
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 20/34] ml/cnxk: add structures to support TVM model type
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 63 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 60 +++++++++++++++++++++++++-----
 drivers/ml/cnxk/meson.build      |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 +++++++++++++++++++++++
 6 files changed, 166 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
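
The buffer-addressing change in cnxk_ml_io_quantize() and
cnxk_ml_io_dequantize() can be summarized by the simplified sketch below:
TVM models get one buffer per I/O, while Glow models keep a single packed
buffer walked by offsets. The types and the quantization step here are
placeholders, not the driver's implementation.

/* Minimal sketch with assumed/simplified types. */
#include <stdint.h>
#include <stdio.h>

enum model_type { MODEL_TYPE_GLOW, MODEL_TYPE_TVM };

struct io_buf { void *addr; };

static void
quantize_one(const float *src, int8_t *dst, size_t n)
{
	size_t i;

	/* trivial placeholder "quantization": truncate to int8 */
	for (i = 0; i < n; i++)
		dst[i] = (int8_t)src[i];
}

static void
quantize_all(enum model_type type, struct io_buf **dbuf, struct io_buf **qbuf,
	     const size_t *nb_elems, unsigned int nb_inputs)
{
	uint64_t d_offset = 0, q_offset = 0;
	unsigned int i;

	for (i = 0; i < nb_inputs; i++) {
		const float *src;
		int8_t *dst;

		if (type == MODEL_TYPE_TVM) {
			/* one buffer per input */
			src = dbuf[i]->addr;
			dst = qbuf[i]->addr;
		} else {
			/* all inputs packed into buffer 0, walked by offset */
			src = (const float *)((uint8_t *)dbuf[0]->addr + d_offset);
			dst = (int8_t *)((uint8_t *)qbuf[0]->addr + q_offset);
			d_offset += nb_elems[i] * sizeof(float);
			q_offset += nb_elems[i] * sizeof(int8_t);
		}

		quantize_one(src, dst, nb_elems[i]);
	}
}

int
main(void)
{
	float in0[4] = {1, 2, 3, 4};
	int8_t out0[4];
	struct io_buf d = {in0}, q = {out0};
	struct io_buf *dbuf[1] = {&d}, *qbuf[1] = {&q};
	size_t nb_elems[1] = {4};

	quantize_all(MODEL_TYPE_TVM, dbuf, qbuf, nb_elems, 1);
	printf("%d %d %d %d\n", out0[0], out0[1], out0[2], out0[3]);

	return 0;
}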

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 70d207e646..a7b64ddf05 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -437,6 +437,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index daeb3b712c..db18f32052 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -650,6 +650,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -671,6 +674,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -894,7 +898,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..b5d6ab2b1e 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,45 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions*/
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* MRVL layer, for MLIP target*/
+	/* Unknown layer type */
+
+	/* MRVL layer, for MLIP target*/
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target*/
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +96,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +129,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 358f16cead..a20937ea11 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1286,6 +1286,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1298,17 +1300,35 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = &model->layer[0].info;
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		info = &model->mvtvm.info;
+#endif
+
+	if (info == NULL)
+		return -EINVAL;
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1322,6 +1342,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1334,17 +1356,35 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = &model->layer[model->nb_layers - 1].info;
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		info = &model->mvtvm.info;
+#endif
+
+	if (info == NULL)
+		return -EINVAL;
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 61f7fa32af..25b72cc8aa 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -66,6 +66,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', true)
 
 driver_sdk_headers += files(
         'mvtvm_ml_ops.h',
+        'mvtvm_ml_model.h',
 )
 
 sources += files(
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 0000000000..1f6b435be0
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 21/34] ml/cnxk: add support for identify model type
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to parse the model buffer to identify the
model type and model sub-type. Added basic validity checks
for Glow model buffers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 96 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  1 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  9 +++
 drivers/ml/cnxk/meson.build      |  6 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 11 ++++
 5 files changed, 123 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
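
A minimal standalone sketch of the detection idea, assuming libarchive is
available (link with -larchive): treat the buffer as an archive and look for
the three expected TVM objects; if it does not open as an archive, fall back
to the Glow checks. This mirrors, but is not, cnxk_ml_model_get_type().

/* Standalone sketch, not the driver code. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#include <archive.h>
#include <archive_entry.h>

static const char *objects[] = {"mod.so", "mod.json", "mod.params"};
#define NB_OBJECTS 3

static bool
buffer_is_tvm_archive(const void *addr, size_t size)
{
	bool found[NB_OBJECTS] = {false, false, false};
	struct archive_entry *entry;
	struct archive *a;
	int i;

	a = archive_read_new();
	archive_read_support_filter_all(a);
	archive_read_support_format_all(a);

	if (archive_read_open_memory(a, addr, size) != ARCHIVE_OK) {
		archive_read_free(a);
		return false;	/* not an archive, check as a Glow model instead */
	}

	/* Walk the archive entries and mark the expected objects */
	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
		for (i = 0; i < NB_OBJECTS; i++) {
			if (strcmp(archive_entry_pathname(entry), objects[i]) == 0)
				found[i] = true;
		}
		archive_read_data_skip(a);
	}
	archive_read_free(a);

	for (i = 0; i < NB_OBJECTS; i++) {
		if (!found[i])
			return false;
	}

	return true;
}

Usage would be a single call, buffer_is_tvm_archive(params_addr, params_size),
on the model buffer before running the Glow magic-string and CRC checks.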

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..746d3ca5a9 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,107 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include <archive.h>
+#include <archive_entry.h>
+#endif
+
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
+#include "cn10k_ml_model.h"
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret == ARCHIVE_OK)
+		goto check_tvm;
+	else
+		goto check_glow;
+
+check_tvm:
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+
+check_glow:
+#endif
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index b5d6ab2b1e..577a96dc26 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -181,6 +181,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index a20937ea11..052c69e510 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ops.h"
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
 #include "mvtvm_ml_ops.h"
 #endif
 
@@ -1087,6 +1088,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1102,6 +1104,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1135,6 +1143,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 25b72cc8aa..09a62b5c55 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -71,6 +76,7 @@ driver_sdk_headers += files(
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += tvmrt_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 0000000000..6462267534
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 22/34] ml/cnxk: add support to parse TVM model objects
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model
archive buffer, check that all expected objects are present
and copy the TVM model objects to internal buffers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++--
 drivers/ml/cnxk/mvtvm_ml_model.c | 62 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 63 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 5 files changed, 142 insertions(+), 3 deletions(-)
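
The layout used when copying the three objects into one memzone is plain
cache-line-aligned packing. The sketch below reproduces the offset arithmetic
with host malloc and example sizes, under the assumption of a 64-byte cache
line; it is a simplification of the RTE_ALIGN_CEIL based layout in
mvtvm_ml_model_load().

/* Minimal sketch: aligned packing of three objects into one allocation. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CACHE_LINE 64
#define ALIGN_CEIL(v, a) (((v) + (a) - 1) / (a) * (a))

int
main(void)
{
	size_t sz[3] = {1000003, 4097, 120000};	/* example object sizes */
	size_t off[3], total = 0;
	uint8_t *base;
	int i;

	/* Each object starts at the next cache-line-aligned offset */
	for (i = 0; i < 3; i++) {
		off[i] = total;
		total += ALIGN_CEIL(sz[i], CACHE_LINE);
	}

	base = aligned_alloc(CACHE_LINE, ALIGN_CEIL(total, CACHE_LINE));
	if (base == NULL)
		return 1;

	for (i = 0; i < 3; i++)
		printf("object %d: offset %zu, size %zu\n", i, off[i], sz[i]);

	free(base);
	return 0;
}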

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 052c69e510..8e17f597af 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1149,9 +1149,17 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
-	if (ret != 0)
-		goto error;
+	if (type == ML_CNXK_MODEL_TYPE_GLOW) {
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+		if (ret != 0)
+			goto error;
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	} else {
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
+		if (ret != 0)
+			goto error;
+#endif
+	}
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 6462267534..425a682209 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -2,10 +2,72 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <archive.h>
+#include <archive_entry.h>
+
 #include <rte_mldev.h>
 
+#include <roc_api.h>
+
 #include "mvtvm_ml_model.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
 								     "mod.params"};
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be0..73a45a91d6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,7 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index f2b9499cf4..baa9099084 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -7,9 +7,14 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "mvtvm_ml_model.h"
 #include "mvtvm_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
@@ -40,3 +45,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 305b4681ed..6607537599 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -12,8 +12,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 23/34] ml/cnxk: fetch layer info and load TVM model
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and
update internal structures based on the layer information.
Set callback functions for layer load and unload, and
enabled model loading using the TVMDP library. Added support
to fetch the full metadata after model load.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 22 ++++++++-
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 83 ++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+), 1 deletion(-)
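
The sketch below illustrates, with placeholder names and types (including
made-up layer names), the two pieces this patch wires together: resolving a
layer index from its name and keeping a small table of load/unload callbacks
for the accelerator-capable layers. It is not the TVMDP callback interface.

/* Illustrative sketch only; all identifiers are hypothetical. */
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 64

enum layer_type { LAYER_LLVM, LAYER_MRVL };

struct layer {
	char name[NAME_MAX_LEN];
	enum layer_type type;
};

struct callbacks {
	int (*layer_load)(const char *name);
	int (*layer_unload)(const char *name);
};

static int
hw_layer_load(const char *name)
{
	printf("loading layer %s on the accelerator\n", name);
	return 0;
}

static int
hw_layer_unload(const char *name)
{
	printf("unloading layer %s\n", name);
	return 0;
}

static int
layer_index_by_name(const struct layer *layers, int nb_layers, const char *name)
{
	int i;

	for (i = 0; i < nb_layers; i++) {
		if (strcmp(layers[i].name, name) == 0)
			return i;
	}

	return -1;
}

int
main(void)
{
	struct layer layers[2] = {{"layer_0", LAYER_MRVL},
				  {"layer_1", LAYER_LLVM}};
	struct callbacks cb = {hw_layer_load, hw_layer_unload};
	int idx;

	/* Only hardware-capable (MRVL) layers are handed to the callbacks */
	idx = layer_index_by_name(layers, 2, "layer_0");
	if (idx >= 0 && layers[idx].type == LAYER_MRVL)
		cb.layer_load(layers[idx].name);

	return cb.layer_unload(layers[0].name);
}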

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index db18f32052..79217165cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -508,8 +508,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int qp_id;
 	int ret;
 
-	PLT_SET_USED(size);
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
 	PLT_SET_USED(layer_name);
+#endif
+	PLT_SET_USED(size);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -523,6 +525,24 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 73a45a91d6..6c38217c15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index baa9099084..d9ec411385 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -7,6 +7,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cn10k_ml_ops.h"
+
 #include "mvtvm_ml_model.h"
 #include "mvtvm_ml_ops.h"
 
@@ -51,9 +53,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -101,5 +107,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		strncpy(model->layer[layer_id].name,
+			model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 24/34] ml/cnxk: update internal info for TVM model
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating of internal I/O info structures for TVM models.
Static fields related to the model I/O are computed at model load.
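
A minimal sketch (not part of the patch) of how the per-tensor sizes
computed here follow from the metadata shape and the element size of
the dequantized or quantized I/O type:

    #include <stdint.h>

    /* Illustrative only: mirrors the nb_elements * type_size computation
     * done in mvtvm_ml_model_io_info_update() for sz_d and sz_q. */
    static uint64_t
    io_size_bytes(const int64_t *shape, int32_t ndim, uint32_t type_size)
    {
        uint64_t nb_elements = 1;
        int32_t i;

        for (i = 0; i < ndim; i++)
            nb_elements *= shape[i];

        return nb_elements * type_size;
    }

For example, a 1x3x224x224 fp32 input gives 1 * 3 * 224 * 224 * 4 =
602112 bytes for the dequantized size.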

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 105 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   1 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 3 files changed, 109 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 425a682209..86f465a645 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,10 +7,14 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "mvtvm_ml_model.h"
 
+#include "cnxk_ml_model.h"
+
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
 								     "mod.params"};
@@ -71,3 +75,104 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case kDLInt:
+		return RTE_ML_IO_TYPE_INT32;
+	case kDLUInt:
+		return RTE_ML_IO_TYPE_UINT32;
+	case kDLFloat:
+		return RTE_ML_IO_TYPE_FP32;
+	case kDLBfloat:
+		return RTE_ML_IO_TYPE_BFLOAT16;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		strncpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		strncpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_update(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6c38217c15..2b25a7b568 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -47,5 +47,6 @@ struct mvtvm_ml_model_data {
 
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+void mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index d9ec411385..1d585a57ff 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -179,6 +179,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_update(model);
+
 	return 0;
 
 error:
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 25/34] ml/cnxk: enable model unload in tvmdp library
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled unloading of models through the external tvmdp library.
Updated the layer unload callback to support multiple layers.
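
A simplified sketch of the layer-name lookup the cn10k unload callback
now performs for multi-layer TVM models; struct layer below is a
stand-in for the driver's cnxk_ml_layer, not the actual type:

    #include <stdint.h>
    #include <string.h>

    struct layer { const char *name; int is_mrvl; };

    /* Resolve a layer name to its index; only Marvell (MRVL) layers are
     * valid targets for hardware layer unload. Returns -1 on failure. */
    static int
    find_mrvl_layer(const struct layer *layers, uint16_t nb_layers,
                    const char *layer_name)
    {
        uint16_t i;

        for (i = 0; i < nb_layers; i++) {
            if (strcmp(layers[i].name, layer_name) == 0)
                return layers[i].is_mrvl ? (int)i : -1;
        }

        return -1;
    }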

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 20 ++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  9 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |  1 +
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 79217165cd..85d0a9e18b 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,7 +725,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	uint16_t layer_id = 0;
 	int ret;
 
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
 	PLT_SET_USED(layer_name);
+#endif
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -739,6 +741,24 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 8e17f597af..512bac641e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1182,7 +1182,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1200,7 +1200,12 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
+#endif
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1d585a57ff..073773e409 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -189,3 +189,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 26/34] ml/cnxk: support start and stop for TVM models
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. Starting a TVM model
invokes layer start for every Glow layer that is part of the model;
stopping the model invokes layer stop for the same set of layers.
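
The walk over layers can be pictured as the loop below (the patch
itself uses an equivalent goto-based loop); the types and helpers are
the driver's own structures introduced earlier in this series:

    /* Sketch: start only Marvell (MRVL) layers on the accelerator;
     * LLVM layers execute on CPU cores and need no start. */
    for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
        layer = &model->layer[layer_id];
        if (layer->type != ML_CNXK_LAYER_TYPE_MRVL)
            continue;

        ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
        if (ret != 0)
            return ret;
    }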

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 42 +++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  | 18 ++++++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c | 52 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |  2 ++
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 85d0a9e18b..f70383b128 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -798,7 +798,9 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
 	PLT_SET_USED(layer_name);
+#endif
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -812,6 +814,25 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -981,7 +1002,9 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
 	PLT_SET_USED(layer_name);
+#endif
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -995,6 +1018,25 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 512bac641e..1e567ad45c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1233,7 +1233,14 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+#endif
+
+	return 0;
 }
 
 int
@@ -1253,7 +1260,14 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+#endif
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 073773e409..4015374b0d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -217,3 +217,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 770794fe7d..55459f9f7f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -19,5 +19,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 27/34] ml/cnxk: update internal TVM model info structure
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update the internal model info structure for
TVM models.
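
As a hedged, application-side sketch, the info populated by
mvtvm_ml_model_info_set() can be read back through the standard
rte_mldev call; error handling is trimmed for brevity:

    #include <stdio.h>
    #include <rte_mldev.h>

    /* Print a summary of a loaded model's I/O counts. */
    static void
    app_print_model_info(int16_t dev_id, uint16_t model_id)
    {
        struct rte_ml_model_info info;

        if (rte_ml_model_info_get(dev_id, model_id, &info) != 0)
            return;

        printf("model %s v%s: %u input(s), %u output(s)\n",
               info.name, info.version, info.nb_inputs, info.nb_outputs);
    }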

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 86f465a645..8c04d4652f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "mvtvm_ml_model.h"
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -176,3 +177,67 @@ mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model)
 tvm_mrvl_model:
 	cn10k_ml_layer_io_info_update(&model->mvtvm.info, &model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 2b25a7b568..eef424b5c2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -48,5 +49,6 @@ struct mvtvm_ml_model_data {
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 void mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 4015374b0d..213151e68b 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -182,6 +182,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_update(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 28/34] ml/cnxk: support device dump for TVM models
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled printing of TVM model layer info as part of device dump.
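
A hedged usage sketch: with this patch, a device dump issued by the
application also prints per-layer info for TVM models through
mvtvm_ml_layer_print(); the file name below is only an example.

    #include <stdio.h>
    #include <rte_mldev.h>

    static void
    app_dump_ml_dev(int16_t dev_id)
    {
        FILE *fp = fopen("mldev_dump.txt", "w");

        if (fp == NULL)
            return;

        rte_ml_dev_dump(dev_id, fp);
        fclose(fp);
    }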

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  9 ++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 4 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 746d3ca5a9..e63ee58ab2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -115,6 +115,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -131,6 +133,11 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
+#endif
 	}
 }
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1e567ad45c..361184620b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -18,6 +18,7 @@
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8c04d4652f..7086c7a407 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -15,6 +15,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -241,3 +242,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
 				&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index eef424b5c2..fa7735cfaa 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -50,5 +51,6 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 void mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 29/34] ml/cnxk: enable reporting model runtime as xstats
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.
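
Assuming the mode-based xstats API of rte_mldev, an application could
list the new per-model runtime counters (Avg/Min/Max-RT-Latency)
roughly as follows; this is a sketch, not part of the patch:

    #include <stdio.h>
    #include <rte_mldev.h>

    static void
    app_list_model_xstats(int16_t dev_id, int32_t model_id)
    {
        struct rte_ml_dev_xstats_map map[64];
        int i, n;

        n = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_MODEL,
                                        model_id, map, 64);
        for (i = 0; i < n && i < 64; i++)
            printf("%u: %s\n", map[i].id, map[i].name);
    }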

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    | 200 ++++++++++++++++++++++++++++---
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  24 +++-
 5 files changed, 238 insertions(+), 18 deletions(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 361184620b..f281e6070f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -146,7 +146,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -177,6 +178,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -203,7 +223,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -212,6 +233,42 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		strcpy(suffix, "cycles");
+	else
+		strcpy(suffix, "ns");
+
+	/* Update xstat name based on layer name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+				 model->glow.metadata.model.name, model_xstats[i].name, suffix);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+				 model->mvtvm.metadata.model.name, model_xstats[i].name, suffix);
+#endif
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -249,6 +306,9 @@ cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unu
 			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
 				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
@@ -261,6 +321,9 @@ cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unu
 			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
 				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
@@ -273,9 +336,52 @@ cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unu
 			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
 				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
+#endif
 
 static uint64_t
 cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
@@ -293,11 +399,15 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	if (model == NULL)
 		return 0;
 
-	if (layer_id >= 0)
+	if (layer_id >= 0) {
 		layer = &model->layer[layer_id];
-	else
-		return 0;
+		goto layer_xstats;
+	} else {
+		layer = NULL;
+		goto model_xstats;
+	}
 
+layer_xstats:
 	switch (type) {
 	case avg_hw_latency:
 		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
@@ -320,7 +430,26 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	default:
 		value = 0;
 	}
+	goto exit_xstats;
 
+model_xstats:
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	switch (type) {
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	default:
+		value = 0;
+	}
+#endif
+
+exit_xstats:
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq != 0) /* return in ns */
 		value = (value * 1000ULL) / sclk_freq;
@@ -907,8 +1036,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -925,7 +1055,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -939,9 +1079,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -1002,9 +1153,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1022,7 +1174,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -1034,11 +1193,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..2575f4c6e1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -64,6 +64,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876c..a2c9adfe4a 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index fa7735cfaa..d71df36f5a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 213151e68b..d4518412be 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -14,6 +14,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
@@ -57,6 +58,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -72,7 +74,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -185,6 +191,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 30/34] ml/cnxk: implement I/O alloc and free callbacks
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for I/O allocation and free for
Glow layers.
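
A hedged sketch of how the TVM runtime (via the tvmdp library) is
expected to use the new callbacks for a Glow (MRVL) layer. The layer
name is a hypothetical example and "device" is the cnxk_ml_dev handle
passed at model load; the actual call sites live in tvmdp.

    uint64_t *input_q = NULL;
    uint64_t *output_q = NULL;

    if (cn10k_ml_io_alloc(device, model_id, "tvmgen_default_marvell_0",
                          &input_q, &output_q) == 0) {
        /* ... run hardware inference using the quantized buffers ... */
        cn10k_ml_io_free(device, model_id, "tvmgen_default_marvell_0");
    }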

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 123 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |   3 +
 drivers/ml/cnxk/mvtvm_ml_ops.c |   2 +
 3 files changed, 128 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f70383b128..23e98b96c5 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1399,3 +1399,126 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id = 0;
+	uint64_t output_size;
+	uint64_t input_size;
+
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	PLT_SET_USED(layer_name);
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", cnxk_mldev);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id = 0;
+
+#ifndef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	PLT_SET_USED(layer_name);
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", cnxk_mldev);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+				break;
+		}
+
+		if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+			plt_err("Invalid layer name: %s", layer_name);
+			return -EINVAL;
+		}
+
+		if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+			plt_err("Invalid layer name / type: %s", layer_name);
+			return -EINVAL;
+		}
+	}
+#endif
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3e75cae65a..055651eaa2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -328,5 +328,8 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index d4518412be..a41ba4d343 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -164,6 +164,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 31/34] ml/cnxk: add generic ML malloc and free callback
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks.
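
A hedged sketch of the intended use: the callbacks map a named
allocation request to a DPDK memzone. The name below is a hypothetical
example; real names are chosen by the caller (the TVM runtime path).

    void *addr = NULL;

    if (cn10k_ml_malloc("ml_tvm_scratch_0", 4096, 128, &addr) == 0) {
        /* ... scratch memory usable by the TVM runtime ... */
        cn10k_ml_free("ml_tvm_scratch_0");
    }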

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 23e98b96c5..140f7a343f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1522,3 +1522,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 055651eaa2..d7df1d003a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -332,4 +332,7 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index a41ba4d343..95238d43d8 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -166,6 +166,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 32/34] ml/cnxk: support quantize and dequantize callback
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
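
A minimal caller sketch for the new quantize callback (illustrative only; the
device handle, model id, layer name, input count and buffer are assumptions;
per the checks in this patch the named layer must be an MRVL-type layer):

  #include <stdint.h>
  #include <dlpack/dlpack.h>

  /* Declaration added to mvtvm_ml_ops.h by this patch. */
  int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
                           const DLTensor **deq_tensor, void *qbuffer);

  static int
  quantize_two_inputs(void *device, uint16_t model_id, DLTensor *in0, DLTensor *in1,
                      void *qbuffer)
  {
      /* Dequantized inputs, in the model's input order. */
      const DLTensor *deq_tensor[2] = {in0, in1};

      /* "mrvl_layer_0" is a hypothetical layer name. */
      return mvtvm_ml_io_quantize(device, model_id, "mrvl_layer_0",
                                  deq_tensor, qbuffer);
  }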

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 127 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   4 +
 3 files changed, 133 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index d71df36f5a..57a6ce0bb1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -5,6 +5,8 @@
 #ifndef _MVTVM_ML_MODEL_H_
 #define _MVTVM_ML_MODEL_H_
 
+#include <dlpack/dlpack.h>
+
 #include <tvmdp.h>
 
 #include <rte_mldev.h>
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 95238d43d8..5292ac97fe 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -7,6 +7,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cn10k_ml_ops.h"
 
 #include "mvtvm_ml_model.h"
@@ -168,6 +170,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -298,3 +302,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 55459f9f7f..a1a868ef4b 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -21,5 +21,9 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 33/34] ml/cnxk: enable fast-path ops for TVM models
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-20  7:25   ` [PATCH v2 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  2023-09-21 12:15   ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. Models would
use TVMDP library function calls to execute inference
operations for Hybrid and LLVM model sub-types.

For TVM MRVL model subtypes that have a single MRVL layer,
the inference requests are directly enqueued to hardware
by the driver.
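
A sketch of the dispatch this enables (illustrative only; this is not the
driver's actual burst path, the layer id passed is an assumption, and it
relies on the driver-internal headers for the cnxk structures):

  #include <stdbool.h>

  #include <rte_mldev_pmd.h>

  #include "cnxk_ml_dev.h"
  #include "cnxk_ml_model.h"
  #include "cnxk_ml_ops.h"

  static bool
  enqueue_one(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
              struct cnxk_ml_qp *qp, uint64_t head)
  {
      struct cnxk_ml_model *model = cnxk_mldev->mldev->data->models[op->model_id];

      /* MRVL-only TVM models point enqueue_single at cn10k_ml_enqueue_single
       * (direct hardware enqueue); hybrid/LLVM models point it at
       * mvtvm_ml_enqueue_single, which runs the model through TVMDP.
       */
      return model->enqueue_single(cnxk_mldev, op, 0 /* layer_id */, qp, head);
  }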

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h     |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |   9 +++
 drivers/ml/cnxk/mvtvm_ml_model.c |  20 +++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 124 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  43 +++++++++++
 8 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 140f7a343f..c1353fb0c8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -287,10 +287,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f281e6070f..274d152b81 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -770,6 +770,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2575f4c6e1..62e2b17e35 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,12 +12,21 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 7086c7a407..8af84b6972 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -136,6 +136,16 @@ mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -169,6 +179,16 @@ mvtvm_ml_model_io_info_update(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 57a6ce0bb1..08e101bbe7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -71,6 +71,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 5292ac97fe..2baac8f72f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -21,6 +21,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -172,6 +178,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -215,6 +222,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -425,3 +445,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index a1a868ef4b..82292ceadd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -13,6 +13,44 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* Start ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -26,4 +64,9 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 #endif /* _MVTVM_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v2 34/34] ml/cnxk: enable creation of mvtvm virtual device
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-09-20  7:25   ` Srikanth Yalavarthi
  2023-09-21 12:15   ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-20  7:25 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create a mvtvm virtual device on systems
without a PCI based ML HW accelerator.
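
A minimal application-side sketch (illustrative only; the queue-pair count and
cache flag values are arbitrary):

  #include <rte_bus_vdev.h>

  static int
  create_mvtvm_vdev(void)
  {
      /* Equivalent to passing --vdev="ml_mvtvm,max_qps=32,cache_model_data=1"
       * on the EAL command line.
       */
      return rte_vdev_init("ml_mvtvm", "max_qps=32,cache_model_data=1");
  }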

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c  |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h  |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  86 ++++++++++----
 drivers/ml/cnxk/meson.build    |   2 +
 drivers/ml/cnxk/mvtvm_ml_dev.c | 198 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  34 +++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h |   2 +
 10 files changed, 372 insertions(+), 25 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 20c114b8bf..e6dc87e353 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -368,6 +368,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -414,6 +420,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef..cee405f3f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417..dc4512223c 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64be..491c4c4aea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 274d152b81..9a59e3b40b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -125,7 +125,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -616,7 +617,14 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+#endif
+
+	return 0;
 }
 
 static int
@@ -654,9 +662,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -754,10 +764,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
@@ -767,12 +779,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 #endif
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -835,8 +852,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		plt_err("Failed to close MVTVM ML Device");
 #endif
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -888,10 +907,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -910,10 +931,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -940,7 +963,14 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+#endif
+
+	return 0;
 }
 
 static int
@@ -953,6 +983,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1281,6 +1314,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1475,6 +1513,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 09a62b5c55..f5989c5caf 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -70,11 +70,13 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', true)
 
 driver_sdk_headers += files(
+        'mvtvm_ml_dev.h',
         'mvtvm_ml_ops.h',
         'mvtvm_ml_model.h',
 )
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 0000000000..8ca0e959e3
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,198 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "mvtvm_ml_dev.h"
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize MVTVM vdev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 0000000000..6922c19337
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 2baac8f72f..f4cd51f872 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -9,8 +9,7 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
+#include "mvtvm_ml_dev.h"
 #include "mvtvm_ml_model.h"
 #include "mvtvm_ml_ops.h"
 
@@ -27,6 +26,22 @@ mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
 	req->status = &req->mvtvm_req.status;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -57,6 +72,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -167,6 +191,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 82292ceadd..1247f80c2d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -52,8 +52,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-08-30 15:58 ` [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read Srikanth Yalavarthi
@ 2023-09-21 12:08   ` Jerin Jacob
  2023-09-21 12:52     ` David Marchand
                       ` (2 more replies)
  0 siblings, 3 replies; 340+ messages in thread
From: Jerin Jacob @ 2023-09-21 12:08 UTC (permalink / raw)
  To: Srikanth Yalavarthi, David Marchand
  Cc: Prince Takkar, dev, sshankarnara, aprabhu

On Wed, Aug 30, 2023 at 9:40 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> Dropped use of rte_firmware_read API to read ML firmware
> binary. When DPDK is built with libarchive aaupport, the
> the RTE API assumes the binary file as a compressed
> archive. This causes the ML firmware binary to be parsed
> incorrectly.

+ @David Marchand  rte_firmware_read() author for his opinions


>
> Fixes: c29da752ffa8 ("ml/cnxk: support firmware load and device reset")
> Cc: syalavarthi@marvell.com
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---
>  drivers/ml/cnxk/cn10k_ml_dev.c | 64 +++++++++++++++++++++++++++++++---
>  1 file changed, 60 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
> index e3c2badcef5..b7e6ed9a00e 100644
> --- a/drivers/ml/cnxk/cn10k_ml_dev.c
> +++ b/drivers/ml/cnxk/cn10k_ml_dev.c
> @@ -2,6 +2,11 @@
>   * Copyright (c) 2022 Marvell.
>   */
>
> +#include <fcntl.h>
> +#include <sys/mman.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +
>  #include <rte_common.h>
>  #include <rte_dev.h>
>  #include <rte_devargs.h>
> @@ -61,6 +66,57 @@ static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
>  /* Dummy operations for ML device */
>  struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
>
> +static int
> +ml_read_file(const char *file, size_t *size, char **buffer)
> +{
> +       char *file_buffer = NULL;
> +       struct stat file_stat;
> +       char *file_map;
> +       int ret;
> +       int fd;
> +
> +       fd = open(file, O_RDONLY);
> +       if (fd == -1) {
> +               plt_err("Failed to open file: %s\n", file);
> +               return -errno;
> +       }
> +
> +       if (fstat(fd, &file_stat) != 0) {
> +               plt_err("fstat failed for file: %s\n", file);
> +               close(fd);
> +               return -errno;
> +       }
> +
> +       file_buffer = rte_malloc("ml_firmware", file_stat.st_size, PLT_CACHE_LINE_SIZE);
> +       if (file_buffer == NULL) {
> +               plt_err("Failed to allocate memory: %s\n", file);
> +               ret = -ENOMEM;
> +               goto error;
> +       }
> +
> +       file_map = mmap(0, file_stat.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +       if (file_map == MAP_FAILED) {
> +               plt_err("Failed to map file: %s\n", file);
> +               ret = -errno;
> +               goto error;
> +       }
> +
> +       rte_memcpy(file_buffer, file_map, file_stat.st_size);
> +       munmap(file_map, file_stat.st_size);
> +       close(fd);
> +
> +       *size = file_stat.st_size;
> +       *buffer = file_buffer;
> +
> +       return 0;
> +
> +error:
> +       free(file_buffer);
> +       close(fd);
> +
> +       return ret;
> +}
> +
>  static int
>  parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
>  {
> @@ -736,7 +792,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
>  {
>         const struct plt_memzone *mz;
>         struct cn10k_ml_fw *fw;
> -       void *fw_buffer = NULL;
> +       char *fw_buffer = NULL;
>         uint64_t mz_size = 0;
>         uint64_t fw_size = 0;
>         int ret = 0;
> @@ -746,7 +802,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
>
>         if (roc_env_is_emulator() || roc_env_is_hw()) {
>                 /* Read firmware image to a buffer */
> -               ret = rte_firmware_read(fw->path, &fw_buffer, &fw_size);
> +               ret = ml_read_file(fw->path, &fw_size, &fw_buffer);
>                 if ((ret < 0) || (fw_buffer == NULL)) {
>                         plt_err("Unable to read firmware data: %s\n", fw->path);
>                         return ret;
> @@ -763,7 +819,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
>         mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
>         if (mz == NULL) {
>                 plt_err("plt_memzone_reserve failed : %s", FW_MEMZONE_NAME);
> -               free(fw_buffer);
> +               rte_free(fw_buffer);
>                 return -ENOMEM;
>         }
>         fw->req = mz->addr;
> @@ -780,7 +836,7 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
>         if (roc_env_is_emulator() || roc_env_is_hw()) {
>                 fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
>                 ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
> -               free(fw_buffer);
> +               rte_free(fw_buffer);
>         } else if (roc_env_is_asim()) {
>                 fw->data = NULL;
>                 ret = cn10k_ml_fw_load_asim(fw);
> --
> 2.41.0
>

^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [PATCH v2 00/34] Implemenation of revised ml/cnxk driver
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2023-09-20  7:25   ` [PATCH v2 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
@ 2023-09-21 12:15   ` Jerin Jacob
  2023-09-27 18:39     ` [EXT] " Srikanth Yalavarthi
  34 siblings, 1 reply; 340+ messages in thread
From: Jerin Jacob @ 2023-09-21 12:15 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

On Wed, Sep 20, 2023 at 12:55 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> This patch series is an implementation of revised ml/cnxk driver
> to support models compiled with TVM compiler framework. TVM models
> use a hybrid mode for execution, with regions of the model executing
> on the ML accelerator and the rest executing on CPU cores.
>
> This series of commits reorganizes the ml/cnxk driver and adds support
> to execute multiple regions with-in a TVM model.


For new features, as and when you add them in a patch, please update
doc/guides/rel_notes/release_23_11.rst
under "* **Updated Marvell cnxk ml driver.**"

^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [PATCH v1 19/34] ml/cnxk: support config and close of tvmdp library
  2023-08-30 15:59 ` [PATCH v1 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-09-21 12:32   ` Jerin Jacob
  2023-09-27 18:38     ` [EXT] " Srikanth Yalavarthi
  0 siblings, 1 reply; 340+ messages in thread
From: Jerin Jacob @ 2023-09-21 12:32 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

On Wed, Aug 30, 2023 at 9:34 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> Added support to configure and close TVMDP library based
> on ML device configuration options.
>
> Updated meson build to enable Jansson, TVM runtime, TVMDP
> library as build dependencies.

If it is optional, please mark it as optional.

Please update the cnxk ml driver documentation with this dependency and an
example command to build it. See the DPDK mlx5 docs for dependency
documentation.


>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---
>  drivers/ml/cnxk/cnxk_ml_ops.c  | 15 ++++++++++++
>  drivers/ml/cnxk/meson.build    | 45 ++++++++++++++++++++++++++++++++++
>  drivers/ml/cnxk/mvtvm_ml_ops.c | 44 +++++++++++++++++++++++++++++++++
>  drivers/ml/cnxk/mvtvm_ml_ops.h | 15 ++++++++++++
>  4 files changed, 119 insertions(+)
>  create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
>  create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
>
> diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
> index b2eb4bd0d9a..454fec33234 100644
> --- a/drivers/ml/cnxk/cnxk_ml_ops.c
> +++ b/drivers/ml/cnxk/cnxk_ml_ops.c
> @@ -9,6 +9,10 @@
>
>  #include "cn10k_ml_ops.h"
>
> +#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
> +#include "mvtvm_ml_ops.h"
> +#endif
> +
>  #include "cnxk_ml_dev.h"
>  #include "cnxk_ml_io.h"
>  #include "cnxk_ml_model.h"
> @@ -625,6 +629,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
>                 goto error;
>         }
>
> +#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM

If this #ifdef is used in a lot of places in the code like this, please add stubs
and segregate them in one place in the header file,
and avoid #ifdef in main code like cnxk_ml_dev_configure().
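
A minimal sketch of the suggested stub approach, assuming it lives in
mvtvm_ml_ops.h next to the existing declarations (function names taken from
this series; the stub bodies are illustrative):

  #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM

  int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev,
                             const struct rte_ml_dev_config *conf);
  int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);

  #else

  static inline int
  mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev,
                         const struct rte_ml_dev_config *conf)
  {
      RTE_SET_USED(cnxk_mldev);
      RTE_SET_USED(conf);

      return 0;
  }

  static inline int
  mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
  {
      RTE_SET_USED(cnxk_mldev);

      return 0;
  }

  #endif

With stubs like these, cnxk_ml_dev_configure() could call
mvtvm_ml_dev_configure() unconditionally.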

^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-21 12:08   ` Jerin Jacob
@ 2023-09-21 12:52     ` David Marchand
  2023-09-21 13:06       ` [EXT] " Srikanth Yalavarthi
  2023-09-27  9:38     ` David Marchand
  2023-09-27 18:37     ` Srikanth Yalavarthi
  2 siblings, 1 reply; 340+ messages in thread
From: David Marchand @ 2023-09-21 12:52 UTC (permalink / raw)
  To: Jerin Jacob, Srikanth Yalavarthi
  Cc: Prince Takkar, dev, sshankarnara, aprabhu

On Thu, Sep 21, 2023 at 2:08 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Wed, Aug 30, 2023 at 9:40 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > Dropped use of rte_firmware_read API to read ML firmware
> > binary. When DPDK is built with libarchive support, the
> > RTE API assumes the binary file as a compressed

The rte_firmware API supports both xz-compressed and uncompressed files.
Otherwise, it would break loading net/ice on systems where
/lib/firmware content is uncompressed (which is still the case in some
Linux distributions).


To convince myself, I wrote a quick tool ("./archive" below) that
outputs in hexa the content of a file, with the same libarchive calls.
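
A minimal reconstruction of such a tool (illustrative only; this is not the
actual tool used here, just the same libarchive calls quoted later in this
thread plus a hex printout of the decoded data):

  #include <stdio.h>

  #include <archive.h>
  #include <archive_entry.h>

  int
  main(int argc, char **argv)
  {
      struct archive *a = archive_read_new();
      struct archive_entry *e;
      unsigned char buf[16];
      ssize_t len;
      size_t off = 0;

      if (argc != 2)
          return 1;

      /* Same call sequence as firmware_open() in lib/eal/unix/eal_firmware.c. */
      if (archive_read_support_format_raw(a) != ARCHIVE_OK ||
          archive_read_support_filter_xz(a) != ARCHIVE_OK ||
          archive_read_open_filename(a, argv[1], 4096) != ARCHIVE_OK ||
          archive_read_next_header(a, &e) != ARCHIVE_OK) {
          archive_read_free(a);
          return 1;
      }

      /* Dump the (possibly decompressed) payload in hex. */
      while ((len = archive_read_data(a, buf, sizeof(buf))) > 0) {
          printf("%08zx:", off);
          for (ssize_t i = 0; i < len; i++)
              printf(" %02X", buf[i]);
          printf("\n");
          off += len;
      }

      archive_read_free(a);
      return 0;
  }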

With a xz-compressed file:
$ hexdump -C /lib/firmware/intel/ice/ddp/ice.pkg.xz | head -1
00000000  fd 37 7a 58 5a 00 00 01  69 22 de 36 02 00 21 01  |.7zXZ...i".6..!.|
$ ./archive /lib/firmware/intel/ice/ddp/ice.pkg.xz | head -1
00000000: 01 00 00 00 05 00 00 00 1C 00 00 00 70 00 00 00 | ............p...

Uncompressing this file, and passing it to the same tool:
$ hexdump -C ice.pkg | head
00000000  01 00 00 00 05 00 00 00  1c 00 00 00 70 00 00 00  |............p...|
$ ./archive ice.pkg | head
00000000: 01 00 00 00 05 00 00 00 1C 00 00 00 70 00 00 00 | ............p...


For the record, I am using:
$ rpm -q libarchive
libarchive-3.6.1-3.fc37.x86_64


> > archive. This causes the ML firmware binary to be parsed
> > incorrectly.
>
> + @David Marchand  rte_firmware_read() author for his opinions

/lib/firmware/mlip-fw.bin does not seem to be something packaged in
Fedora, and I found no trace in linux-firmware repo, so I can't
reproduce your issue.

Please add some debug and give more details about the issue you are facing.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-21 12:52     ` David Marchand
@ 2023-09-21 13:06       ` Srikanth Yalavarthi
  2023-09-21 13:26         ` David Marchand
  0 siblings, 1 reply; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-21 13:06 UTC (permalink / raw)
  To: David Marchand, Jerin Jacob
  Cc: Prince Takkar, dev, Shivah Shankar Shankar Narayan Rao,
	Anup Prabhu, Srikanth Yalavarthi

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: 21 September 2023 18:23
> To: Jerin Jacob <jerinjacobk@gmail.com>; Srikanth Yalavarthi
> <syalavarthi@marvell.com>
> Cc: Prince Takkar <ptakkar@marvell.com>; dev@dpdk.org; Shivah Shankar
> Shankar Narayan Rao <sshankarnara@marvell.com>; Anup Prabhu
> <aprabhu@marvell.com>
> Subject: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for
> firmware read
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, Sep 21, 2023 at 2:08 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
> >
> > On Wed, Aug 30, 2023 at 9:40 PM Srikanth Yalavarthi
> > <syalavarthi@marvell.com> wrote:
> > >
> > > Dropped use of rte_firmware_read API to read ML firmware binary.
> > > When DPDK is built with libarchive support, the RTE API assumes
> > > the binary file as a compressed
> 
> The rte_firmware API supports both xz-compressed  and uncompressed files.
> Otherwise, it would break loading net/ice on systems where /lib/firmware
> content is uncompressed (which is still the case in some Linux distributions).
> 
> 
> To convince myself, I wrote a quick tool ("./archive" below) that outputs in
> hexa the content of a file, with the same libarchive calls.
> 
> With a xz-compressed file:
> $ hexdump -C /lib/firmware/intel/ice/ddp/ice.pkg.xz | head -1
> 00000000  fd 37 7a 58 5a 00 00 01  69 22 de 36 02 00 21 01  |.7zXZ...i".6..!.| $
> ./archive /lib/firmware/intel/ice/ddp/ice.pkg.xz | head -1
> 00000000: 01 00 00 00 05 00 00 00 1C 00 00 00 70 00 00 00 | ............p...
> 
> Uncompressing this file, and passing it to the same tool:
> $ hexdump -C ice.pkg | head
> 00000000  01 00 00 00 05 00 00 00  1c 00 00 00 70 00 00 00  |............p...| $
> ./archive ice.pkg | head
> 00000000: 01 00 00 00 05 00 00 00 1C 00 00 00 70 00 00 00 | ............p...
> 
> 
> For the record, I am using:
> $ rpm -q libarchive
> libarchive-3.6.1-3.fc37.x86_64
> 
> 
> > > archive. This causes the ML firmware binary to be parsed
> > > incorrectly.
> >
> > + @David Marchand  rte_firmware_read() author for his opinions
> 
> /lib/firmware/mlip-fw.bin does not seem to be something packaged in
> Fedora, and I found no trace in linux-firmware repo, so I can't reproduce
> your issue.
> 
> Please add some debug and give more details about the issue you are facing.

The "/lib/firmware/mlip-fw.bin" file is Marvell's ML firmware binary. This file is uncompressed.

When DPDK is built without libarchive support, no issues are observed when using rte_firmware_read to load the firmware file, as the open and read system calls are used.

When libarchive support is enabled, rte_firmware_read tries to parse the firmware binary as an xz archive. Since the file is not an archive, this step fails.

Hence, a new ML driver function was added to read the firmware binary.

> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-21 13:06       ` [EXT] " Srikanth Yalavarthi
@ 2023-09-21 13:26         ` David Marchand
  2023-09-22  3:59           ` Srikanth Yalavarthi
  0 siblings, 1 reply; 340+ messages in thread
From: David Marchand @ 2023-09-21 13:26 UTC (permalink / raw)
  To: Srikanth Yalavarthi
  Cc: Jerin Jacob, Prince Takkar, dev,
	Shivah Shankar Shankar Narayan Rao, Anup Prabhu

On Thu, Sep 21, 2023 at 3:06 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
> > > > archive. This causes the ML firmware binary to be parsed
> > > > incorrectly.
> > >
> > > + @David Marchand  rte_firmware_read() author for his opinions
> >
> > /lib/firmware/mlip-fw.bin does not seem to be something packaged in
> > Fedora, and I found no trace in linux-firmware repo, so I can't reproduce
> > your issue.
> >
> > Please add some debug and give more details about the issue you are facing.
>
> The "/lib/firmware/mlip-fw.bin" is Marvell's ML firmware binary. This file is in un-compressed form.
>
> When DPDK is built without libarchive support, No issues are observed with using  rte_firmware_read to load the firmware file as open and read system calls are used.
>
> When libarchive support is enabled, rte_firmware_read tries to parse the firmware binary as an xz archive. Since the file is not an archive, this step is failing.

Please debug this part and point at the exact place where it fails.

>
> Hence, added new ML driver function to read the firmware binary.

This is just avoiding the issue without understanding it...


-- 
David Marchand


^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-21 13:26         ` David Marchand
@ 2023-09-22  3:59           ` Srikanth Yalavarthi
  2023-09-22  8:07             ` David Marchand
  0 siblings, 1 reply; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-22  3:59 UTC (permalink / raw)
  To: David Marchand
  Cc: Jerin Jacob, Prince Takkar, dev,
	Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Srikanth Yalavarthi

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: 21 September 2023 18:57
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: Jerin Jacob <jerinjacobk@gmail.com>; Prince Takkar
> <ptakkar@marvell.com>; dev@dpdk.org; Shivah Shankar Shankar Narayan
> Rao <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>
> Subject: Re: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for
> firmware read
> 
> On Thu, Sep 21, 2023 at 3:06 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> > > > > archive. This causes the ML firmware binary to be parsed
> > > > > incorrectly.
> > > >
> > > > + @David Marchand  rte_firmware_read() author for his opinions
> > >
> > > /lib/firmware/mlip-fw.bin does not seem to be something packaged in
> > > Fedora, and I found no trace in linux-firmware repo, so I can't
> > > reproduce your issue.
> > >
> > > Please add some debug and give more details about the issue you are
> facing.
> >
> > The "/lib/firmware/mlip-fw.bin" is Marvell's ML firmware binary. This file is
> in un-compressed form.
> >
> > When DPDK is built without libarchive support, No issues are observed with
> using  rte_firmware_read to load the firmware file as open and read system
> calls are used.
> >
> > When libarchive support is enabled, rte_firmware_read tries to parse the
> firmware binary as an xz archive. Since the file is not an archive, this step is
> failing.
> 
> Please debug this part and point at the exact place where it fails.

When compiled with libarchive support, the code fails in the firmware_open() function (lib/eal/unix/eal_firmware.c:24):

	if (archive_read_support_format_raw(ctx->a) != ARCHIVE_OK ||
			archive_read_support_filter_xz(ctx->a) != ARCHIVE_OK ||
			archive_read_open_filename(ctx->a, name, blocksize) != ARCHIVE_OK ||
			archive_read_next_header(ctx->a, &e) != ARCHIVE_OK) {
		archive_read_free(ctx->a);
		ctx->a = NULL;
		return -1;
	}

My understanding is that all four checks in the if condition assume the file is a compressed archive, i.e. they look for the relevant metadata of a compressed archive.
All four checks were failing when the file being read is a single uncompressed file (as in our case).

And when compiled without libarchive enabled, the alternate firmware_open (lib/eal/unix/eal_firmware.c:63) is called, which works for the file we are trying to read.

Please correct me if my understanding is wrong.
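
To narrow down which call is actually failing, something along these lines could be used (a standalone debug sketch, not DPDK code; ARCHIVE_OK is 0, ARCHIVE_WARN is -20, ARCHIVE_FATAL is -30):

#include <archive.h>
#include <archive_entry.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	struct archive *a = archive_read_new();
	struct archive_entry *e;

	if (argc < 2)
		return 1;

	/* Same sequence of calls as firmware_open(), return codes printed. */
	printf("format_raw    : %d\n", archive_read_support_format_raw(a));
	printf("filter_xz     : %d\n", archive_read_support_filter_xz(a));
	printf("open_filename : %d\n", archive_read_open_filename(a, argv[1], 10 * 1024));
	printf("next_header   : %d\n", archive_read_next_header(a, &e));

	archive_read_free(a);
	return 0;
}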

> 
> >
> > Hence, added new ML driver function to read the firmware binary.
> 
> This is just avoiding the issue without understanding it...
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-22  3:59           ` Srikanth Yalavarthi
@ 2023-09-22  8:07             ` David Marchand
  2023-09-22 16:59               ` Srikanth Yalavarthi
  0 siblings, 1 reply; 340+ messages in thread
From: David Marchand @ 2023-09-22  8:07 UTC (permalink / raw)
  To: Srikanth Yalavarthi
  Cc: Jerin Jacob, Prince Takkar, dev,
	Shivah Shankar Shankar Narayan Rao, Anup Prabhu

Hello,

On Fri, Sep 22, 2023 at 5:59 AM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
> > From: David Marchand <david.marchand@redhat.com>
> > On Thu, Sep 21, 2023 at 3:06 PM Srikanth Yalavarthi
> > <syalavarthi@marvell.com> wrote:
> > > > > > archive. This causes the ML firmware binary to be parsed
> > > > > > incorrectly.
> > > > >
> > > > > + @David Marchand  rte_firmware_read() author for his opinions
> > > >
> > > > /lib/firmware/mlip-fw.bin does not seem to be something packaged in
> > > > Fedora, and I found no trace in linux-firmware repo, so I can't
> > > > reproduce your issue.
> > > >
> > > > Please add some debug and give more details about the issue you are
> > facing.
> > >
> > > The "/lib/firmware/mlip-fw.bin" is Marvell's ML firmware binary. This file is
> > in un-compressed form.
> > >
> > > When DPDK is built without libarchive support, No issues are observed with
> > using  rte_firmware_read to load the firmware file as open and read system
> > calls are used.
> > >
> > > When libarchive support is enabled, rte_firmware_read tries to parse the
> > firmware binary as an xz archive. Since the file is not an archive, this step is
> > failing.
> >
> > Please debug this part and point at the exact place where it fails.
>
> When compiled with libarchive support, the code fails in firmware_open (lib/eal/unix/eal_firmware.c:24) function
>
>         if (archive_read_support_format_raw(ctx->a) != ARCHIVE_OK ||

"""
     archive_read_support_format_raw()
             The “raw” format handler allows libarchive to be used to
read arbitrary data.  It treats any data stream as an archive with a
single entry.  The pathname of this entry is “data”; all other entry
fields are unset.  This
             is not enabled by archive_read_support_format_all() in
order to avoid erroneous handling of damaged archives.
"""

Which means that this instance of libarchive accepts "raw" archive.

>                         archive_read_support_filter_xz(ctx->a) != ARCHIVE_OK ||

"""
     archive_read_support_filter_bzip2(),
archive_read_support_filter_compress(),
archive_read_support_filter_grzip(),
archive_read_support_filter_gzip(),
archive_read_support_filter_lrzip(),
archive_read_support_filter_lz4(),
             archive_read_support_filter_lzma(),
archive_read_support_filter_lzop(),
archive_read_support_filter_none(), archive_read_support_filter_rpm(),
archive_read_support_filter_uu(), archive_read_support_filter_xz(),
             archive_read_support_filter_zstd(),
             Enables auto-detection code and decompression support for
the specified compression.  These functions may fall back on external
programs if an appropriate library was not available at build time.
Decompression using an
             external program is usually slower than decompression
through built-in libraries.  Note that “none” is always enabled by
default.
"""

Which means that this instance of libarchive accepts xz compressed
files, and uncompressed files too because the "none" filter is enabled
by default.
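
As an illustration of the above (a fragment only, assuming <archive.h> and <archive_entry.h> are included; this is not the EAL code), an uncompressed file read through the raw handler shows up as a single entry named "data":

	struct archive *a = archive_read_new();
	struct archive_entry *e;
	char buf[4096];

	archive_read_support_format_raw(a);
	archive_read_support_filter_xz(a); /* "none" is enabled by default as well */

	if (archive_read_open_filename(a, "/lib/firmware/mlip-fw.bin", 10 * 1024) == ARCHIVE_OK &&
	    archive_read_next_header(a, &e) == ARCHIVE_OK) {
		/* archive_entry_pathname(e) is "data" for a raw entry */
		while (archive_read_data(a, buf, sizeof(buf)) > 0)
			; /* consume the uncompressed bytes */
	}
	archive_read_free(a);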


>                         archive_read_open_filename(ctx->a, name, blocksize) != ARCHIVE_OK ||
>                         archive_read_next_header(ctx->a, &e) != ARCHIVE_OK) {
>                 archive_read_free(ctx->a);
>                 ctx->a = NULL;
>                 return -1;
>         }
>
> I understand that all of the 4 checks in the if condition assume that the file is a compressed archive. i.e, they look for relevant metadata of a compressed archive.

I had double-checked before replying last time; it works as I
described with my Fedora libarchive.


> All 4 checks were failing when the file being read is a single uncompressed file (as in our case).

o_O

Did you check whether each of the 4 checks fails individually, or are
you saying these 4 tests fail as a whole?

I have one suspicion about archive_read_support_filter_xz, which may
return ARCHIVE_WARN.
But that's my only serious hint so far.

I have put up a debug patch, please give it a try.
https://patchwork.dpdk.org/project/dpdk/patch/20230922080606.905222-1-david.marchand@redhat.com/


-- 
David Marchand


^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-22  8:07             ` David Marchand
@ 2023-09-22 16:59               ` Srikanth Yalavarthi
  0 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-22 16:59 UTC (permalink / raw)
  To: David Marchand
  Cc: Jerin Jacob, Prince Takkar, dev,
	Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Srikanth Yalavarthi

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: 22 September 2023 13:38
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: Jerin Jacob <jerinjacobk@gmail.com>; Prince Takkar
> <ptakkar@marvell.com>; dev@dpdk.org; Shivah Shankar Shankar Narayan
> Rao <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>
> Subject: Re: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for
> firmware read
> 
> Hello,
> 
> On Fri, Sep 22, 2023 at 5:59 AM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> > > From: David Marchand <david.marchand@redhat.com> On Thu, Sep 21,
> > > 2023 at 3:06 PM Srikanth Yalavarthi <syalavarthi@marvell.com> wrote:
> > > > > > > archive. This causes the ML firmware binary to be parsed
> > > > > > > incorrectly.
> > > > > >
> > > > > > + @David Marchand  rte_firmware_read() author for his opinions
> > > > >
> Did you check that all 4 checks are failing individually or are you saying this 4
> tests fail as a whole?
> 
> I have one suspicion on archive_read_support_filter_xz, which may return
> ARCHIVE_WARN.

Yes, archive_read_support_filter_xz is returning ARCHIVE_WARN (-20). This is causing the firmware_open function to fail.

I guess we can ignore the ARCHIVE_WARN, since it only means the compression is supported through an external program, not that decompression is unavailable.

""These functions return ARCHIVE_OK if the compression is fully supported, ARCHIVE_WARN if the compression is supported only through an external program.""

I have submitted a patch, which I have tested with compressed (xz) and uncompressed files. Please share your comments.

http://patches.dpdk.org/project/dpdk/patch/20230922165356.31567-1-syalavarthi@marvell.com/
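
For context, one way firmware_open() could tolerate that warning is along these lines (a sketch only; the actual change is in the patch linked above):

	/* Accept ARCHIVE_WARN from the xz filter: decompression can still
	 * work through an external program, so only worse return codes are
	 * treated as a failure here.
	 */
	int ret = archive_read_support_filter_xz(ctx->a);

	if (ret != ARCHIVE_OK && ret != ARCHIVE_WARN) {
		archive_read_free(ctx->a);
		ctx->a = NULL;
		return -1;
	}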


> But that's my only serious hint so far.
> 
> I have put up some debug patch, please have a try with it.
> https://patchwork.dpdk.org/project/dpdk/patch/20230922080606.905222-1-david.marchand@redhat.com/

Thanks for the debug patch and support.
> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-21 12:08   ` Jerin Jacob
  2023-09-21 12:52     ` David Marchand
@ 2023-09-27  9:38     ` David Marchand
  2023-09-27 10:00       ` [EXT] " Srikanth Yalavarthi
  2023-09-27 18:37     ` Srikanth Yalavarthi
  2 siblings, 1 reply; 340+ messages in thread
From: David Marchand @ 2023-09-27  9:38 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Srikanth Yalavarthi, Prince Takkar, dev, sshankarnara, aprabhu

On Thu, Sep 21, 2023 at 2:08 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Wed, Aug 30, 2023 at 9:40 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > Dropped use of rte_firmware_read API to read ML firmware
> > binary. When DPDK is built with libarchive support, the
> > RTE API assumes the binary file is a compressed archive.
> > This causes the ML firmware binary to be parsed
> > incorrectly.
>
> + @David Marchand  rte_firmware_read() author for his opinions

I am not sure if this series was applied, but this patch can be
discarded, as a fix for rte_firmware_read has been merged in the main
repository.
Thanks for the heads-up, Jerin.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-27  9:38     ` David Marchand
@ 2023-09-27 10:00       ` Srikanth Yalavarthi
  0 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 10:00 UTC (permalink / raw)
  To: David Marchand, Jerin Jacob
  Cc: Prince Takkar, dev, Shivah Shankar Shankar Narayan Rao,
	Anup Prabhu, Srikanth Yalavarthi

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: 27 September 2023 15:08
> To: Jerin Jacob <jerinjacobk@gmail.com>
> Cc: Srikanth Yalavarthi <syalavarthi@marvell.com>; Prince Takkar
> <ptakkar@marvell.com>; dev@dpdk.org; Shivah Shankar Shankar Narayan
> Rao <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>
> Subject: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for
> firmware read
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, Sep 21, 2023 at 2:08 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
> >
> > On Wed, Aug 30, 2023 at 9:40 PM Srikanth Yalavarthi
> > <syalavarthi@marvell.com> wrote:
> > >
> > > Dropped use of rte_firmware_read API to read ML firmware binary.
> > > When DPDK is built with libarchive support, the RTE API assumes
> > > the binary file is a compressed archive. This causes the ML firmware
> > > binary to be parsed incorrectly.
> >
> > + @David Marchand  rte_firmware_read() author for his opinions
> 
> I am not sure if this series was applied, but this patch can be discarded as a fix
> on rte_firmware_read has been merged in the main repository.
> Thanks for the heads up Jerin.

This series is not yet applied.
I will push a revised series with this commit removed.

> 
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 00/35] Implemenation of revised ml/cnxk driver
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (34 preceding siblings ...)
  2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-09-27 18:30 ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 01/35] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                     ` (34 more replies)
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (5 subsequent siblings)
  41 siblings, 35 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series is an implementation of revised ml/cnxk driver
to support models compiled with TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions with-in a TVM model.

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (1):
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (32):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device
  ml/cnxk: update dependency info in driver docs
  ml/cnxk: update release notes for 23.11

 doc/guides/mldevs/cnxk.rst             |   81 +-
 doc/guides/rel_notes/release_23_11.rst |    4 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h         |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c       |  401 ++--
 drivers/ml/cnxk/cn10k_ml_model.h       |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2828 ++++++++----------------
 drivers/ml/cnxk/cn10k_ml_ops.h         |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c          |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h          |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c           |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h           |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c        |   94 +
 drivers/ml/cnxk/cnxk_ml_model.h        |  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          | 1630 ++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h          |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c        |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h        |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h       |  152 ++
 drivers/ml/cnxk/meson.build            |   79 +
 drivers/ml/cnxk/mvtvm_ml_dev.c         |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h         |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  392 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c         |  652 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c       |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h       |   36 +
 30 files changed, 6083 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 01/35] ml/cnxk: drop support for register polling
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 02/35] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for device argument "poll_mem" for the cnxk
ML driver. Support for using registers for polling is removed
and DDR addresses are used for polling instead.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: series-29660 ("Spec changes to support multi I/O models")

 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001..4aaeecff03 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d3..11531afd8c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6..005b093e45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 02/35] ml/cnxk: add generic cnxk device structure
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 01/35] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 03/35] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This structure is
the top-level device structure for the driver, which
encapsulates the target / platform specific device structure.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 316 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  15 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  60 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 495 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 563 insertions(+), 449 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..3bc61443d8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -10,13 +10,14 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -58,9 +59,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -90,7 +88,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -127,7 +125,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -139,7 +137,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -151,7 +149,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -174,7 +172,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -186,7 +184,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -197,49 +195,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -248,47 +250,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -300,7 +302,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -308,7 +311,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -324,18 +327,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -351,7 +356,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -368,7 +373,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -383,8 +388,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -430,45 +435,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -480,11 +485,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -498,14 +503,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -515,7 +520,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -524,24 +529,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -549,9 +554,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -559,9 +564,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -570,39 +575,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -613,53 +619,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -671,11 +681,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -691,49 +701,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	void *fw_buffer = NULL;
@@ -741,8 +753,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -773,8 +786,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -787,22 +800,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
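
The firmware load/unload entry points now take the generic cnxk_ml_dev handle and resolve the CN10K device from it. A minimal sketch of the wrapper implied by the accesses in these hunks (cnxk_mldev->cn10k_mldev, cnxk_mldev->state and the nb_models_* counters); the authoritative definition is added in drivers/ml/cnxk/cnxk_ml_dev.h elsewhere in the series, so the exact field set and ordering shown here are assumptions:

/* Sketch only - see cnxk_ml_dev.h for the real definition */
struct cnxk_ml_dev {
	/* Common configuration state, moved up from cn10k_ml_dev */
	enum cnxk_ml_dev_state state;

	/* Model accounting, moved up from cn10k_ml_dev */
	uint16_t nb_models_loaded;
	uint16_t nb_models_unloaded;
	uint16_t nb_models_started;
	uint16_t nb_models_stopped;

	/* CN10K specific device data, embedded by value */
	struct cn10k_ml_dev cn10k_mldev;
};
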
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03..f9da1548c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
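
The CN10K-local timeout, poll-state and device-state definitions removed above are replaced by common ML_CNXK_* equivalents used throughout the rest of the patch. Assuming the replacements keep the values dropped here (the new header is not part of this hunk), they would read roughly as:

/* ML command timeout in seconds (assumed to keep the previous value) */
#define ML_CNXK_CMD_TIMEOUT 5

/* Poll mode job state */
#define ML_CNXK_POLL_JOB_START	 0
#define ML_CNXK_POLL_JOB_FINISH 1

/* Device configuration state */
enum cnxk_ml_dev_state {
	ML_CNXK_DEV_STATE_PROBED = 0,
	ML_CNXK_DEV_STATE_CONFIGURED,
	ML_CNXK_DEV_STATE_STARTED,
	ML_CNXK_DEV_STATE_CLOSED
};
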
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8e..cc46ca2efd 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +462,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +471,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +495,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +507,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
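
cn10k_ml_model_ocm_pages_count() keeps its logic and only switches to the explicit cn10k_mldev argument. A condensed, self-contained sketch of the admission check it performs (num_pages is per tile, derived in the driver from the tile size and OCM page size; the helper name and standalone form are illustrative only):

#include <errno.h>
#include <stdint.h>

static int
ocm_pages_fit(uint16_t wb_pages, uint16_t *scratch_pages, uint16_t num_pages,
	      int ocm_relocatable)
{
	/* Model fits only if WB + scratch pages fit within one tile */
	if ((uint32_t)wb_pages + *scratch_pages > num_pages)
		return -ENOMEM;

	/*
	 * Non-relocatable models claim the whole tile so that no other
	 * model is placed on the remaining pages.
	 */
	if (!ocm_relocatable && *scratch_pages < num_pages)
		*scratch_pages = num_pages;

	return 0;
}
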
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891..3128b28db7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
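
With the model's back-reference switched to the generic device, code that only holds a model pointer reaches CN10K state through one extra hop. Hypothetical usage, assuming the surrounding driver headers:

struct cn10k_ml_model *model = dev->data->models[model_id];
struct cn10k_ml_dev *cn10k_mldev = &model->mldev->cn10k_mldev;

/* e.g. read a device register through the ROC handle */
reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
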
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6fb0bb620e..8094a0fab1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,12 @@
 
 #include <rte_mldev_pmd.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +218,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +238,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +257,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +274,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +336,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +349,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +396,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +410,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +460,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,9 +501,8 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
-			wb_pages +=
-				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			wb_pages += rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
 		fprintf(fp,
 			"tile = %2u, scratch_pages = %4u,"
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c..def6d4c756 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +86,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +200,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +251,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +327,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +342,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +352,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +374,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +385,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +394,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +434,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
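
The xstats bookkeeping moves from cn10k_ml_dev to the embedded cn10k_mldev without changing the id layout built by the loops above; with D = RTE_DIM(device_stats) and M = RTE_DIM(model_stats):

	stat_id 0 .. D - 1                   device-level entries
	offset_for_model[m] = D + m * M      first entry for model m
	count_per_model[m]  = M
	count = count_mode_device + count_mode_model
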
@@ -503,28 +504,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +541,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +552,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +656,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +676,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +747,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +774,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +790,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +864,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +893,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +908,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +922,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1027,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1058,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 
@@ -1077,8 +1091,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1101,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1141,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1164,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1184,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1279,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1305,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1327,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1369,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1396,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1445,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1460,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1480,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1506,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1528,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1550,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
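
Firmware load, selftest and model start/stop all share the same scratch-register handshake; only the constants change to the ML_CNXK_* names. A condensed sketch of the pattern, with req, timeout and timeout_cycle as declared in the surrounding functions:

plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
plt_wmb();

timeout = true;
timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);

plt_rmb();
do {
	if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
	    plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
		timeout = false;
		break;
	}
} while (plt_tsc_cycles() < timeout_cycle);
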
@@ -1552,8 +1587,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1609,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1626,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1659,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1716,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1731,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1747,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1756,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1772,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1784,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1853,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
@@ -1843,7 +1881,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1905,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1915,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1926,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1938,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1981,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2251,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2299,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2325,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2336,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2352,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2384,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2394,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2408,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2467,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2506,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 0000000000..2a5c17c973
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 0000000000..51315de622
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 94fa4283b1..03a2d4ecf2 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ driver_sdk_headers = files(
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
+        'cnxk_ml_dev.h',
 )
 
 sources = files(
@@ -19,6 +20,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 03/35] ml/cnxk: add generic model and layer structures
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 01/35] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 02/35] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 04/35] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures enable support for models composed of multiple
layers. A model is a collection of independent layers with
flow dependencies between the layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
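Below is a minimal, self-contained sketch of this layering scheme, written
for illustration only; the ex_ml_* names and the EX_MAX_LAYERS limit are
assumptions made for the example and do not reproduce the actual
cnxk_ml_model and cnxk_ml_layer definitions added by this series.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define EX_MAX_LAYERS 8 /* hypothetical per-model layer limit */

/* One independent layer of a model */
struct ex_ml_layer {
	uint16_t index;     /* position of this layer within the model */
	char name[64];      /* layer name */
	int16_t input_from; /* index of the layer feeding this one, -1 for model input */
};

/* A model is a collection of layers with flow dependencies between them */
struct ex_ml_model {
	uint16_t model_id;
	uint16_t nb_layers;
	struct ex_ml_layer layer[EX_MAX_LAYERS];
};

/* Execute layers in order; each layer consumes the output of its dependency */
static void
ex_ml_model_run(const struct ex_ml_model *model)
{
	uint16_t i;

	for (i = 0; i < model->nb_layers; i++)
		printf("model %u: run layer %u (%s), input from layer %d\n",
		       model->model_id, model->layer[i].index,
		       model->layer[i].name, model->layer[i].input_from);
}

int
main(void)
{
	struct ex_ml_model model = {.model_id = 0, .nb_layers = 2};

	model.layer[0] = (struct ex_ml_layer){.index = 0, .input_from = -1};
	strcpy(model.layer[0].name, "accelerator_layer");
	model.layer[1] = (struct ex_ml_layer){.index = 1, .input_from = 0};
	strcpy(model.layer[1].name, "cpu_layer");

	ex_ml_model_run(&model);
	return 0;
}
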
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 245 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  50 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 488 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   3 +
 10 files changed, 653 insertions(+), 470 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index cc46ca2efd..d747bba151 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -311,19 +311,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -361,102 +359,136 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -514,23 +546,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -542,7 +574,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -550,56 +583,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db7..206a369ca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 8094a0fab1..d71c36eae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -6,10 +6,10 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -333,12 +333,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -353,6 +355,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -382,8 +385,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -393,12 +396,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -409,16 +414,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -432,11 +440,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd65..720f8caf76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index def6d4c756..e91cc4e859 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -202,7 +202,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -215,77 +215,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -295,29 +298,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -327,14 +332,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -345,7 +350,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -385,7 +390,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -445,7 +450,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -472,7 +477,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -521,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -543,7 +548,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -576,9 +581,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -588,9 +593,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -600,9 +606,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -611,7 +618,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -692,28 +699,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -749,7 +756,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -758,7 +765,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -803,7 +810,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -854,7 +861,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -875,7 +882,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -895,7 +902,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1001,11 +1008,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1093,7 +1100,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1111,11 +1118,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1294,7 +1301,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1386,7 +1393,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1447,7 +1454,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1588,7 +1595,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1643,9 +1650,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1659,62 +1666,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* The number of layers that the driver handles for glow models is
+	 * always 1, so treat the entire model as a single-layer model. This
+	 * ignores num_layers from the metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1730,7 +1760,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1741,7 +1771,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1758,7 +1788,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1783,7 +1813,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1791,63 +1821,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1880,10 +1913,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1891,12 +1924,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1917,7 +1950,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1937,7 +1970,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1948,31 +1981,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2008,7 +2041,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2021,7 +2054,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2040,7 +2073,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2050,19 +2083,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2071,7 +2108,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2091,57 +2128,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2151,7 +2189,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2171,58 +2209,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2250,10 +2290,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2263,9 +2303,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2469,7 +2509,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2477,7 +2517,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 0000000000..29ec7ec511
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 32
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* name */
+	char name[RTE_ML_STR_MAX];
+
+	/* dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape of input */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized input size */
+	uint32_t sz_d;
+
+	/* Quantized input size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 0000000000..3d735ced3e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 0000000000..a2994dbb71
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * The structures are arranged in this order in the buffer.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 03a2d4ecf2..72e03b15b5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,8 @@ driver_sdk_headers = files(
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
         'cnxk_ml_dev.h',
+        'cnxk_ml_io.h',
+        'cnxk_ml_model.h',
 )
 
 sources = files(
@@ -21,6 +23,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0



* [PATCH v3 04/35] ml/cnxk: add generic cnxk request structure
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 03/35] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 05/35] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved common fields from the
cn10k structures to the cnxk structure. Moved job-related structures
and enumerations to the ops headers.
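
To make the layout concrete, below is a minimal sketch of the idea, inferred
only from how this patch dereferences req->cn10k_req and req->status; it is
not the exact definition added in drivers/ml/cnxk/cnxk_ml_ops.h, and the
stand-in contents of cn10k_ml_req as well as the member ordering and
qualifiers are assumptions:

	#include <stdint.h>

	/* Stand-in for the hardware-specific request from cn10k_ml_ops.h;
	 * the real structure also carries the job descriptor and result
	 * (sketch assumption).
	 */
	struct cn10k_ml_req {
		uint64_t status;	/* completion word written by firmware */
	};

	/* Generic request: wraps the target-specific request and exposes a
	 * target-neutral status pointer for the common poll path.
	 */
	struct cnxk_ml_req {
		struct cn10k_ml_req cn10k_req;	/* CN10K-specific job state */
		uint64_t *status;		/* set to &cn10k_req.status by
						 * cn10k_ml_set_poll_addr()
						 */
	};

With this shape, cn10k code keeps filling req->cn10k_req.jd while common code
only reads and writes through req->status, which is what the changes to
cn10k_ml_set_poll_addr(), cn10k_ml_set_poll_ptr() and cn10k_ml_get_poll_ptr()
below rely on.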

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  72 +++----
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 331 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 558 insertions(+), 492 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 3bc61443d8..fc6f78d414 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -14,9 +14,8 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -400,20 +399,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -458,29 +460,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -654,29 +657,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -766,11 +770,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -782,8 +786,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -791,7 +795,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		free(fw_buffer);
 	} else if (roc_env_is_asim()) {
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a..1852d4f6c9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d747bba151..5d37e9bf8a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -549,7 +550,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -558,7 +558,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -575,7 +574,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca7..74ada1531a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e91cc4e859..caee09829b 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,9 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -78,31 +77,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -122,14 +121,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -140,18 +139,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -159,7 +158,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -173,7 +172,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -185,8 +184,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -333,7 +333,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -341,79 +341,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -861,7 +870,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -904,7 +913,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1101,7 +1110,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1136,7 +1145,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1213,7 +1222,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1239,7 +1248,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1252,7 +1261,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1269,7 +1278,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1485,20 +1494,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1511,17 +1522,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1538,14 +1551,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1554,23 +1567,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1581,7 +1595,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1654,7 +1668,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1726,7 +1740,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1790,7 +1804,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1815,10 +1829,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1878,8 +1892,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1887,19 +1901,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1952,7 +1968,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1972,10 +1988,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2015,19 +2031,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2287,18 +2305,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2329,7 +2352,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2338,7 +2361,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2346,15 +2370,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2365,11 +2389,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2395,12 +2420,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2424,11 +2450,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2450,13 +2477,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2507,10 +2536,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2522,17 +2552,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2555,7 +2586,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45..fd5992e192 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries 0 means linear input mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model  */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 0000000000..f1872dcf7c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 0000000000..b953fb0f5f
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 72e03b15b5..73db458fcd 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -15,6 +15,7 @@ driver_sdk_headers = files(
         'cnxk_ml_dev.h',
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
+        'cnxk_ml_ops.h',
 )
 
 sources = files(
@@ -24,6 +25,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 05/35] ml/cnxk: add generic cnxk xstats structures
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 04/35] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 06/35] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic xstats structures and renamed the cn10k xstats
enumerations to use the cnxk prefix.
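
For context, a minimal sketch of what the renamed definitions in the new
cnxk_ml_xstats.h presumably look like, reconstructed from the cn10k
structures removed from cn10k_ml_dev.h in the diff below; this illustrates
the rename only and is not a verbatim copy of the header:

/* Sketch: generic xstats definitions carrying the cnxk prefix */
#include <rte_mldev.h>
#include <rte_mldev_core.h>

/* Extended stats types enum (renamed from enum cn10k_ml_xstats_type) */
enum cnxk_ml_xstats_type {
        nb_models_loaded,
        nb_models_unloaded,
        nb_models_started,
        nb_models_stopped,
        avg_hw_latency,
        min_hw_latency,
        max_hw_latency,
        avg_fw_latency,
        min_fw_latency,
        max_fw_latency,
};

/* Extended stats function type enum (renamed from CN10K_ML_XSTATS_FN_*) */
enum cnxk_ml_xstats_fn_type {
        CNXK_ML_XSTATS_FN_DEVICE,
        CNXK_ML_XSTATS_FN_MODEL,
};

/* Function pointer to get xstats for a type, now keyed on the generic enum */
typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
                                      enum cnxk_ml_xstats_type stat);

/* Extended stats entry structure (renamed from struct cn10k_ml_xstats_entry) */
struct cnxk_ml_xstats_entry {
        struct rte_ml_dev_xstats_map map;  /* Name-ID map */
        enum rte_ml_dev_xstats_mode mode;  /* xstats mode, device or model */
        enum cnxk_ml_xstats_type type;     /* Type of xstats */
        enum cnxk_ml_xstats_fn_type fn_id; /* xstats getter function id */
        uint16_t obj_idx;                  /* Object ID, model ID for model stats */
        uint8_t reset_allowed;             /* Allowed to reset the stat */
        uint64_t reset_value;              /* Offset taken away to emulate resets */
};

The device_xstats and layer_xstats tables referenced in the cn10k_ml_ops.c
hunks are presumably the former device_stats and model_stats arrays, moved
into the same header alongside these definitions.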

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 drivers/ml/cnxk/meson.build      |   1 +
 5 files changed, 210 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index caee09829b..42a4389bbe 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -425,26 +426,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -459,10 +440,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -470,17 +451,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -489,24 +470,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -545,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -554,17 +535,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -590,9 +571,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -603,9 +584,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -616,16 +598,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -671,8 +654,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -708,26 +691,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -762,8 +745,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1342,10 +1325,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1357,10 +1340,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1384,11 +1367,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1423,10 +1406,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1664,7 +1647,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1738,24 +1721,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2308,7 +2291,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2326,31 +2309,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 0000000000..0d405679ca
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 73db458fcd..6385ac4548 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ driver_sdk_headers = files(
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
+        'cnxk_ml_xstats.h',
 )
 
 sources = files(
-- 
2.41.0
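
The reset_value field in the new cnxk_ml_xstats_entry above is what lets the
driver emulate per-stat resets without clearing the underlying hardware or
firmware counters. A minimal stand-alone sketch of that offset technique,
using simplified stand-in types (purely illustrative, not part of the patch):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for cnxk_ml_xstats_entry. */
struct xstat_entry {
	uint64_t reset_value; /* snapshot taken when the stat is "reset" */
};

/* Raw, monotonically increasing counter as reported by the provider. */
static uint64_t raw_counter = 1000;

static uint64_t
xstat_get(const struct xstat_entry *xs)
{
	/* Reported value = raw counter minus the reset-time snapshot. */
	return raw_counter - xs->reset_value;
}

static void
xstat_reset(struct xstat_entry *xs)
{
	/* "Reset" by remembering the current raw value as an offset. */
	xs->reset_value = raw_counter;
}

int
main(void)
{
	struct xstat_entry xs = {0};

	printf("before reset: %" PRIu64 "\n", xstat_get(&xs)); /* 1000 */
	xstat_reset(&xs);
	raw_counter += 50;
	printf("after reset:  %" PRIu64 "\n", xstat_get(&xs)); /* 50 */

	return 0;
}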


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 06/35] ml/cnxk: rename cnxk ops function pointers struct
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 05/35] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 07/35] ml/cnxk: update device handling functions Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure to use the cnxk prefix.
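
Conceptually, the renamed table is a single generic ops structure owned by the
cnxk layer whose entries, at this point in the series, still point at the
cn10k callbacks; that is why those callbacks are made non-static here. A
simplified stand-alone sketch of that indirection (illustrative only, not the
driver's real types):

#include <stdio.h>

/* Stand-in for struct rte_ml_dev_ops. */
struct ml_dev_ops {
	int (*dev_start)(void *dev);
	int (*dev_stop)(void *dev);
};

/* Hardware-specific callbacks, stand-ins for cn10k_ml_dev_start/stop. */
static int cn10k_start(void *dev) { (void)dev; return 0; }
static int cn10k_stop(void *dev) { (void)dev; return 0; }

/* Generic table owned by the cnxk layer; probe publishes this one. */
static const struct ml_dev_ops cnxk_ops = {
	.dev_start = cn10k_start,
	.dev_stop = cn10k_stop,
};

int
main(void)
{
	const struct ml_dev_ops *dev_ops = &cnxk_ops; /* as done in probe */

	printf("start: %d, stop: %d\n", dev_ops->dev_start(NULL),
	       dev_ops->dev_stop(NULL));
	return 0;
}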

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 91 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fc6f78d414..91813e9d0a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -345,7 +345,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 42a4389bbe..66b38fc1eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -119,7 +119,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -860,7 +860,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -888,7 +888,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1087,7 +1087,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1160,7 +1160,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1180,7 +1180,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1200,7 +1200,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1241,7 +1241,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1258,7 +1258,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1273,7 +1273,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1321,7 +1321,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1363,7 +1363,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1427,7 +1427,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1441,7 +1441,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1528,7 +1528,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2051,7 +2051,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2071,7 +2071,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2105,7 +2105,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2186,7 +2186,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2574,38 +2574,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e192..16480b9ad8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c..03402681c5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,41 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5f..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 07/35] ml/cnxk: update device handling functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 06/35] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 08/35] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement CNXK wrapper functions for dev_info_get,
dev_configure, dev_close, dev_start and dev_stop. The
wrapper functions allocate and release the common
resources for the ML driver and invoke the
device-specific functions.
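
A simplified stand-alone sketch of this layering, with stand-in types
(illustrative only): the cnxk entry point owns the argument checks and the
common state, and delegates the hardware-specific work to the cn10k function.

#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for struct cnxk_ml_dev. */
struct ml_dev {
	int state; /* common, device-independent state */
};

/* Hardware-specific start, stand-in for cn10k_ml_dev_start(). */
static int
cn10k_start(struct ml_dev *mldev)
{
	(void)mldev; /* would program ML_CFG registers here */
	return 0;
}

/* Generic wrapper, stand-in for cnxk_ml_dev_start(). */
static int
cnxk_start(struct ml_dev *mldev)
{
	int ret;

	if (mldev == NULL)
		return -EINVAL;

	ret = cn10k_start(mldev); /* device-specific part */
	if (ret != 0)
		return ret;

	mldev->state = 1; /* common bookkeeping stays in the wrapper */

	return 0;
}

int
main(void)
{
	struct ml_dev mldev = {0};

	printf("cnxk_start: %d, state: %d\n", cnxk_start(&mldev), mldev.state);
	return 0;
}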

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 66b38fc1eb..6d8f2c8777 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -101,7 +101,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -861,20 +861,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -889,143 +881,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1038,8 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1050,10 +915,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1067,77 +932,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1154,20 +967,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1175,19 +983,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1195,8 +999,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1217,7 +1019,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad8..d50b5bede7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de622..02605fa28f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03402681c5..07a4daabc5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,15 +5,291 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..2996928d7d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 08/35] ml/cnxk: update queue-pair handling functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 07/35] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 09/35] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device queue-pairs.
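
A simplified stand-alone sketch of the split introduced here, with stand-in
types (illustrative only): the generic cnxk layer allocates the queue pair
and its request ring, then hands it to a hardware-specific hook for the
job-command setup.

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for struct cnxk_ml_qp. */
struct ml_qp {
	unsigned int id;
	unsigned int nb_desc;
};

/* Hardware-specific hook, stand-in for cn10k_ml_qp_initialize(). */
static void
cn10k_qp_initialize(struct ml_qp *qp)
{
	(void)qp; /* would fill per-descriptor job commands here */
}

/* Generic creation path, stand-in for cnxk_ml_qp_create(). */
static struct ml_qp *
cnxk_qp_create(unsigned int qp_id, unsigned int nb_desc)
{
	struct ml_qp *qp;

	qp = calloc(1, sizeof(*qp));
	if (qp == NULL)
		return NULL;

	qp->id = qp_id;          /* common ring bookkeeping */
	qp->nb_desc = nb_desc;
	cn10k_qp_initialize(qp); /* device-specific part */

	return qp;
}

int
main(void)
{
	struct ml_qp *qp = cnxk_qp_create(0, 128);
	int ok = (qp != NULL);

	printf("qp created: %d\n", ok);
	free(qp);
	return ok ? 0 : 1;
}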

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6d8f2c8777..e3c688a55f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -95,93 +95,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -189,13 +108,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1002,47 +914,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede7..2d0a49d5cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 07a4daabc5..aa56dd2276 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,7 +10,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -93,7 +193,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -283,6 +383,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -294,8 +439,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0
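
For reference, a minimal application-level sketch of driving the common
queue-pair path added above through the rte_mldev API is given below. It
is illustrative only: dev_id, nb_qps and nb_desc are assumed inputs, the
device is assumed to be configured for at least nb_qps queue pairs, and
error handling is trimmed.

#include <rte_lcore.h>
#include <rte_mldev.h>

static int
setup_ml_queue_pairs(int16_t dev_id, uint16_t nb_qps, uint32_t nb_desc)
{
	struct rte_ml_dev_qp_conf qp_conf = {0};
	struct rte_ml_dev_info dev_info;
	uint16_t qp_id;
	int ret;

	ret = rte_ml_dev_info_get(dev_id, &dev_info);
	if (ret != 0)
		return ret;

	/* Clamp to the device limit; internally the driver allocates one
	 * extra descriptor unless nb_desc already equals max_desc.
	 */
	if (nb_desc > dev_info.max_desc)
		nb_desc = dev_info.max_desc;
	qp_conf.nb_desc = nb_desc;

	for (qp_id = 0; qp_id < nb_qps; qp_id++) {
		ret = rte_ml_dev_queue_pair_setup(dev_id, qp_id, &qp_conf,
						  rte_socket_id());
		if (ret != 0)
			return ret;
	}

	return 0;
}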


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 09/35] ml/cnxk: update model load and unload functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 08/35] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 10/35] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to load and unload
ML models. The wrapper functions invoke the cn10k
model load and unload functions.
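
A minimal sketch of how an application exercises this path through the
rte_mldev API is shown below; dev_id and the in-memory model buffer are
assumptions made for the example and error handling is trimmed.

#include <rte_mldev.h>

static int
load_and_unload(int16_t dev_id, void *model_buf, size_t model_len)
{
	struct rte_ml_model_params params;
	uint16_t model_id;
	int ret;

	params.addr = model_buf;	/* model binary already in memory */
	params.size = model_len;

	/* Dispatches to cnxk_ml_model_load(), which allocates the common
	 * model object and calls cn10k_ml_layer_load() for layer 0.
	 */
	ret = rte_ml_model_load(dev_id, &params, &model_id);
	if (ret != 0)
		return ret;

	/* ... start the model, run inferences, stop the model ... */

	/* Dispatches to cnxk_ml_model_unload(), which invokes
	 * cn10k_ml_model_unload() and releases the model memzone.
	 */
	return rte_ml_model_unload(dev_id, model_id);
}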

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  26 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 462 insertions(+), 277 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 5d37e9bf8a..69a60b9b90 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -316,42 +316,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -363,140 +352,146 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+			   struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output1[i].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output2[j].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
+struct cnxk_ml_io_info *
+cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	return &model->layer[layer_id].info;
+}
+
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -504,7 +499,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -516,7 +511,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -524,15 +519,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -540,28 +535,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -570,39 +562,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68..b891c9d627 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,13 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+				struct cn10k_ml_model_metadata *metadata);
+struct cnxk_ml_io_info *cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e3c688a55f..ad2effb904 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -15,6 +15,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -273,7 +276,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1261,85 +1264,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_set(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1358,99 +1447,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1748,7 +1800,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1762,19 +1813,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cd..677219dfdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28f..1590249abd 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index aa56dd2276..1d8b84269d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -137,6 +140,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -240,7 +244,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -271,6 +275,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -303,6 +324,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -312,7 +336,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -428,6 +452,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -451,8 +587,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..bc14f6e5b9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 10/35] ml/cnxk: update model start and stop functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 09/35] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 11/35] ml/cnxk: update model utility functions Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrapper functions invoke the cn10k
model start and stop functions.
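
A minimal application-level sketch of the corresponding start/stop call
flow is given below; dev_id and model_id are assumed to come from a
prior rte_ml_model_load() call.

#include <rte_mldev.h>

static int
run_model(int16_t dev_id, uint16_t model_id)
{
	int ret;

	/* Dispatches to the cnxk wrapper, which starts layer 0 on cn10k:
	 * OCM pages are reserved and the MODEL_START job is issued.
	 */
	ret = rte_ml_model_start(dev_id, model_id);
	if (ret != 0)
		return ret;

	/* ... enqueue/dequeue inference ops on the queue pairs ... */

	/* Stop releases the OCM pages reserved for the layer. */
	return rte_ml_model_stop(dev_id, model_id);
}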

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d71c36eae6..2197e5e0ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -215,11 +215,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -238,7 +237,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -333,12 +331,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -351,10 +347,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -396,12 +390,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -416,10 +408,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -438,8 +428,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ad2effb904..c677861645 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -248,26 +248,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -291,7 +293,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -323,9 +325,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -714,10 +720,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -730,22 +734,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -761,15 +763,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1506,14 +1508,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1524,85 +1528,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1636,66 +1644,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1705,31 +1741,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1766,8 +1802,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1776,6 +1815,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2003,30 +2061,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2054,14 +2117,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2116,7 +2178,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2183,7 +2245,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2232,23 +2294,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2284,7 +2350,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf..a222a43d55 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1d8b84269d..b61ed45876 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -240,7 +240,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -332,7 +332,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -564,6 +564,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -589,8 +629,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9..d27ca0d0cb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 11/35] ml/cnxk: update model utility functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 10/35] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 12/35] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and to
fetch model info.
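
For reference, a minimal application-level sketch (illustrative only, not
part of this patch) of the two ops these wrappers back. It assumes the
public rte_ml_model_info_get() and rte_ml_model_params_update() entry
points; the helper name and buffer handling below are made up for the
example.

#include <rte_mldev.h>

/* Hypothetical helper: fetch model info, then push new weights/bias. */
static int
app_refresh_model_params(int16_t dev_id, uint16_t model_id, void *new_wb)
{
	struct rte_ml_model_info info;
	int ret;

	/* Served by cnxk_ml_model_info_get() through cnxk_ml_ops. */
	ret = rte_ml_model_info_get(dev_id, model_id, &info);
	if (ret != 0)
		return ret;

	/* Model must be loaded but not started for a params update. */
	return rte_ml_model_params_update(dev_id, model_id, new_wb);
}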

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c677861645..c0d6216485 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1835,45 +1835,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b61ed45876..9ce37fcfd1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -604,6 +604,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -631,8 +675,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 12/35] ml/cnxk: update data quantization functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 11/35] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 13/35] ml/cnxk: update device debug functions Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.
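
A minimal sketch (illustrative only, not part of this patch) of driving the
new per-tensor helper directly. The cnxk_ml_io field values below are made
up for the example; in the driver the descriptors are populated from the
model metadata at load time.

#include <stdint.h>
#include <string.h>

#include <rte_mldev.h>

#include "cnxk_ml_io.h"

/* Hypothetical example: quantize an FP32 buffer to INT8 for the device. */
static int
example_quantize_fp32_to_int8(float *src, int8_t *dst, uint32_t nb_elements)
{
	struct cnxk_ml_io io;

	memset(&io, 0, sizeof(io));
	io.dtype = RTE_ML_IO_TYPE_FP32;	/* application-visible type */
	io.qtype = RTE_ML_IO_TYPE_INT8;	/* device-side type */
	io.scale = 64.0f;		/* hypothetical qscale */
	io.nb_elements = nb_elements;
	io.sz_d = nb_elements * sizeof(float);

	/* dtype != qtype, so this lands in rte_ml_io_float32_to_int8(). */
	return cnxk_ml_io_quantize_single(&io, (uint8_t *)src, (uint8_t *)dst);
}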

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c0d6216485..ff190b7f86 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1856,170 +1856,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0d..780e2a9f9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 0000000000..c78009ab0c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 29ec7ec511..5de166c252 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 9ce37fcfd1..63842025fc 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -648,6 +650,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -679,6 +753,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 6385ac4548..9cc4ddec70 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -25,6 +25,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 13/35] ml/cnxk: update device debug functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 12/35] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 14/35] ml/cnxk: update device stats functions Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrappers for the device dump and selftest debug
functions.
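
A minimal sketch (illustrative only, not part of this patch) of the debug
path from an application's point of view, assuming the public
rte_ml_dev_dump() and rte_ml_dev_selftest() entry points that these
wrappers serve; the helper name is made up.

#include <stdio.h>

#include <rte_mldev.h>

/* Hypothetical debug hook: self-test the device, then dump its state. */
static int
app_debug_ml_device(int16_t dev_id)
{
	int ret;

	ret = rte_ml_dev_selftest(dev_id);
	if (ret != 0)
		fprintf(stderr, "ML device %d self-test failed: %d\n", dev_id, ret);

	/* Dumps per-model info, OCM state and firmware debug buffers. */
	return rte_ml_dev_dump(dev_id, stdout);
}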

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   2 +
 12 files changed, 236 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 69a60b9b90..b765b4ada9 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -596,3 +597,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index b891c9d627..45f2ed5fcf 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -460,5 +460,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2197e5e0ed..dc315cce10 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -481,19 +481,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a..bf8944f8ee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ff190b7f86..0a3575879f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -18,11 +18,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -70,16 +65,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -113,140 +98,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1120,38 +971,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1207,17 +1045,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c..5fda98ae88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e..b069d4e3a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71..66d979dd3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 63842025fc..66b88ddae1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -409,6 +409,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -729,8 +764,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 0000000000..ca3670a9e8
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 0000000000..ed2ab21346
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 9cc4ddec70..575f08f9c0 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,6 +17,7 @@ driver_sdk_headers = files(
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
         'cnxk_ml_xstats.h',
+        'cnxk_ml_utils.h',
 )
 
 sources = files(
@@ -28,6 +29,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 14/35] ml/cnxk: update device stats functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 13/35] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 15/35] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device stats.
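
For illustration only (not part of the patch): a minimal, self-contained sketch of
the aggregation pattern the new cnxk wrappers use. The struct and function names
below are simplified stand-ins for the driver's cnxk_ml_qp stats and
rte_ml_dev_stats, not the actual definitions.

#include <stdint.h>

struct sketch_qp_stats {
	uint64_t enqueued_count;
	uint64_t dequeued_count;
	uint64_t enqueue_err_count;
	uint64_t dequeue_err_count;
};

/* Device-level stats are the sum of the per-queue-pair counters,
 * mirroring what the cnxk_ml_dev_stats_get wrapper does over
 * dev->data->queue_pairs. */
static void
sketch_dev_stats_get(const struct sketch_qp_stats *qp, uint16_t nb_qps,
		     struct sketch_qp_stats *out)
{
	uint16_t qp_id;

	for (qp_id = 0; qp_id < nb_qps; qp_id++) {
		out->enqueued_count += qp[qp_id].enqueued_count;
		out->dequeued_count += qp[qp_id].dequeued_count;
		out->enqueue_err_count += qp[qp_id].enqueue_err_count;
		out->dequeue_err_count += qp[qp_id].dequeue_err_count;
	}
}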

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0a3575879f..27d255a830 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -770,38 +770,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66b88ddae1..c75317d6da 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -489,6 +489,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -772,8 +804,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 15/35] ml/cnxk: update device and model xstats functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 14/35] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 16/35] ml/cnxk: update fast path functions Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resource handling for the xstats is done in
the cnxk layer. Introduced an internal xstats group.
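
For illustration only (not part of the patch): a reduced sketch of the table-driven
xstats dispatch the cnxk layer uses, where each entry carries a getter kind (fn_id)
plus object and layer indices, and the reported value is the getter result minus the
entry's reset value. All types, names, and sample values below are hypothetical
stand-ins, not the driver's definitions.

#include <stdint.h>

enum sketch_xstats_fn { SKETCH_FN_DEVICE, SKETCH_FN_MODEL };

struct sketch_xstats_entry {
	uint16_t id;
	enum sketch_xstats_fn fn_id;
	uint16_t obj_idx;    /* model index for model/layer stats */
	int32_t layer_id;    /* layer index, -1 for device stats */
	uint64_t reset_value;
};

typedef uint64_t (*sketch_xstats_get_t)(uint16_t obj_idx, int32_t layer_id);

static uint64_t
sketch_dev_xstat_get(uint16_t obj_idx, int32_t layer_id)
{
	(void)obj_idx;
	(void)layer_id;
	return 4; /* e.g. nb_models_loaded (made-up value) */
}

static uint64_t
sketch_model_xstat_get(uint16_t obj_idx, int32_t layer_id)
{
	(void)obj_idx;
	(void)layer_id;
	return 1200; /* e.g. avg_hw_latency in ns (made-up value) */
}

/* One xstat value = getter(entry) - reset_value, selected by fn_id,
 * mirroring the dispatch in the cnxk xstats wrappers. */
static uint64_t
sketch_xstat_value(const struct sketch_xstats_entry *xs)
{
	sketch_xstats_get_t fn = (xs->fn_id == SKETCH_FN_DEVICE) ?
		sketch_dev_xstat_get : sketch_model_xstat_get;

	return fn(xs->obj_idx, xs->layer_id) - xs->reset_value;
}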

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 531 +++----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 481 +++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 551 insertions(+), 507 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 27d255a830..776ad60401 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -198,107 +198,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -306,270 +220,94 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
 
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
+uint64_t
+cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			 enum cnxk_ml_xstats_type type)
 {
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
-	uint64_t value;
+	uint64_t value = 0;
 	uint32_t qp_id;
 
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
 	switch (type) {
 	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	default:
 		value = 0;
 	}
 
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
 	return value;
 }
 
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -654,7 +392,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -682,13 +419,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -717,9 +447,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -770,174 +497,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1211,7 +770,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af..4d76164dba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -298,17 +299,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
@@ -337,4 +327,8 @@ int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_nam
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
+/* xstats ops */
+uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd..3ce9338f1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c75317d6da..6a423d9eda 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -115,6 +115,285 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t value = 0;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		return 0;
+
+	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -294,6 +573,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -323,6 +609,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -521,6 +810,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -806,10 +1279,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679ca..5e02bb876c 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 16/35] ml/cnxk: update fast path functions
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 15/35] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 17/35] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support
for model-specific fast-path functions. The cnxk layer functions
invoke the model-specific fast-path functions.

Added support for model-specific poll handling functions and
updated the internal inference sync function. Dropped the use of
rte_ml_op as an argument and updated the function arguments so
that the function can be used as a callback by the TVM HW runtime.
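
The split can be pictured as a thin dispatch layer: the common burst
functions only walk the queue ring and call per-model hooks installed
at model load time. Below is a minimal, self-contained sketch of this
pattern; the names are hypothetical stand-ins, not the driver code
itself.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for rte_ml_op and cnxk_ml_model. */
    struct sketch_op { uint16_t model_id; };

    struct sketch_model {
            /* Hooks installed at model load; Glow and TVM models provide their own. */
            bool (*enqueue_single)(struct sketch_op *op, uint16_t layer_id, uint64_t head);
            void (*result_update)(void *request);
    };

    static uint16_t
    sketch_enqueue_burst(struct sketch_model **models, struct sketch_op **ops, uint16_t nb_ops)
    {
            uint16_t count = 0;
            uint64_t head = 0;

            while (count < nb_ops) {
                    struct sketch_model *m = models[ops[count]->model_id];

                    /* The hook prepares the job descriptor and writes the command queue. */
                    if (!m->enqueue_single(ops[count], 0, head))
                            break; /* command queue full */

                    head++;
                    count++;
            }

            return count;
    }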

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 776ad60401..8116c8dedb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -65,24 +65,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -177,7 +165,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -185,17 +173,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -311,30 +299,15 @@ cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *l
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -342,25 +315,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -425,13 +382,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -824,6 +776,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1219,26 +1177,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1246,6 +1186,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1253,9 +1194,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1322,119 +1263,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1471,41 +1341,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1518,7 +1395,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4d76164dba..3d18303ed3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -14,6 +14,7 @@ struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -309,13 +310,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3c..f618e5aa5f 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 6a423d9eda..6a44a69508 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -15,6 +15,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1262,6 +1274,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb..d0c126f34b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 17/35] ml/cnxk: move error handling to cnxk layer
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 16/35] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 18/35] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Moved the error type structures to the cnxk layer. The cn10k layer
now handles only the firmware and hardware error sub-types.
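
After the move, the human-readable message is composed from two tables:
the common error-type strings owned by the cnxk layer and the driver
sub-type strings owned by the cn10k layer. A rough, self-contained
sketch of that composition is shown below; the table contents mirror
the patch, everything else is simplified.

    #include <stdio.h>

    /* Error-type strings live at the cnxk layer, driver sub-type strings at cn10k. */
    static const char *etype_str[] = { "NO_ERROR", "FW_NON_FATAL", "HW_NON_FATAL", "HW_FATAL",
                                       "HW_WARNING", "DRIVER_ERROR", "UNKNOWN_ERROR" };
    static const char *driver_stype_str[] = { "NO ERROR", "UNKNOWN ERROR", "FW EXCEPTION",
                                              "UNKNOWN FIRMWARE ERROR" };

    static void
    sketch_error_message(char *msg, size_t len, unsigned int etype, unsigned int stype)
    {
            if (etype == 5 && stype < 4)            /* DRIVER_ERROR carries a sub-type */
                    snprintf(msg, len, "%s : %s", etype_str[etype], driver_stype_str[stype]);
            else if (etype < 7)
                    snprintf(msg, len, "%s", etype_str[etype]);
    }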

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8116c8dedb..65eaaf030d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,47 +22,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1241,19 +1221,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1294,7 +1274,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1311,30 +1291,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1372,7 +1351,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973..63d1c9e417 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f..382fca64be 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 6a44a69508..8339f8342b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1372,7 +1372,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 18/35] ml/cnxk: support config and close of tvmdp library
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 17/35] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 19/35] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based
on the ML device configuration options.

Updated the meson build to add Jansson, the TVM runtime and the
TVMDP library as build dependencies.
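
The common device paths call the mvtvm configure/close hooks
unconditionally; when TVM support is not detected at build time, stub
variants are compiled in and simply return success. A minimal sketch of
the idea is below, with hypothetical names standing in for the real
symbols.

    #include <stdint.h>

    #ifdef SKETCH_ENABLE_TVM
    /* Real variant would wrap tvmdp_configure() and tvmdp_close() from TVMDP. */
    int sketch_tvm_configure(uint16_t nb_models);
    int sketch_tvm_close(void);
    #else
    /* Stub variants keep the common code path unconditional when TVM is off. */
    static inline int sketch_tvm_configure(uint16_t nb_models) { (void)nb_models; return 0; }
    static inline int sketch_tvm_close(void) { return 0; }
    #endif

    static int
    sketch_dev_configure(uint16_t nb_models)
    {
            /* Same call site either way; the build picks the real or stub variant. */
            return sketch_tvm_configure(nb_models);
    }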

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 ++++
 drivers/ml/cnxk/cnxk_ml_ops.h    |  6 ++++
 drivers/ml/cnxk/meson.build      | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 41 ++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   | 19 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 26 ++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h | 15 ++++++++
 7 files changed, 173 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 8339f8342b..c3639320a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -564,6 +564,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -624,6 +628,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..b22a2b0d95 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,6 +12,12 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#else
+#include "mvtvm_ml_stubs.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 575f08f9c0..7570186177 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,32 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+if not cc.check_header('dlpack/dlpack.h')
+        message('drivers/ml/cnxk: dlpack.h not found')
+        enable_mvtvm = false
+endif
+
+tvmrt_lib = cc.find_library('tvm_runtime', required: false)
+if tvmrt_lib.found()
+        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
+else
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
@@ -34,6 +60,39 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
+
+driver_sdk_headers += files(
+        'mvtvm_ml_ops.h',
+)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += tvmrt_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+ext_deps += jansson_dep
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+
+driver_sdk_headers += files(
+        'mvtvm_ml_stubs.h',
+)
+
+sources += files(
+        'mvtvm_ml_stubs.c',
+)
+
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 0000000000..88c6d5a864
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 0000000000..305b4681ed
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <dlpack/dlpack.h>
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
new file mode 100644
index 0000000000..a31cd39cfa
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_stubs.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(conf);
+
+	return 0;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	RTE_SET_USED(cnxk_mldev);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
new file mode 100644
index 0000000000..11c56e5144
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_STUBS_H_
+#define _MVTVM_ML_STUBS_H_
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 19/35] ml/cnxk: add structures to support TVM model type
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 18/35] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 20/35] ml/cnxk: add support for identify " Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.
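
One visible effect of the new model type is in the quantize and
dequantize paths: Glow models keep all inputs packed in a single buffer
walked with running offsets, while TVM models pass one buffer per
input. A simplified, self-contained sketch of that branching, using
hypothetical names:

    #include <stdint.h>

    enum sketch_model_type { SKETCH_TYPE_GLOW, SKETCH_TYPE_TVM };

    /* Pick the buffer for input 'idx': one buffer per input for TVM models,
     * a single packed buffer walked with a running offset for Glow models. */
    static const uint8_t *
    sketch_input_ptr(enum sketch_model_type type, const uint8_t *const *per_input,
                     const uint8_t *packed, uint64_t offset, uint32_t idx)
    {
            if (type == SKETCH_TYPE_TVM)
                    return per_input[idx];

            return packed + offset;
    }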

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 66 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 52 ++++++++++++++++++++-----
 drivers/ml/cnxk/meson.build      |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 ++++++++++++++++++++++
 6 files changed, 161 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index dc315cce10..749ddeb344 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -435,6 +435,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65eaaf030d..a471e98fbf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,6 +725,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -746,6 +749,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -969,7 +973,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..f100eca203 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,48 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Unknown model type */
+	ML_CNXK_MODEL_TYPE_UNKNOWN,
+
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions*/
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* Unknown layer type */
+	ML_CNXK_LAYER_TYPE_UNKNOWN = 0,
+
+	/* MRVL layer, for MLIP target */
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target */
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +99,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +132,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c3639320a5..ea6f59a70f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1217,6 +1217,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1229,17 +1231,31 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, 0);
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1253,6 +1269,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1265,17 +1283,31 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 7570186177..12b73ee3be 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -66,6 +66,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 driver_sdk_headers += files(
         'mvtvm_ml_ops.h',
+        'mvtvm_ml_model.h',
 )
 
 sources += files(
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 0000000000..1f6b435be0
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 20/35] ml/cnxk: add support for identify model type
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 19/35] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 21/35] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to parse the model buffer to identify the
model type and model sub-type. Added basic validity checks
for Glow model buffers.
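
Detection first probes the buffer as a TVM archive via libarchive and
only then validates it as a Glow binary by checking the magic string
and the optional header and payload CRC32C fields. A condensed,
self-contained sketch of the Glow-side checks follows; the header
layout and magic value are placeholders, only the checking order
mirrors the patch.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #include <rte_hash_crc.h>

    /* Placeholder header; the real layout is cn10k_ml_model_metadata_header. */
    struct sketch_glow_header {
            uint8_t magic[4];        /* compared against the Glow magic string */
            uint32_t payload_crc32c; /* CRC over everything after the header, 0 if unused */
            uint32_t header_crc32c;  /* CRC over the header excluding this field, 0 if unused */
    };

    static bool
    sketch_is_glow(const void *addr, size_t size, const char magic[4])
    {
            const struct sketch_glow_header *hdr = addr;

            if (memcmp(hdr->magic, magic, 4) != 0)
                    return false;

            if (hdr->header_crc32c != 0 &&
                hdr->header_crc32c != rte_hash_crc(addr, sizeof(*hdr) - sizeof(uint32_t), 0))
                    return false;

            if (hdr->payload_crc32c != 0 &&
                hdr->payload_crc32c != rte_hash_crc((const uint8_t *)addr + sizeof(*hdr),
                                                    size - sizeof(*hdr), 0))
                    return false;

            return true;
    }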

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 49 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  3 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  8 +++++
 drivers/ml/cnxk/meson.build      |  6 ++++
 drivers/ml/cnxk/mvtvm_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 8 files changed, 133 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..02f80410ec 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,60 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	enum cnxk_ml_model_type type;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+	type = mvtvm_ml_model_type_get(params);
+	if (type == ML_CNXK_MODEL_TYPE_TVM)
+		return ML_CNXK_MODEL_TYPE_TVM;
+	else if (type == ML_CNXK_MODEL_TYPE_INVALID)
+		return ML_CNXK_MODEL_TYPE_INVALID;
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f100eca203..a2fced46a2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -13,6 +13,8 @@
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
 #include "mvtvm_ml_model.h"
+#else
+#include "mvtvm_ml_stubs.h"
 #endif
 
 #include "cnxk_ml_io.h"
@@ -184,6 +186,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ea6f59a70f..c140408023 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1018,6 +1018,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1033,6 +1034,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1066,6 +1073,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 12b73ee3be..b3a62a7871 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -71,6 +76,7 @@ driver_sdk_headers += files(
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += tvmrt_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 0000000000..ab5f8baa67
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <archive.h>
+#include <archive_entry.h>
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return ML_CNXK_MODEL_TYPE_UNKNOWN;
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be0..b6162fceec 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,6 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a31cd39cfa..a7352840a6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -7,6 +7,15 @@
 #include "mvtvm_ml_stubs.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	RTE_SET_USED(params);
+
+	return ML_CNXK_MODEL_TYPE_UNKNOWN;
+}
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 11c56e5144..467a9d39e5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 21/35] ml/cnxk: add support to parse TVM model objects
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 20/35] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 22/35] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model
archive buffer, check that all expected objects are
present, and copy the TVM model objects to internal buffers.
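
For context, scanning an in-memory archive with libarchive follows the same
open/iterate/skip pattern the patch uses; below is a self-contained sketch,
independent of the driver structures.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #include <archive.h>
    #include <archive_entry.h>

    /* Print the name and size of every entry in an in-memory archive. */
    static int
    list_archive(const void *buf, size_t len)
    {
            struct archive_entry *entry;
            struct archive *a;

            a = archive_read_new();
            archive_read_support_filter_all(a);
            archive_read_support_format_all(a);

            if (archive_read_open_memory(a, buf, len) != ARCHIVE_OK) {
                    archive_read_free(a);
                    return -1;
            }

            while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
                    printf("%s: %jd bytes\n", archive_entry_pathname(entry),
                           (intmax_t)archive_entry_size(entry));
                    archive_read_data_skip(a);
            }

            archive_read_free(a);
            return 0;
    }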

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  5 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 57 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 62 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 11 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 7 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c140408023..b18271545d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1079,7 +1079,10 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	else
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
 	if (ret != 0)
 		goto error;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ab5f8baa67..4c9a080c05 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -53,3 +53,60 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 
 	return ML_CNXK_MODEL_TYPE_TVM;
 }
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b6162fceec..b11b66f495 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -44,5 +44,7 @@ struct mvtvm_ml_model_data {
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 88c6d5a864..e2413b6b15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -8,8 +8,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -39,3 +43,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 305b4681ed..6607537599 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -12,8 +12,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a7352840a6..7f3b3abb2e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -33,3 +33,14 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return 0;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(params);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 467a9d39e5..4bb1772ef4 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -8,9 +8,12 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 22/35] ml/cnxk: fetch layer info and load TVM model
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 21/35] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 23/35] ml/cnxk: update internal info for " Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and
update internal structures based on the layer information.
Set callback functions for layer load and unload, and
enable model loading using the TVMDP library. Added support
to fetch full metadata after model load.
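
The layer-name resolution added here is a linear search over the model's
layer table; the following is a simplified, driver-independent sketch using
a hypothetical struct rather than the driver's cnxk_ml_layer.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical, trimmed-down layer descriptor for illustration. */
    struct layer {
            char name[64];
    };

    /* Return the index of the layer called 'name', or -1 if absent. */
    static int
    layer_id_get(const struct layer *layers, uint16_t nb_layers, const char *name)
    {
            uint16_t i;

            for (i = 0; i < nb_layers; i++) {
                    if (strcmp(layers[i].name, name) == 0)
                            return i;
            }

            return -1;
    }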

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 11 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  2 +
 drivers/ml/cnxk/cn10k_ml_ops.c   |  7 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 25 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  4 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 81 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 10 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 8 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index b765b4ada9..9a80adf0fc 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -714,3 +714,14 @@ cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "\n");
 }
+
+int
+cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM)
+		return mvtvm_ml_model_get_layer_id(model, layer_name, layer_id);
+
+	*layer_id = 0;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45f2ed5fcf..6744175cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -461,5 +461,7 @@ void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+int cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a471e98fbf..4191ccc840 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -576,7 +576,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
@@ -584,7 +584,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int ret;
 
 	PLT_SET_USED(size);
-	PLT_SET_USED(layer_name);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -598,6 +597,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c9a080c05..8536fd8927 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -110,3 +110,28 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	uint16_t i;
+
+	for (i = 0; i < model->mvtvm.metadata.model.nb_layers; i++) {
+		if (strcmp(model->layer[i].name, layer_name) == 0)
+			break;
+	}
+
+	if (i == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[i].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer type, name: %s type: %d", layer_name, model->layer[i].type);
+		return -EINVAL;
+	}
+
+	*layer_id = i;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b11b66f495..6cb2639876 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
@@ -46,5 +48,7 @@ struct mvtvm_ml_model_data {
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e2413b6b15..1fe0a04301 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -49,9 +49,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -99,5 +103,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		strncpy(model->layer[layer_id].name,
+			model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 7f3b3abb2e..d621dbc897 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -17,6 +17,16 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 	return ML_CNXK_MODEL_TYPE_UNKNOWN;
 }
 
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_name);
+	RTE_SET_USED(layer_id);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 4bb1772ef4..23fdfdc4cd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,4 +16,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
+
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 23/35] ml/cnxk: update internal info for TVM model
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 22/35] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 24/35] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating internal I/O info structures for TVM models.
Computed static fields related to the model I/O.
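
The static I/O fields computed in this patch reduce to an element count (the
product of the shape dimensions) multiplied by the size of the element type;
a minimal sketch of that arithmetic:

    #include <stdint.h>

    /* nb_elements = product of all dimensions; size = nb_elements * type size. */
    static uint64_t
    io_size_bytes(const uint32_t *shape, uint32_t ndim, uint32_t type_size)
    {
            uint64_t nb_elements = 1;
            uint32_t i;

            for (i = 0; i < ndim; i++)
                    nb_elements *= shape[i];

            return nb_elements * type_size;
    }

    /* Example: a [1, 3, 224, 224] float32 tensor -> 1*3*224*224*4 = 602112 bytes. */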

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 111 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |   9 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   1 +
 6 files changed, 130 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b18271545d..90b23d9c1c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1244,6 +1244,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, 0);
+	else
+		info = mvtvm_ml_model_io_info_get(model, 0);
 
 	if (info == NULL)
 		return -EINVAL;
@@ -1296,6 +1298,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
+	else
+		info = mvtvm_ml_model_io_info_get(model, model->nb_layers - 1);
 
 	if (info == NULL)
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8536fd8927..14f4b258d8 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "cnxk_ml_model.h"
@@ -135,3 +137,112 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 
 	return 0;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case kDLInt:
+		return RTE_ML_IO_TYPE_INT32;
+	case kDLUInt:
+		return RTE_ML_IO_TYPE_UINT32;
+	case kDLFloat:
+		return RTE_ML_IO_TYPE_FP32;
+	case kDLBfloat:
+		return RTE_ML_IO_TYPE_BFLOAT16;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		strncpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		strncpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_set(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
+
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(layer_id);
+
+	return &model->mvtvm.info;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6cb2639876..e86581bc6a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -50,5 +50,7 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1fe0a04301..e248310cb3 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -175,6 +175,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_set(model);
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index d621dbc897..80a9a90b4e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -27,6 +27,15 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 	return -EINVAL;
 }
 
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_id);
+
+	return NULL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 23fdfdc4cd..29f721072a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -18,5 +18,6 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 24/35] ml/cnxk: enable model unload in tvmdp library
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 23/35] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 25/35] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable model unload using the external TVMDP library. Updated the
layer unload callback to support multiple layers.
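
The unload path pairs the memzone reserved at load time with a
lookup-and-free; a small standalone sketch of that DPDK pattern is shown
below (the memzone name prefix mirrors the driver's
MVTVM_ML_MODEL_MEMZONE_NAME, but the helper itself is illustrative).

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    #include <rte_memzone.h>

    /* Free the per-model memzone reserved at load time, if present. */
    static int
    model_mz_free(uint16_t model_id)
    {
            const struct rte_memzone *mz;
            char name[RTE_MEMZONE_NAMESIZE];

            snprintf(name, sizeof(name), "%s_%u", "ml_mvtvm_model_mz", model_id);

            mz = rte_memzone_lookup(name);
            if (mz == NULL)
                    return -ENOENT;

            return rte_memzone_free(mz);
    }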

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  8 +++++---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 +++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  1 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 +++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4191ccc840..e7208391fd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -780,11 +780,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	int ret;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -797,6 +795,10 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 90b23d9c1c..cd95a3c7ad 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1107,7 +1107,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1125,7 +1125,10 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e248310cb3..9fd9e58de6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -185,3 +185,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 80a9a90b4e..a17a76e41f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -63,3 +63,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 29f721072a..3776fb5369 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -15,6 +15,7 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 25/35] ml/cnxk: support start and stop for TVM models
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 24/35] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 26/35] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. TVM model
start invokes layer start for all Glow layers that are
part of the model; TVM model stop invokes layer stop
for the same layers.
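
In essence, TVM model start/stop walks the layer table and dispatches only
the Glow (MRVL) layers to the per-layer start/stop helpers, since LLVM
layers execute on CPU cores. A simplified sketch with hypothetical types:

    #include <stdint.h>

    enum layer_type { LAYER_TYPE_LLVM, LAYER_TYPE_MRVL };

    struct layer {
            enum layer_type type;
            char name[64];
    };

    /* Start every MRVL layer; LLVM layers run on the CPU and need no start. */
    static int
    model_layers_start(struct layer *layers, uint16_t nb_layers,
                       int (*layer_start)(const char *name))
    {
            uint16_t i;
            int ret;

            for (i = 0; i < nb_layers; i++) {
                    if (layers[i].type != LAYER_TYPE_MRVL)
                            continue;
                    ret = layer_start(layers[i].name);
                    if (ret != 0)
                            return ret;
            }

            return 0;
    }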

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 16 ++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 52 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 18 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 6 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7208391fd..2d308802cf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -827,7 +827,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -838,8 +838,6 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -852,6 +850,10 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -1015,14 +1017,12 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -1035,6 +1035,10 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index cd95a3c7ad..9d664571c4 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1156,7 +1156,12 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+
+	return 0;
 }
 
 int
@@ -1176,7 +1181,12 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9fd9e58de6..1d0b3544a7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -213,3 +213,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 770794fe7d..55459f9f7f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -19,5 +19,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a17a76e41f..b8c2e6a1fc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -72,3 +72,21 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3776fb5369..1eb663b1d1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,6 +16,8 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 26/35] ml/cnxk: update internal TVM model info structure
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 25/35] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 27/35] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update the internal model info structure
for TVM models.
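
The model info filled here lives in one contiguous buffer: an
rte_ml_model_info header followed by the input and output rte_ml_io_info
arrays. A sketch of that carving is below; the per-model I/O limit is a
placeholder, not the driver's constant.

    #include <rte_mldev.h>

    #define MAX_INPUT_OUTPUT 32 /* placeholder for the driver's per-model limit */

    /* Carve one contiguous buffer into info + input[] + output[] sections. */
    static void
    info_layout(void *base, struct rte_ml_model_info **info,
                struct rte_ml_io_info **input, struct rte_ml_io_info **output)
    {
            *info = base;
            *input = (struct rte_ml_io_info *)((char *)base +
                                               sizeof(struct rte_ml_model_info));
            *output = *input + MAX_INPUT_OUTPUT;
    }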

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 14f4b258d8..569147aca7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -11,6 +11,7 @@
 
 #include <roc_api.h>
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -246,3 +247,67 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 
 	return &model->mvtvm.info;
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index e86581bc6a..a1247ffbde 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -52,5 +53,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1d0b3544a7..f13ba76207 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -178,6 +178,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_set(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 27/35] ml/cnxk: support device dump for TVM models
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 26/35] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 28/35] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to print TVM model layer info.
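
The dump output relies on printf field-width specifiers ("%*s") to
right-align labels into a fixed column, as in this minimal sketch (the
FIELD_LEN value is a placeholder):

    #include <stdio.h>

    #define FIELD_LEN 16 /* placeholder column width */

    /* Print "           label : value" with the label right-aligned. */
    static void
    dump_field(FILE *fp, const char *label, unsigned int value)
    {
            fprintf(fp, "%*s : %u\n", FIELD_LEN, label, value);
    }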

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  7 +++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  8 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 02f80410ec..ed6a1ed866 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -68,6 +68,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -84,6 +86,9 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
 	}
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 569147aca7..4c12f584d5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -311,3 +312,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
 				&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index a1247ffbde..900ba44fa0 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -54,5 +55,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index b8c2e6a1fc..260a051b08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -36,6 +36,14 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 	return NULL;
 }
 
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(layer);
+	RTE_SET_USED(fp);
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 1eb663b1d1..d6d0edbcf1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
@@ -22,5 +23,6 @@ int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 28/35] ml/cnxk: enable reporting model runtime as xstats
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 27/35] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 29/35] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.
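
A condensed sketch of how the per-queue-pair counters added in this patch
combine into the average runtime latency xstat (a simplified form of the
ML_AVG_FOREACH_QP_MVTVM macro below):

    /* Illustrative only: average TVM runtime latency across queue pairs */
    static uint64_t
    tvm_rt_avg_latency(const struct mvtvm_ml_model_xstats *xs, uint32_t nb_qps)
    {
            uint64_t value = 0;
            uint64_t count = 0;
            uint32_t qp_id;

            for (qp_id = 0; qp_id < nb_qps; qp_id++) {
                    value += xs[qp_id].tvm_rt_latency_tot;
                    count += xs[qp_id].dequeued_count - xs[qp_id].tvm_rt_reset_count;
            }

            return (count != 0) ? value / count : 0;
    }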

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   9 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 131 +++++++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  96 +++++++++++++++++++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   8 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  23 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   6 ++
 10 files changed, 289 insertions(+), 18 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d308802cf..0c67ce7b40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -197,6 +197,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 	}
 }
 
+void
+cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->glow.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
 #define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3d18303ed3..045e2e6cd2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -331,6 +331,8 @@ int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
+void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				  enum cnxk_ml_xstats_type type);
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 9d664571c4..c7d42ed950 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -138,7 +138,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -169,6 +170,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -195,7 +215,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -204,6 +225,36 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		strcpy(suffix, "cycles");
+	else
+		strcpy(suffix, "ns");
+
+	/* Update xstat name based on layer name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_xstat_model_name_set(cnxk_mldev, model, stat_id, i, suffix);
+		else
+			mvtvm_ml_model_xstat_name_set(cnxk_mldev, model, stat_id, i, suffix);
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -247,13 +298,22 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	if (model == NULL)
 		return 0;
 
-	if (layer_id >= 0)
+	if (layer_id >= 0) {
 		layer = &model->layer[layer_id];
-	else
-		return 0;
+		goto layer_xstats;
+	} else {
+		layer = NULL;
+		goto model_xstats;
+	}
 
+layer_xstats:
 	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+	goto exit_xstats;
+
+model_xstats:
+	value = mvtvm_ml_model_xstat_get(cnxk_mldev, model, type);
 
+exit_xstats:
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq != 0) /* return in ns */
 		value = (value * 1000ULL) / sclk_freq;
@@ -836,8 +896,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -854,7 +915,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -868,9 +939,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -931,9 +1013,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -951,7 +1034,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -963,11 +1053,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b22a2b0d95..ab32676b3e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -70,6 +70,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876c..a2c9adfe4a 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 900ba44fa0..66c3af18e1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index f13ba76207..832837034b 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -10,10 +10,83 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->mvtvm.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	uint64_t count = 0;
+	uint64_t value = 0;
+	uint32_t qp_id;
+
+	switch (type) {
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -53,6 +126,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -68,7 +142,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -181,6 +259,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 55459f9f7f..22e0340146 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -11,8 +11,11 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -22,4 +25,9 @@ int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 260a051b08..19af1d2703 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -8,6 +8,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_xstats.h"
 
 enum cnxk_ml_model_type
 mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
@@ -44,6 +45,28 @@ mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	RTE_SET_USED(fp);
 }
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(stat_id);
+	RTE_SET_USED(entry);
+	RTE_SET_USED(suffix);
+}
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(type);
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index d6d0edbcf1..3fd1f04c35 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
@@ -24,5 +26,9 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 29/35] ml/cnxk: implement I/O alloc and free callbacks
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 28/35] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 30/35] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for I/O allocation and free
for Glow layers.
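
A minimal usage sketch of the new callbacks; the device handle, model id
and layer name below are placeholders:

    /* Illustrative only: allocate and release quantized I/O buffers */
    uint64_t *input_qbuffer;
    uint64_t *output_qbuffer;
    int ret;

    ret = cn10k_ml_io_alloc(device, model_id, "layer_0",
                            &input_qbuffer, &output_qbuffer);
    if (ret == 0) {
            /* ... run the Glow layer using the quantized buffers ... */
            cn10k_ml_io_free(device, model_id, "layer_0");
    }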

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 87 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 +
 3 files changed, 92 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0c67ce7b40..7802425c87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1410,3 +1410,90 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t output_size;
+	uint64_t input_size;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 045e2e6cd2..9c41c1c0b0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -329,6 +329,9 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 832837034b..77c2b5bcdc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -232,6 +232,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 30/35] ml/cnxk: add generic ML malloc and free callback
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 29/35] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 31/35] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks.
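
A minimal usage sketch; the buffer name and size below are placeholders:

    /* Illustrative only: named allocation backed by a memzone */
    void *addr;

    if (cn10k_ml_malloc("ml_scratch", 4096, ML_CN10K_ALIGN_SIZE, &addr) == 0) {
            /* ... use addr ... */
            cn10k_ml_free("ml_scratch");
    }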

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7802425c87..01b0a44caa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1497,3 +1497,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 9c41c1c0b0..eb3e1c139c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -333,6 +333,9 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 77c2b5bcdc..b627355917 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -234,6 +234,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 31/35] ml/cnxk: support quantize and dequantize callback
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 30/35] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 32/35] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
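
A minimal sketch of a quantize call as issued from the TVM runtime side;
tensor setup is omitted and the names below are placeholders:

    /* Illustrative only: quantize dequantized inputs of a MRVL layer */
    const DLTensor *deq_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
    void *qbuffer = NULL;
    int ret;

    /* deq_tensor[] and qbuffer are prepared by the caller */
    ret = mvtvm_ml_io_quantize(device, model_id, "layer_0", deq_tensor, qbuffer);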

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_ops.c | 129 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |   4 +
 2 files changed, 133 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index b627355917..776675843a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -2,11 +2,15 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <dlpack/dlpack.h>
+
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
@@ -236,6 +240,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -366,3 +372,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 22e0340146..4cabe30a82 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -24,6 +24,10 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 32/35] ml/cnxk: enable fast-path ops for TVM models
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 31/35] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 33/35] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. TVM models use
TVMDP library function calls to execute inference operations
for hybrid and LLVM model subtypes.

For TVM MRVL model subtypes that have a single MRVL layer,
the inference requests are directly enqueued to hardware
by the driver.
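
In short, the per-model fast-path hooks are selected at model load time
(a condensed view of the assignment done in mvtvm_ml_model_load() below):

    /* Illustrative summary of the dispatch wired up in this patch */
    if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
            /* single MRVL layer: enqueue directly to hardware */
            model->enqueue_single = cn10k_ml_enqueue_single;
    } else {
            /* hybrid / LLVM subtypes: run through the TVMDP library */
            model->enqueue_single = mvtvm_ml_enqueue_single;
    }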

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h     |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |   5 ++
 drivers/ml/cnxk/mvtvm_ml_model.c |  20 +++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 124 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  43 +++++++++++
 8 files changed, 208 insertions(+), 4 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c7d42ed950..9bcf1b099e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c12f584d5..1dfd0d176a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -198,6 +198,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -231,6 +241,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 66c3af18e1..7ffce38094 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -69,6 +69,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 776675843a..1e74b82a0a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -19,6 +19,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 void
 mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			      uint16_t stat_id, uint16_t entry, char *suffix)
@@ -242,6 +248,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -285,6 +292,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -495,3 +515,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 4cabe30a82..cb4b219743 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -16,6 +16,44 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* End ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -29,6 +67,11 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 33/35] ml/cnxk: enable creation of mvtvm virtual device
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 32/35] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 34/35] ml/cnxk: update dependency info in driver docs Srikanth Yalavarthi
  2023-09-27 18:30   ` [PATCH v3 35/35] ml/cnxk: update release notes for 23.11 Srikanth Yalavarthi
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create an mvtvm virtual device on systems
without a PCI-based ML HW accelerator.
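
For example, the virtual device can be created by passing the EAL argument
shown in the guide update below:

    --vdev ml_mvtvm,max_qps=4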

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       |  49 +++++++-
 drivers/ml/cnxk/cn10k_ml_dev.c   |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c    |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  82 +++++++++----
 drivers/ml/cnxk/meson.build      |   2 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   | 196 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  31 +++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   2 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   2 +
 13 files changed, 433 insertions(+), 24 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 1834b1f905..197e1ed06f 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -70,6 +70,22 @@ Bind the ML PF device to the vfio_pci driver:
    usertools/dpdk-devbind.py -u 0000:00:10.0
    usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
 
+VDEV support
+------------
+
+On platforms which don't support ML hardware acceleration through a PCI device,
+the Marvell ML CNXK PMD can execute inference operations on a vdev using ML models
+compiled with the Apache TVM framework.
+
+VDEV can be enabled by passing the EAL arguments
+
+.. code-block:: console
+
+   --vdev ml_mvtvm
+
+VDEV can also be used on platforms with an ML HW accelerator. However, use of VDEV
+and PCI HW accelerator is mutually exclusive.
+
 
 Runtime Config Options
 ----------------------
@@ -80,6 +96,8 @@ Runtime Config Options
   The parameter ``fw_path`` can be used by the user
   to load ML firmware from a custom path.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
@@ -95,6 +113,8 @@ Runtime Config Options
   When enabled, firmware would mask the DPE non-fatal hardware errors as warnings.
   The parameter ``enable_dpe_warnings`` is used fo this configuration.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,enable_dpe_warnings=0
@@ -111,11 +131,19 @@ Runtime Config Options
   Caching of model data improves the inferencing throughput / latency for the model.
   The parameter ``cache_model_data`` is used to enable data caching.
 
+  This option is supported on PCI HW accelerator and vdev.
+
   For example::
 
      -a 0000:00:10.0,cache_model_data=0
 
-  With the above configuration, model data caching is disabled.
+  With the above configuration, model data caching is disabled on HW accelerator.
+
+  For example::
+
+     --vdev ml_mvtvm,cache_model_data=0
+
+  With the above configuration, model data caching is disabled on vdev.
 
 
 **OCM allocation mode** (default ``lowest``)
@@ -131,6 +159,8 @@ Runtime Config Options
   ``largest``
     Allocate OCM for the model from the slot with largest amount of free space.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_alloc_mode=lowest
@@ -148,6 +178,8 @@ Runtime Config Options
   Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
   Default page size is 16 KB.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_page_size=8192
@@ -172,6 +204,8 @@ Runtime Config Options
     Enabling spinlock version would disable restrictions on the number of queue-pairs
     that can be supported by the driver.
 
+   This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,hw_queue_lock=1
@@ -180,6 +214,19 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
+**Maximum queue pairs** (default ``32``)
+
+  VDEV supports an additional EAL argument to configure the maximum number of
+  queue-pairs on the ML device, through the option ``max_qps``.
+
+  This option is supported only on vdev.
+
+  For example::
+
+     --vdev ml_mvtvm,max_qps=4
+
+  With the above configuration, 4 queue-pairs are created on the vdev.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 91813e9d0a..caa13ba08c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -309,6 +309,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -355,6 +361,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef..cee405f3f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417..dc4512223c 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64be..491c4c4aea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 9bcf1b099e..1a876e190a 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,7 +117,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -480,7 +481,12 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+
+	return 0;
 }
 
 static int
@@ -518,9 +524,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -618,10 +626,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
@@ -629,12 +639,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -695,8 +710,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close MVTVM ML Device");
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -748,10 +765,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -770,10 +789,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -800,7 +821,12 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+
+	return 0;
 }
 
 static int
@@ -813,6 +839,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1141,6 +1170,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1324,6 +1358,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index b3a62a7871..e4e3bc200d 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -70,11 +70,13 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 driver_sdk_headers += files(
+        'mvtvm_ml_dev.h',
         'mvtvm_ml_ops.h',
         'mvtvm_ml_model.h',
 )
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 0000000000..c93b5155b9
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize MVTVM vdev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 0000000000..6922c19337
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1e74b82a0a..bbefa8a356 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -97,6 +97,22 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return value;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -127,6 +143,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -237,6 +262,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index cb4b219743..0232c5ead5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -55,8 +55,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 19af1d2703..126a954c91 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -67,6 +67,15 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return 0;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(dev_info);
+
+	return -ENOTSUP;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -84,6 +93,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3fd1f04c35..4220a963f2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -14,8 +14,10 @@ struct cnxk_ml_model;
 struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread
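
As a side note for readers of this patch, below is a minimal application-side
sketch of bringing up the vdev added here. It assumes EAL is already
initialized and that the standard rte_mldev / rte_bus_vdev public APIs are
available; the function name, device id and devargs values are illustrative
assumptions only and are not part of this series.

#include <string.h>

#include <rte_bus_vdev.h>
#include <rte_lcore.h>
#include <rte_mldev.h>

/* Sketch only: create the ml_mvtvm vdev at runtime and configure it.
 * Device id 0 assumes the vdev is the only ML device in the system.
 */
static int
mvtvm_vdev_bringup(void)
{
	struct rte_ml_dev_config conf;
	int16_t dev_id = 0;
	int ret;

	/* Programmatic equivalent of passing "--vdev ml_mvtvm,max_qps=4" to EAL */
	ret = rte_vdev_init("ml_mvtvm", "max_qps=4,cache_model_data=1");
	if (ret != 0)
		return ret;

	memset(&conf, 0, sizeof(conf));
	conf.socket_id = rte_socket_id();
	conf.nb_models = 1;
	conf.nb_queue_pairs = 4;

	ret = rte_ml_dev_configure(dev_id, &conf);
	if (ret != 0)
		return ret;

	return rte_ml_dev_start(dev_id);
}

Passing --vdev ml_mvtvm,max_qps=4 on the EAL command line, as documented in the
cnxk.rst hunk above, achieves the same probe as the rte_vdev_init() call.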

* [PATCH v3 34/35] ml/cnxk: update dependency info in driver docs
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 33/35] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  2023-09-28  4:12     ` Jerin Jacob
  2023-09-27 18:30   ` [PATCH v3 35/35] ml/cnxk: update release notes for 23.11 Srikanth Yalavarthi
  34 siblings, 1 reply; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added information related to external library dependencies
for ml/cnxk driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 197e1ed06f..afadc834e0 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -47,6 +47,34 @@ or cross-compiled on an x86 platform.
 Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
 
 
+Compilation Prerequisites
+-------------------------
+
+To optionally enable support for models compiled using the Apache TVM framework,
+this driver requires the following external libraries. These dependencies are
+not part of DPDK and must be installed separately:
+
+- **Jansson**
+
+  This library enables support to parse and read JSON files.
+
+- **libarchive**
+
+  The Apache TVM framework generates compiled models as tar archives. This
+  library enables support to decompress and read archive files in tar,
+  xz and other formats.
+
+- **TVM**
+
+  Apache TVM provides a runtime library (libtvm_runtime) used to execute
+  models on CPU cores or hardware accelerators.
+
+- **TVMDP**
+
+  Marvell's TVM dataplane library, which works as an interface between the TVM
+  runtime and DPDK drivers. The TVMDP library provides a simplified C interface
+  for TVM's C++-based runtime.
+
 Initialization
 --------------
 
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v3 35/35] ml/cnxk: update release notes for 23.11
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2023-09-27 18:30   ` [PATCH v3 34/35] ml/cnxk: update dependency info in driver docs Srikanth Yalavarthi
@ 2023-09-27 18:30   ` Srikanth Yalavarthi
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:30 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

Updated 23.11 release notes for ml/cnxk driver.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/rel_notes/release_23_11.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index ca31ac5985..7e1d31f680 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -45,6 +45,10 @@ New Features
 
      Added support in mldev library for models with multiple inputs and outputs.
 
+   * **Added support for Marvell TVM models in ML CNXK driver.**
+
+     Added support for models compiled using the TVM framework in the ML CNXK driver.
+
 
 .. This section should contain new features added in this release.
    Sample format:
-- 
2.41.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read
  2023-09-21 12:08   ` Jerin Jacob
  2023-09-21 12:52     ` David Marchand
  2023-09-27  9:38     ` David Marchand
@ 2023-09-27 18:37     ` Srikanth Yalavarthi
  2 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:37 UTC (permalink / raw)
  To: Jerin Jacob, David Marchand
  Cc: Prince Takkar, dev, Shivah Shankar Shankar Narayan Rao,
	Anup Prabhu, Srikanth Yalavarthi

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 21 September 2023 17:38
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>; David Marchand
> <david.marchand@redhat.com>
> Cc: Prince Takkar <ptakkar@marvell.com>; dev@dpdk.org; Shivah Shankar
> Shankar Narayan Rao <sshankarnara@marvell.com>; Anup Prabhu
> <aprabhu@marvell.com>; Srikanth Yalavarthi <syalavarthi@marvell.com>
> Subject: [EXT] Re: [PATCH v1 02/34] ml/cnxk: drop use of RTE API for
> firmware read
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Wed, Aug 30, 2023 at 9:40 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > Dropped use of rte_firmware_read API to read ML firmware binary. When
> > DPDK is built with libarchive support, the RTE API assumes the
> > binary file is a compressed archive. This causes the ML firmware
> > binary to be parsed incorrectly.
> 
> + @David Marchand  rte_firmware_read() author for his opinions
> 

Dropped this patch in v3 series. Required fix is implemented as part of
http://patches.dpdk.org/project/dpdk/patch/20230926144454.13419-1-syalavarthi@marvell.com/


^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v1 19/34] ml/cnxk: support config and close of tvmdp library
  2023-09-21 12:32   ` Jerin Jacob
@ 2023-09-27 18:38     ` Srikanth Yalavarthi
  0 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:38 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Prince Takkar, Srikanth Yalavarthi

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 21 September 2023 18:02
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>; Srikanth Yalavarthi
> <syalavarthi@marvell.com>
> Subject: [EXT] Re: [PATCH v1 19/34] ml/cnxk: support config and close of
> tvmdp library
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Wed, Aug 30, 2023 at 9:34 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > Added support to configure and close TVMDP library based on ML device
> > configuration options.
> >
> > Updated meson build to enable Jansson, TVM runtime, TVMDP library as
> > build dependencies.
> 
> If it is optional – please add optional
> 
> please update cnxk ml driver documentation on this dependency and
> example command to build it. See DPDK mlx5 docs for dependency
> documentation.

Updated the driver documentation with details related to external libraries in v3 patch series.
> 
> 
> >
> > Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> > ---
> >  drivers/ml/cnxk/cnxk_ml_ops.c  | 15 ++++++++++++
> >  drivers/ml/cnxk/meson.build    | 45
> ++++++++++++++++++++++++++++++++++
> >  drivers/ml/cnxk/mvtvm_ml_ops.c | 44
> +++++++++++++++++++++++++++++++++
> > drivers/ml/cnxk/mvtvm_ml_ops.h | 15 ++++++++++++
> >  4 files changed, 119 insertions(+)
> >  create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c  create mode
> 100644
> > drivers/ml/cnxk/mvtvm_ml_ops.h
> >
> > diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c
> > b/drivers/ml/cnxk/cnxk_ml_ops.c index b2eb4bd0d9a..454fec33234
> 100644
> > --- a/drivers/ml/cnxk/cnxk_ml_ops.c
> > +++ b/drivers/ml/cnxk/cnxk_ml_ops.c
> > @@ -9,6 +9,10 @@
> >
> >  #include "cn10k_ml_ops.h"
> >
> > +#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
> > +#include "mvtvm_ml_ops.h"
> > +#endif
> > +
> >  #include "cnxk_ml_dev.h"
> >  #include "cnxk_ml_io.h"
> >  #include "cnxk_ml_model.h"
> > @@ -625,6 +629,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev,
> const struct rte_ml_dev_config *co
> >                 goto error;
> >         }
> >
> > +#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
> 
> If this #ifdef is used in a lot of places in the code like this, please add stubs
> and segregate them in one place in a header file, and avoid ifdefs in main code
> like cnxk_ml_dev_configure().

Reorganized the code to reduce the use of RTE_MLDEV_CNXK_ENABLE_MVTVM. Changes part of v3 series.

^ permalink raw reply	[flat|nested] 340+ messages in thread
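
To make the stub arrangement discussed in this thread concrete (it is
implemented by mvtvm_ml_stubs.c/.h in the v3 series), the following is a small
self-contained sketch of the pattern; the ENABLE_TVM_BACKEND macro and the
tvm_backend_* names are hypothetical stand-ins, not the driver's symbols.

#include <stdio.h>

/* The build compiles exactly one definition of the backend hook:
 * the real one when TVM support is enabled, a no-op stub otherwise.
 * Callers stay free of #ifdef blocks either way.
 */
#ifdef ENABLE_TVM_BACKEND
static int
tvm_backend_configure(int nb_models)
{
	printf("TVM backend configured for %d models\n", nb_models);
	return 0;
}
#else
static int
tvm_backend_configure(int nb_models)
{
	(void)nb_models;	/* TVM support compiled out: nothing to do */
	return 0;
}
#endif

int
main(void)
{
	/* The common configure path calls the same symbol unconditionally. */
	return tvm_backend_configure(4);
}

In the driver the same effect is achieved by meson adding either
mvtvm_ml_ops.c or mvtvm_ml_stubs.c to the sources list, so the common
cnxk_ml_ops.c code can call mvtvm_ml_dev_configure() without any #ifdef.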

* RE: [EXT] Re: [PATCH v2 00/34] Implemenation of revised ml/cnxk driver
  2023-09-21 12:15   ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Jerin Jacob
@ 2023-09-27 18:39     ` Srikanth Yalavarthi
  0 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-09-27 18:39 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Prince Takkar, Srikanth Yalavarthi

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 21 September 2023 17:46
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>; Srikanth Yalavarthi
> <syalavarthi@marvell.com>
> Subject: [EXT] Re: [PATCH v2 00/34] Implemenation of revised ml/cnxk driver
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Wed, Sep 20, 2023 at 12:55 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > This patch series is an implementation of revised ml/cnxk driver to
> > support models compiled with TVM compiler framework. TVM models use
> a
> > hybrid mode for execution, with regions of the model executing on the
> > ML accelerator and the rest executing on CPU cores.
> >
> > This series of commits reorganizes the ml/cnxk driver and adds support
> > to execute multiple regions with-in a TVM model.
> 
> 
> For new features, as and when added in a patch, please update
> doc/guides/rel_notes/release_23_11.rst
> under "* **Updated Marvell cnxk ml driver.**"

Updated. Added details in release notes.

^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [PATCH v3 34/35] ml/cnxk: update dependency info in driver docs
  2023-09-27 18:30   ` [PATCH v3 34/35] ml/cnxk: update dependency info in driver docs Srikanth Yalavarthi
@ 2023-09-28  4:12     ` Jerin Jacob
  2023-10-01  0:32       ` [EXT] " Srikanth Yalavarthi
  2023-10-17 17:03       ` Srikanth Yalavarthi
  0 siblings, 2 replies; 340+ messages in thread
From: Jerin Jacob @ 2023-09-28  4:12 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar, techboard

On Thu, Sep 28, 2023 at 6:41 AM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> Added information related to external library dependencies
> for ml/cnxk driver.
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---
>  doc/guides/mldevs/cnxk.rst | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
> index 197e1ed06f..afadc834e0 100644
> --- a/doc/guides/mldevs/cnxk.rst
> +++ b/doc/guides/mldevs/cnxk.rst
> @@ -47,6 +47,34 @@ or cross-compiled on an x86 platform.
>  Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
>
>
> +Compilation Prerequisites
> +-------------------------
> +
> +This driver requires external libraries to optionally enable support for
> +models compiled using Apache TVM framework. The following dependencies are
> +not part of DPDK and must be installed separately:
> +
> +- **Jansson**
> +
> +  This library enables support to parse and read JSON files.
> +
> +- **libarchive**
> +
> +  Apached TVM framework generates compiled models as tar archives. This
> +  library enables support to decompress and read archive files in tar,
> +  xz and other formats.
> +
> +- **TVM**
> +
> +  Apache TVM provides a runtime library (libtvm_runtime) used to execute
> +  models on CPU cores or hardware accelerators.
> +
> +- **TVMDP**
> +
> +  Marvell's TVM dataplane library which works as an interface between TVM
> +  runtime and DPDK drivers. TVMDP library provides a simplified C interface
> +  for TVM's runtime based on C++.

It seems that it depends on a proprietary library. Please fix the
following for merging this series.

According to what was discussed in the Technical Board:
http://mails.dpdk.org/archives/dev/2019-June/135847.html
the dependency must be "freely available" to build, in either source or
binary form (source form preferred).

Also, Squash all doc updates to relevant patches.

^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v3 34/35] ml/cnxk: update dependency info in driver docs
  2023-09-28  4:12     ` Jerin Jacob
@ 2023-10-01  0:32       ` Srikanth Yalavarthi
  2023-10-17 17:03       ` Srikanth Yalavarthi
  1 sibling, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-01  0:32 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Prince Takkar, techboard, Srikanth Yalavarthi

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 28 September 2023 09:43
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>; techboard@dpdk.org; Srikanth
> Yalavarthi <syalavarthi@marvell.com>
> Subject: [EXT] Re: [PATCH v3 34/35] ml/cnxk: update dependency info in
> driver docs
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, Sep 28, 2023 at 6:41 AM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > Added information related to external library dependencies for ml/cnxk
> > driver.
> >
> > Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> > ---
> >  doc/guides/mldevs/cnxk.rst | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> >
> > diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
> > index 197e1ed06f..afadc834e0 100644
> > --- a/doc/guides/mldevs/cnxk.rst
> > +++ b/doc/guides/mldevs/cnxk.rst
> > @@ -47,6 +47,34 @@ or cross-compiled on an x86 platform.
> >  Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
> application.
> >
> >
> > +Compilation Prerequisites
> > +-------------------------
> > +
> > +This driver requires external libraries to optionally enable support
> > +for models compiled using Apache TVM framework. The following
> > +dependencies are not part of DPDK and must be installed separately:
> > +
> > +- **Jansson**
> > +
> > +  This library enables support to parse and read JSON files.
> > +
> > +- **libarchive**
> > +
> > +  Apached TVM framework generates compiled models as tar archives.
> > + This  library enables support to decompress and read archive files
> > + in tar,  xz and other formats.
> > +
> > +- **TVM**
> > +
> > +  Apache TVM provides a runtime library (libtvm_runtime) used to
> > + execute  models on CPU cores or hardware accelerators.
> > +
> > +- **TVMDP**
> > +
> > +  Marvell's TVM dataplane library which works as an interface between
> > + TVM  runtime and DPDK drivers. TVMDP library provides a simplified C
> > + interface  for TVM's runtime based on C++.
> 
> It seems that it depends on a proprietary library. Please fix the following for
> merging this series.
> 
> According to what was discussed in the Technical Board:
> http://mails.dpdk.org/archives/dev/2019-June/135847.html
> the dependency must be "freely available" to build, in either source or
> binary form (source form preferred).

We are working on hosting the TVMDP library on GitHub. Will submit a revised series with the details.

> 
> Also, Squash all doc updates to relevant patches.

Ack.

^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 00/34] Implementation of revised ml/cnxk driver
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (35 preceding siblings ...)
  2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
@ 2023-10-17 16:59 ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                     ` (34 more replies)
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                   ` (4 subsequent siblings)
  41 siblings, 35 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series is an implementation of revised ml/cnxk driver
to support models compiled with TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions with-in a TVM model.

v4:
  - Squashed release notes
  - Updated external build dependency info in documentation

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (2):
  ml/cnxk: enable OCM check for multilayer TVM model
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (30):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 doc/guides/mldevs/cnxk.rst             |  111 +-
 doc/guides/rel_notes/release_23_11.rst |    4 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h         |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c       |  401 ++--
 drivers/ml/cnxk/cn10k_ml_model.h       |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2828 ++++++++----------------
 drivers/ml/cnxk/cn10k_ml_ops.h         |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c          |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h          |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c           |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h           |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c        |   94 +
 drivers/ml/cnxk/cnxk_ml_model.h        |  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          | 1690 ++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h          |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c        |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h        |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h       |  152 ++
 drivers/ml/cnxk/meson.build            |   79 +
 drivers/ml/cnxk/mvtvm_ml_dev.c         |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h         |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  392 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c         |  652 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c       |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h       |   36 +
 30 files changed, 6173 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 01/34] ml/cnxk: drop support for register polling
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for the device argument "poll_mem" in the cnxk
ML driver. Support for using registers for polling is removed
and DDR addresses are used for polling instead.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001..4aaeecff03 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d3..11531afd8c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6..005b093e45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread
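
For readers skimming this change, the DDR polling model it settles on can be
summarized by the short self-contained sketch below; the constants and helpers
are illustrative stand-ins for the driver's ML_CN10K_POLL_JOB_START/FINISH
values and its plt_write64()/plt_read64() accesses of req->compl_W1.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative marker values; not the hardware's actual encodings. */
#define POLL_JOB_START	0x01ULL
#define POLL_JOB_FINISH	0x02ULL

/* Enqueue side: arm the per-request status word in DDR before submitting. */
static inline void
poll_word_arm(volatile uint64_t *status)
{
	*status = POLL_JOB_START;
}

/* Dequeue side: the job is complete once firmware updates the status word. */
static inline bool
poll_word_done(const volatile uint64_t *status)
{
	return *status == POLL_JOB_FINISH;
}

With register polling removed, every request polls its own status word in DDR,
so the per-queue scratch register blocks and the extra barriers dropped by this
patch are no longer needed.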

* [PATCH v4 02/34] ml/cnxk: add generic cnxk device structure
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This structure is the
top-level device structure for the driver and encapsulates the
target / platform-specific device structure.
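
For clarity, the new top-level structure roughly looks as below. This is
an illustrative sketch only; the field names are taken from the changes
in this patch, the enum type name for the state field is assumed, and the
authoritative definition is the one added in cnxk_ml_dev.h.

struct cnxk_ml_dev {
        /* Back-reference to the RTE ML device */
        struct rte_ml_dev *mldev;

        /* Configuration state, e.g. ML_CNXK_DEV_STATE_PROBED
         * (enum type name assumed for this sketch)
         */
        enum cnxk_ml_dev_state state;

        /* Embedded platform / target specific device */
        struct cn10k_ml_dev cn10k_mldev;

        /* Model counters moved up from struct cn10k_ml_dev */
        uint16_t nb_models_loaded;
        uint16_t nb_models_unloaded;
        uint16_t nb_models_started;
        uint16_t nb_models_stopped;
};

With this layering, the PCI probe path interprets dev->data->dev_private
as struct cnxk_ml_dev and reaches the platform-specific device through
it, e.g. cnxk_mldev = dev->data->dev_private followed by
cn10k_mldev = &cnxk_mldev->cn10k_mldev.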

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 316 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  15 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  60 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 495 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 563 insertions(+), 449 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..3bc61443d8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -10,13 +10,14 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -58,9 +59,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -90,7 +88,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -127,7 +125,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -139,7 +137,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -151,7 +149,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -174,7 +172,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -186,7 +184,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -197,49 +195,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -248,47 +250,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -300,7 +302,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -308,7 +311,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -324,18 +327,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -351,7 +356,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -368,7 +373,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -383,8 +388,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -430,45 +435,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -480,11 +485,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -498,14 +503,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -515,7 +520,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -524,24 +529,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -549,9 +554,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -559,9 +564,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -570,39 +575,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -613,53 +619,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -671,11 +681,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -691,49 +701,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	void *fw_buffer = NULL;
@@ -741,8 +753,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -773,8 +786,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -787,22 +800,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03..f9da1548c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8e..cc46ca2efd 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +462,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +471,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +495,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +507,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891..3128b28db7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6fb0bb620e..8094a0fab1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,12 @@
 
 #include <rte_mldev_pmd.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +218,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +238,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +257,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +274,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +336,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +349,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +396,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +410,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +460,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,9 +501,8 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
-			wb_pages +=
-				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			wb_pages += rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
 		fprintf(fp,
 			"tile = %2u, scratch_pages = %4u,"
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c..def6d4c756 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +86,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +200,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +251,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +327,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +342,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +352,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +374,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +385,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +394,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +434,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
@@ -503,28 +504,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +541,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +552,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +656,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +676,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +747,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +774,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +790,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +864,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +893,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +908,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +922,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1027,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1058,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 
@@ -1077,8 +1091,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1101,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1141,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1164,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1184,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1279,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1305,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1327,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1369,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1396,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1445,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1460,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1480,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1506,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1528,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1550,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1552,8 +1587,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1609,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1626,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1659,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1716,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1731,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1747,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1756,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1772,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1784,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1853,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
@@ -1843,7 +1881,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1905,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1915,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1926,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1938,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1981,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2251,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2299,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2325,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2336,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2352,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2384,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2394,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2408,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2467,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2506,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 0000000000..2a5c17c973
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 0000000000..51315de622
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 94fa4283b1..03a2d4ecf2 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ driver_sdk_headers = files(
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
+        'cnxk_ml_dev.h',
 )
 
 sources = files(
@@ -19,6 +20,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 03/34] ml/cnxk: add generic model and layer structures
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures enable support for models with multiple layers.
A model is a collection of independent layers with flow
dependencies between the layers; a rough sketch of this
organization is shown below.
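
As an illustration only (not part of the patch), the model/layer
split described above could be sketched as follows. All type and
field names here are simplified placeholders; the driver's actual
definitions are the cnxk_ml_model and cnxk_ml_layer structures
added in cnxk_ml_model.h by this patch.

/* Hypothetical sketch of a multi-layer model, assuming placeholder
 * names; not the driver's real definitions.
 */
#include <stdint.h>

#define SKETCH_MAX_LAYERS 8

struct sketch_ml_layer {
	char name[64];       /* layer name */
	uint16_t index;      /* position of the layer within the model */
	void *target_data;   /* target-specific data (e.g. glow metadata) */
};

struct sketch_ml_model {
	uint16_t model_id;   /* device-wide model ID */
	uint16_t nb_layers;  /* number of independent layers */
	/* layers execute in an order derived from their flow dependencies */
	struct sketch_ml_layer layer[SKETCH_MAX_LAYERS];
};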

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 245 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  50 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 488 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   3 +
 10 files changed, 653 insertions(+), 470 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index cc46ca2efd..d747bba151 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -311,19 +311,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -361,102 +359,136 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -514,23 +546,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -542,7 +574,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -550,56 +583,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db7..206a369ca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
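
Illustrative aside, not part of the patch: the header now keeps the HW-specific
per-layer state (cn10k_ml_layer_data with addr, ocm_map and stats) separate from the
generic model, and Glow models are handled as single-layer models (nb_layers = 1).
A simplified, self-contained sketch of that nesting, using stand-in types rather
than the real driver structures:

    #include <stdio.h>

    #define MAX_LAYERS 4 /* stand-in for ML_CNXK_MODEL_MAX_LAYERS */

    /* Stand-in for the HW-specific per-layer data (addr, ocm_map, stats, ...). */
    struct layer_hw_data {
        int tile_start;
        int tile_end;
    };

    struct layer {
        const char *name;
        struct layer_hw_data glow; /* HW-specific members hang off each layer */
    };

    struct model {
        unsigned int nb_layers;
        struct layer layer[MAX_LAYERS];
    };

    int
    main(void)
    {
        struct model m = { .nb_layers = 1 }; /* Glow models always use one layer */
        unsigned int i;

        m.layer[0].name = "layer0";
        m.layer[0].glow.tile_start = 0;
        m.layer[0].glow.tile_end = 3;

        for (i = 0; i < m.nb_layers; i++)
            printf("%s: tiles %d-%d\n", m.layer[i].name,
                   m.layer[i].glow.tile_start, m.layer[i].glow.tile_end);
        return 0;
    }
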
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 8094a0fab1..d71c36eae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -6,10 +6,10 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -333,12 +333,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -353,6 +355,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -382,8 +385,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -393,12 +396,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -409,16 +414,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -432,11 +440,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd65..720f8caf76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
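
Illustrative aside, not part of the patch: cn10k_ml_ocm_free_pages() above now clears
the released layer's WB pages from the per-tile page bitmap and then recomputes the
scratch watermark as the maximum over the layers that still hold OCM on that tile.
A self-contained sketch of that bookkeeping, with simplified stand-ins for the
driver's bit macros and map structures:

    #include <stdint.h>
    #include <stdio.h>

    #define WORD_BITS 64

    /* Simplified helpers in the spirit of the driver's SET/CLEAR/IS_BIT macros. */
    #define CLEAR_BIT(mask, bit)  ((mask) &= ~((uint64_t)1 << (bit)))
    #define IS_BIT_SET(mask, bit) (((mask) >> (bit)) & 0x1)

    struct layer_map {
        int reserved;      /* layer still holds OCM on this tile */
        uint64_t tilemask; /* tiles used by the layer */
        int scratch_pages; /* scratch pages the layer needs */
    };

    int
    main(void)
    {
        uint64_t ocm_mask[2] = {~0ULL, ~0ULL}; /* per-tile WB page bitmap, 128 pages */
        struct layer_map others[2] = {
            {1, 0x1, 4}, /* another reserved layer on tile 0 needing 4 scratch pages */
            {0, 0x1, 9}, /* already released, must not be counted */
        };
        int wb_page_start = 3, wb_page_end = 70;
        int scratch_resize_pages = 0;
        int page, i;

        /* Release this layer's WB pages. */
        for (page = wb_page_start; page <= wb_page_end; page++)
            CLEAR_BIT(ocm_mask[page / WORD_BITS], page % WORD_BITS);

        /* Scratch watermark on tile 0 becomes the max over still-reserved layers. */
        for (i = 0; i < 2; i++)
            if (others[i].reserved && IS_BIT_SET(others[i].tilemask, 0) &&
                others[i].scratch_pages > scratch_resize_pages)
                scratch_resize_pages = others[i].scratch_pages;

        printf("scratch pages kept on tile 0: %d\n", scratch_resize_pages); /* 4 */
        return 0;
    }
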
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index def6d4c756..e91cc4e859 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -202,7 +202,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -215,77 +215,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -295,29 +298,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -327,14 +332,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -345,7 +350,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -385,7 +390,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -445,7 +450,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -472,7 +477,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -521,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -543,7 +548,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -576,9 +581,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -588,9 +593,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -600,9 +606,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -611,7 +618,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -692,28 +699,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -749,7 +756,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -758,7 +765,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -803,7 +810,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -854,7 +861,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -875,7 +882,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -895,7 +902,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1001,11 +1008,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1093,7 +1100,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1111,11 +1118,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1294,7 +1301,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1386,7 +1393,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1447,7 +1454,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1588,7 +1595,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1643,9 +1650,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1659,62 +1666,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* Glow models are always handled by the driver as having a single
+	 * layer, so consider the entire model as a model with one layer.
+	 * This ignores num_layers from the metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1730,7 +1760,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1741,7 +1771,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1758,7 +1788,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1783,7 +1813,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1791,63 +1821,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1880,10 +1913,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1891,12 +1924,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1917,7 +1950,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1937,7 +1970,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1948,31 +1981,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2008,7 +2041,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2021,7 +2054,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2040,7 +2073,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2050,19 +2083,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2071,7 +2108,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2091,57 +2128,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2151,7 +2189,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2171,58 +2209,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2250,10 +2290,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2263,9 +2303,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2469,7 +2509,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2477,7 +2517,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
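
Illustrative aside, not part of the patch: with the per-layer rework, the
ML_AVG/MIN/MAX_FOREACH_QP macros read burst statistics from
model->layer[0].glow.burst_stats, and the average is taken over ops dequeued since
the last reset. A self-contained sketch of the averaging, using a stand-in stats
structure:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the per-queue-pair burst stats kept under layer[0].glow. */
    struct burst_stats {
        uint64_t hw_latency_tot; /* summed HW latency since last reset */
        uint64_t dequeued_count; /* ops completed on this queue pair */
        uint64_t hw_reset_count; /* dequeued_count snapshot at last reset */
    };

    /* Average HW latency across queue pairs, mirroring ML_AVG_FOREACH_QP. */
    static uint64_t
    avg_hw_latency(const struct burst_stats *stats, int nb_qp)
    {
        uint64_t value = 0, count = 0;
        int qp;

        for (qp = 0; qp < nb_qp; qp++) {
            value += stats[qp].hw_latency_tot;
            count += stats[qp].dequeued_count - stats[qp].hw_reset_count;
        }
        return count != 0 ? value / count : 0;
    }

    int
    main(void)
    {
        struct burst_stats stats[2] = {
            {.hw_latency_tot = 1000, .dequeued_count = 10, .hw_reset_count = 0},
            {.hw_latency_tot = 300,  .dequeued_count = 5,  .hw_reset_count = 2},
        };

        /* (1000 + 300) / (10 + 3) = 100 */
        printf("avg hw latency: %lu\n", (unsigned long)avg_hw_latency(stats, 2));
        return 0;
    }
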
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 0000000000..29ec7ec511
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 32
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* name */
+	char name[RTE_ML_STR_MAX];
+
+	/* dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized data size */
+	uint32_t sz_d;
+
+	/* Quantized data size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 0000000000..3d735ced3e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 0000000000..a2994dbb71
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
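
As an illustration of the layout described in the comment above model->info: the
rte_ml_model_info header is followed immediately by the input rte_ml_io_info
array and then the output array, so a consumer of a populated buffer can locate
the per-I/O arrays with plain pointer arithmetic. A minimal sketch, using the
PLT pointer helpers already used elsewhere in this series:

	struct rte_ml_model_info *info;
	struct rte_ml_io_info *input;
	struct rte_ml_io_info *output;

	/* Header sits at the start of the buffer */
	info = PLT_PTR_CAST(model->info);
	/* Input info array follows the header */
	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
	/* Output info array follows the inputs */
	output = PLT_PTR_ADD(input, info->nb_inputs * sizeof(struct rte_ml_io_info));
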
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 03a2d4ecf2..72e03b15b5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,8 @@ driver_sdk_headers = files(
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
         'cnxk_ml_dev.h',
+        'cnxk_ml_io.h',
+        'cnxk_ml_model.h',
 )
 
 sources = files(
@@ -21,6 +23,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 04/34] ml/cnxk: add generic cnxk request structure
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved common fields from the
cn10k structures to the cnxk structure. Moved job-related structures
and enumerations to the ops headers.
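
The resulting layering is roughly as follows. This is a sketch based on
how the fields are used in the diff below; the authoritative definitions
are in cn10k_ml_ops.h and the new cnxk_ml_ops.h added by this patch.

	/* Hardware-specific request state, kept inside the generic wrapper */
	struct cn10k_ml_req {
		struct cn10k_ml_jd jd;                         /* Job descriptor */
		union cn10k_ml_jd_extended_args extended_args; /* Extended JD arguments */
		struct cn10k_ml_result result;                 /* Job result */
		volatile uint64_t status;                      /* Poll-mode status word */
		struct ml_job_cmd_s jcmd;                      /* Job command */
	};

	/* Generic request used by the cnxk layer and fast-path code */
	struct cnxk_ml_req {
		union {
			struct cn10k_ml_req cn10k_req;         /* CN10K specific request */
		};
		volatile uint64_t *status;                     /* Address polled for completion */
		uint64_t timeout;                              /* Timeout cycle */
		struct rte_ml_op *op;                          /* Associated ML op */
	} __rte_aligned(ROC_ALIGN);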

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  72 +++----
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 331 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 558 insertions(+), 492 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 3bc61443d8..fc6f78d414 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -14,9 +14,8 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -400,20 +399,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -458,29 +460,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -654,29 +657,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -766,11 +770,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -782,8 +786,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -791,7 +795,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		free(fw_buffer);
 	} else if (roc_env_is_asim()) {
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a..1852d4f6c9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d747bba151..5d37e9bf8a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -549,7 +550,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -558,7 +558,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -575,7 +574,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca7..74ada1531a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e91cc4e859..caee09829b 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,9 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -78,31 +77,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -122,14 +121,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -140,18 +139,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -159,7 +158,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -173,7 +172,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -185,8 +184,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -333,7 +333,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -341,79 +341,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -861,7 +870,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -904,7 +913,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1101,7 +1110,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1136,7 +1145,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1213,7 +1222,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1239,7 +1248,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1252,7 +1261,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1269,7 +1278,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1485,20 +1494,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1511,17 +1522,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1538,14 +1551,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1554,23 +1567,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1581,7 +1595,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1654,7 +1668,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1726,7 +1740,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1790,7 +1804,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1815,10 +1829,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1878,8 +1892,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1887,19 +1901,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1952,7 +1968,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1972,10 +1988,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2015,19 +2031,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2287,18 +2305,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2329,7 +2352,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2338,7 +2361,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2346,15 +2370,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2365,11 +2389,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2395,12 +2420,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2424,11 +2450,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2450,13 +2477,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2507,10 +2536,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2522,17 +2552,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2555,7 +2586,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45..fd5992e192 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model  */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 0000000000..f1872dcf7c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 0000000000..b953fb0f5f
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 72e03b15b5..73db458fcd 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -15,6 +15,7 @@ driver_sdk_headers = files(
         'cnxk_ml_dev.h',
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
+        'cnxk_ml_ops.h',
 )
 
 sources = files(
@@ -24,6 +25,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0



* [PATCH v4 05/34] ml/cnxk: add generic cnxk xstats structures
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic cnxk xstats structures and renamed the cn10k xstats
enumerations with a cnxk prefix.
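
Below is a minimal sketch (illustrative only, not part of this patch) of how
one device-mode entry in the generic xstats table is populated; it mirrors
the init loop in cn10k_ml_xstats_init() and uses only the cnxk structures
and static tables added here (the driver's usual includes are assumed):

	static void
	example_init_device_xstat(struct cnxk_ml_xstats_entry *xs, uint16_t stat_id)
	{
		/* Name-ID map for the stat, name taken from the static table */
		xs->map.id = stat_id;
		snprintf(xs->map.name, sizeof(xs->map.name), "%s",
			 device_xstats[stat_id].name);

		/* Device-mode stat, resolved through the device xstats callback */
		xs->mode = RTE_ML_DEV_XSTATS_DEVICE;
		xs->type = device_xstats[stat_id].type;
		xs->fn_id = CNXK_ML_XSTATS_FN_DEVICE;
		xs->obj_idx = 0;
		xs->reset_allowed = device_xstats[stat_id].reset_allowed;
	}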

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 drivers/ml/cnxk/meson.build      |   1 +
 5 files changed, 210 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index caee09829b..42a4389bbe 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -425,26 +426,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -459,10 +440,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -470,17 +451,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -489,24 +470,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -545,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -554,17 +535,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -590,9 +571,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -603,9 +584,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -616,16 +598,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -671,8 +654,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -708,26 +691,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -762,8 +745,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1342,10 +1325,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1357,10 +1340,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1384,11 +1367,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1423,10 +1406,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1664,7 +1647,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1738,24 +1721,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2308,7 +2291,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2326,31 +2309,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 0000000000..0d405679ca
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 73db458fcd..6385ac4548 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ driver_sdk_headers = files(
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
+        'cnxk_ml_xstats.h',
 )
 
 sources = files(
-- 
2.42.0



* [PATCH v4 06/34] ml/cnxk: rename cnxk ops function pointers struct
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure with a cnxk prefix, moved its definition
to the common cnxk_ml_ops.c and exported the CN10K device ops functions so
the shared table can reference them.
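
As an illustrative usage note (the probe function name below is hypothetical;
the ops assignment itself is taken from this patch), any cnxk-family ML
device can now register the single shared ops table:

	#include "cnxk_ml_ops.h"	/* declares: extern struct rte_ml_dev_ops cnxk_ml_ops */

	static int
	example_cnxk_ml_probe(struct rte_ml_dev *dev)
	{
		/* One common ops table for all cnxk ML devices */
		dev->dev_ops = &cnxk_ml_ops;

		return 0;
	}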

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 91 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fc6f78d414..91813e9d0a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -345,7 +345,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 42a4389bbe..66b38fc1eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -119,7 +119,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -860,7 +860,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -888,7 +888,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1087,7 +1087,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1160,7 +1160,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1180,7 +1180,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1200,7 +1200,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1241,7 +1241,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1258,7 +1258,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1273,7 +1273,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1321,7 +1321,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1363,7 +1363,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1427,7 +1427,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1441,7 +1441,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1528,7 +1528,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2051,7 +2051,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2071,7 +2071,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2105,7 +2105,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2186,7 +2186,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2574,38 +2574,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e192..16480b9ad8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c..03402681c5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,41 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5f..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0



* [PATCH v4 07/34] ml/cnxk: update device handling functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement CNXK wrapper functions for dev_info_get, dev_configure, dev_close,
dev_start and dev_stop. The wrapper functions allocate and release the
resources common to the ML driver and invoke the device-specific functions.
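
Below is a minimal sketch of the wrapper pattern described above (the actual
cnxk_ml_dev_info_get() added to cnxk_ml_ops.c is not shown in this excerpt
and may differ in detail): the cnxk layer validates arguments, fills the
fields common to all cnxk ML devices and then delegates to the CN10K handler
for the device-specific limits.

	static int
	cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
	{
		struct cnxk_ml_dev *cnxk_mldev;

		if (dev == NULL || dev_info == NULL)
			return -EINVAL;

		cnxk_mldev = dev->data->dev_private;

		memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
		dev_info->driver_name = dev->device->driver->name;
		dev_info->max_models = ML_CNXK_MAX_MODELS;

		/* CN10K backend fills queue-pair and descriptor limits */
		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
	}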

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 66b38fc1eb..6d8f2c8777 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -101,7 +101,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -861,20 +861,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -889,143 +881,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1038,8 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1050,10 +915,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1067,77 +932,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1154,20 +967,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1175,19 +983,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1195,8 +999,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1217,7 +1019,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad8..d50b5bede7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de622..02605fa28f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03402681c5..07a4daabc5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,15 +5,291 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..2996928d7d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 08/34] ml/cnxk: update queue-pair handling functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device queue-pairs.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
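Note: a minimal, self-contained sketch of the split this patch applies to
queue pairs may help reviewers: the generic cnxk layer owns allocation and
device-independent bookkeeping (cnxk_ml_qp_create in the diff below) and
calls a hardware-specific hook (cn10k_ml_qp_initialize) for the
per-descriptor job-command setup. All types and function names in the
sketch are invented for illustration; only the two driver functions named
above come from the diff, and the sketch is not driver code.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical queue-pair state, standing in for struct cnxk_ml_qp. */
struct demo_qp {
	unsigned int id;
	unsigned int nb_desc;
};

/* Hardware-specific hook, standing in for cn10k_ml_qp_initialize():
 * programs per-descriptor job commands for the target device. */
static void demo_hw_qp_initialize(struct demo_qp *qp)
{
	unsigned int i;

	for (i = 0; i < qp->nb_desc; i++)
		printf("qp %u: init job command for descriptor %u\n", qp->id, i);
}

/* Generic wrapper, standing in for cnxk_ml_qp_create(): allocates the
 * queue pair, fills in device-independent state, then calls the hook. */
static struct demo_qp *demo_generic_qp_create(unsigned int id, unsigned int nb_desc)
{
	struct demo_qp *qp = calloc(1, sizeof(*qp));

	if (qp == NULL)
		return NULL;
	qp->id = id;
	qp->nb_desc = nb_desc;
	demo_hw_qp_initialize(qp);

	return qp;
}

int main(void)
{
	struct demo_qp *qp = demo_generic_qp_create(0, 4);

	free(qp);
	return qp == NULL ? 1 : 0;
}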
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6d8f2c8777..e3c688a55f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -95,93 +95,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -189,13 +108,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1002,47 +914,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede7..2d0a49d5cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 07a4daabc5..aa56dd2276 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,7 +10,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -93,7 +193,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -283,6 +383,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -294,8 +439,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 09/34] ml/cnxk: update model load and unload functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 10/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to load and unload
ML models. The wrapper functions invoke the cn10k
model load and unload functions.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
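Note: as a worked illustration of the OCM page accounting performed on the
load path (cn10k_ml_model_ocm_pages_count() in the diff below rounds the
weights-and-bias and scratch sizes up to whole OCM pages and checks the sum
against the pages available on a tile), a small standalone example follows.
The page size, page count and byte sizes are made-up values for the
example, not the hardware's.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Round a byte size up to whole pages, as the driver does when a size is
 * not an exact multiple of the OCM page size. */
static uint64_t demo_bytes_to_pages(uint64_t size, uint64_t page_size)
{
	return (size + page_size - 1) / page_size;
}

int main(void)
{
	const uint64_t page_size = 16 * 1024; /* example OCM page size */
	const uint64_t num_pages = 1024;      /* example pages per tile */
	uint64_t wb_pages = demo_bytes_to_pages(1000000, page_size);     /* 62 pages */
	uint64_t scratch_pages = demo_bytes_to_pages(250000, page_size); /* 16 pages */

	printf("wb_pages = %" PRIu64 ", scratch_pages = %" PRIu64 "\n",
	       wb_pages, scratch_pages);

	/* The model can be loaded only if both fit together in OCM. */
	if (wb_pages + scratch_pages > num_pages)
		printf("model does not fit in OCM\n");

	return 0;
}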
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  26 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 462 insertions(+), 277 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 5d37e9bf8a..69a60b9b90 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -316,42 +316,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -363,140 +352,146 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+			   struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output1[i].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output2[j].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
+struct cnxk_ml_io_info *
+cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	return &model->layer[layer_id].info;
+}
+
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -504,7 +499,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -516,7 +511,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -524,15 +519,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -540,28 +535,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -570,39 +562,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68..b891c9d627 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,13 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+				struct cn10k_ml_model_metadata *metadata);
+struct cnxk_ml_io_info *cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e3c688a55f..ad2effb904 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -15,6 +15,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -273,7 +276,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1261,85 +1264,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_set(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1358,99 +1447,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1748,7 +1800,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1762,19 +1813,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cd..677219dfdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28f..1590249abd 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index aa56dd2276..1d8b84269d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -137,6 +140,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -240,7 +244,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -271,6 +275,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -303,6 +324,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -312,7 +336,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -428,6 +452,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -451,8 +587,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..bc14f6e5b9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 10/34] ml/cnxk: enable OCM check for multilayer TVM model
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enabled the OCM size requirement check for multi-layer
TVM models. OCM scratch and WB page requirements are now
computed for all layers during the load stage, as sketched
below.
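
Below is a minimal, self-contained sketch of the aggregation logic, for
illustration only; the structure and helper names are placeholders, not
the driver's actual types.

#include <stdbool.h>
#include <stdint.h>

struct layer_req {
	bool is_mrvl;           /* layer runs on the ML accelerator */
	uint16_t wb_pages;      /* write-back pages needed by the layer */
	uint16_t scratch_pages; /* scratch pages needed by the layer */
};

/* WB pages accumulate across accelerator layers, while scratch memory is
 * shared, so only the largest per-layer scratch requirement counts. The
 * model fits only if the combined total is within the OCM page budget.
 */
static bool
model_fits_ocm(const struct layer_req *layers, uint16_t nb_layers,
	       uint16_t ocm_num_pages)
{
	uint32_t total_wb_pages = 0;
	uint16_t max_scratch_pages = 0;
	uint16_t i;

	for (i = 0; i < nb_layers; i++) {
		if (!layers[i].is_mrvl)
			continue;
		total_wb_pages += layers[i].wb_pages;
		if (layers[i].scratch_pages > max_scratch_pages)
			max_scratch_pages = layers[i].scratch_pages;
	}

	return (total_wb_pages + max_scratch_pages) <= ocm_num_pages;
}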

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c | 60 +++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1d8b84269d..e2ba43a307 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -461,8 +461,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	uint16_t max_scratch_pages;
+	struct cn10k_ml_ocm *ocm;
 	uint64_t model_info_size;
+	uint16_t total_wb_pages;
 	uint16_t lcl_model_id;
+	uint16_t layer_id;
 	uint64_t mz_size;
 	bool found;
 	int ret;
@@ -514,6 +518,62 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 	if (ret != 0)
 		goto error;
 
+	max_scratch_pages = 0;
+	total_wb_pages = 0;
+	layer_id = 0;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+		total_wb_pages = total_wb_pages + model->layer[layer_id].glow.ocm_map.wb_pages;
+		max_scratch_pages = PLT_MAX(max_scratch_pages,
+					    model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	} else {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+				total_wb_pages = total_wb_pages +
+						 model->layer[layer_id].glow.ocm_map.wb_pages;
+				max_scratch_pages =
+					PLT_MAX(max_scratch_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+			}
+		}
+#endif
+	}
+
+	if ((total_wb_pages + max_scratch_pages) > ocm->num_pages) {
+		plt_err("model_id = %u: total_wb_pages (%u) + scratch_pages (%u) >  %u\n",
+			lcl_model_id, total_wb_pages, max_scratch_pages, ocm->num_pages);
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			plt_ml_dbg("layer_id = %u: wb_pages = %u, scratch_pages = %u\n", layer_id,
+				   model->layer[layer_id].glow.ocm_map.wb_pages,
+				   model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		} else {
+			for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers;
+			     layer_id++) {
+				if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+					plt_ml_dbg(
+						"layer_id = %u: wb_pages = %u, scratch_pages = %u\n",
+						layer_id,
+						model->layer[layer_id].glow.ocm_map.wb_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+				}
+			}
+#endif
+		}
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else
+			mvtvm_ml_model_unload(cnxk_mldev, model);
+#endif
+
+		return -ENOMEM;
+	}
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	cnxk_mldev->nb_models_loaded++;
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 11/34] ml/cnxk: update model start and stop functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 10/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrappers invoke the cn10k model start and
stop functions, as sketched below.
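
For illustration, a simplified sketch of the wrapper pattern (placeholder
types and names, not the driver's structures): the model-level start
delegates to a single layer-level start and updates the model state only
on success.

#include <stdint.h>

enum toy_model_state { TOY_MODEL_LOADED, TOY_MODEL_STARTED };

struct toy_model {
	uint16_t model_id;
	enum toy_model_state state;
};

/* Stand-in for the cn10k layer start op; returns 0 on success. */
static int
toy_layer_start(void *device, uint16_t model_id, const char *layer_name)
{
	(void)device;
	(void)model_id;
	(void)layer_name;
	return 0; /* pretend the hardware start job completed */
}

static int
toy_model_start(void *device, struct toy_model *model)
{
	int ret;

	/* Glow models have a single layer; start it via the layer op. */
	ret = toy_layer_start(device, model->model_id, "layer0");
	if (ret != 0)
		return ret; /* model state is left unchanged on failure */

	model->state = TOY_MODEL_STARTED;

	return 0;
}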

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d71c36eae6..2197e5e0ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -215,11 +215,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -238,7 +237,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -333,12 +331,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -351,10 +347,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -396,12 +390,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -416,10 +408,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -438,8 +428,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ad2effb904..c677861645 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -248,26 +248,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -291,7 +293,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -323,9 +325,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -714,10 +720,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -730,22 +734,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -761,15 +763,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1506,14 +1508,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1524,85 +1528,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1636,66 +1644,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1705,31 +1741,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1766,8 +1802,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1776,6 +1815,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2003,30 +2061,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2054,14 +2117,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2116,7 +2178,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2183,7 +2245,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2232,23 +2294,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2284,7 +2350,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf..a222a43d55 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index e2ba43a307..d34e60be32 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -240,7 +240,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -332,7 +332,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -624,6 +624,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -649,8 +689,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9..d27ca0d0cb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 12/34] ml/cnxk: update model utility functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and
fetch model info; a sketch of the info_get copy pattern
follows.
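
A rough sketch of the info_get copy pattern, using simplified stand-in
structures rather than the rte_mldev ABI:

#include <stdint.h>
#include <string.h>

struct toy_io_info {
	uint32_t size;
};

struct toy_model_info {
	uint32_t nb_inputs;
	uint32_t nb_outputs;
	struct toy_io_info *input_info;  /* points into model-owned memory */
	struct toy_io_info *output_info; /* points into model-owned memory */
};

static int
toy_model_info_get(const struct toy_model_info *cached, struct toy_model_info *out)
{
	if (cached == NULL || out == NULL)
		return -1;

	/* Shallow copy of the cached info. The pointer members are copied
	 * as-is, so the caller receives references into the model's own
	 * input/output arrays rather than private copies; the explicit
	 * assignments below only make that intent visible.
	 */
	memcpy(out, cached, sizeof(*out));
	out->input_info = cached->input_info;
	out->output_info = cached->output_info;

	return 0;
}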

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c677861645..c0d6216485 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1835,45 +1835,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index d34e60be32..79665fa21b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -664,6 +664,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -691,8 +735,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 13/34] ml/cnxk: update data quantization functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.
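
For reference, a minimal usage sketch of the application path that now lands in these
wrappers. Names (convert_io, dev_id, model_id, dbuf, qbuf) are illustrative only, and the
rte_ml_io_quantize()/rte_ml_io_dequantize() prototypes taking rte_ml_buff_seg descriptors
are assumed from the current mldev API:

    #include <rte_mldev.h>

    /* Sketch only: run one inference worth of I/O conversion through the
     * cnxk wrappers via the public mldev API. dev_id/model_id are assumed
     * to refer to a configured device and a loaded model; dbuf/qbuf are
     * caller-owned segments sized from rte_ml_model_info_get(). */
    static int
    convert_io(int16_t dev_id, uint16_t model_id,
    	   struct rte_ml_buff_seg *dbuf, struct rte_ml_buff_seg *qbuf)
    {
    	struct rte_ml_buff_seg *d_seg[] = {dbuf};
    	struct rte_ml_buff_seg *q_seg[] = {qbuf};
    	int ret;

    	/* float32 application data -> model's quantized input type */
    	ret = rte_ml_io_quantize(dev_id, model_id, d_seg, q_seg);
    	if (ret < 0)
    		return ret;

    	/* ... enqueue the op and poll for completion ... */

    	/* model's quantized output type -> float32 application data */
    	return rte_ml_io_dequantize(dev_id, model_id, q_seg, d_seg);
    }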

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c0d6216485..ff190b7f86 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1856,170 +1856,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0d..780e2a9f9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 0000000000..c78009ab0c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 29ec7ec511..5de166c252 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 79665fa21b..5d181eb0f2 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -708,6 +710,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -739,6 +813,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 6385ac4548..9cc4ddec70 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -25,6 +25,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 14/34] ml/cnxk: update device debug functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrappers for the device dump and selftest debug
functions.
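
A minimal sketch of how the new wrappers are reached from an application, assuming the
rte_ml_dev_dump()/rte_ml_dev_selftest() prototypes from the mldev API (debug_device() and
path are illustrative names, not part of this patch):

    #include <errno.h>
    #include <stdio.h>

    #include <rte_mldev.h>

    /* Sketch only: dump device and model debug info to a file and run the
     * firmware selftest. dev_id is assumed to be a configured ML device. */
    static int
    debug_device(int16_t dev_id, const char *path)
    {
    	FILE *fp = fopen(path, "w");
    	int ret;

    	if (fp == NULL)
    		return -errno;

    	/* Dumps per-model info (now via cnxk_ml_model_dump) plus OCM state
    	 * and the firmware debug/exception buffers. */
    	ret = rte_ml_dev_dump(dev_id, fp);
    	fclose(fp);
    	if (ret < 0)
    		return ret;

    	return rte_ml_dev_selftest(dev_id);
    }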

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   2 +
 12 files changed, 236 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 69a60b9b90..b765b4ada9 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -596,3 +597,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index b891c9d627..45f2ed5fcf 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -460,5 +460,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2197e5e0ed..dc315cce10 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -481,19 +481,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a..bf8944f8ee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ff190b7f86..0a3575879f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -18,11 +18,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -70,16 +65,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -113,140 +98,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1120,38 +971,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1207,17 +1045,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c..5fda98ae88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e..b069d4e3a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71..66d979dd3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 5d181eb0f2..ba388af787 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -409,6 +409,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -789,8 +824,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 0000000000..ca3670a9e8
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 0000000000..ed2ab21346
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 9cc4ddec70..575f08f9c0 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,6 +17,7 @@ driver_sdk_headers = files(
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
         'cnxk_ml_xstats.h',
+        'cnxk_ml_utils.h',
 )
 
 sources = files(
@@ -28,6 +29,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 15/34] ml/cnxk: update device stats functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device stats.
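
A minimal sketch of reading and clearing the aggregated queue-pair counters via the new
wrappers, assuming the rte_ml_dev_stats_get()/rte_ml_dev_stats_reset() prototypes from the
mldev API (show_stats() is an illustrative name):

    #include <inttypes.h>
    #include <stdio.h>

    #include <rte_mldev.h>

    /* Sketch only: print and reset the per-device counters, which the cnxk
     * wrapper accumulates across all configured queue pairs. */
    static void
    show_stats(int16_t dev_id)
    {
    	struct rte_ml_dev_stats stats;

    	if (rte_ml_dev_stats_get(dev_id, &stats) == 0)
    		printf("enq %" PRIu64 " deq %" PRIu64 " enq_err %" PRIu64 " deq_err %" PRIu64 "\n",
    		       stats.enqueued_count, stats.dequeued_count,
    		       stats.enqueue_err_count, stats.dequeue_err_count);

    	rte_ml_dev_stats_reset(dev_id);
    }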

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0a3575879f..27d255a830 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -770,38 +770,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ba388af787..10bc580dfe 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -489,6 +489,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -832,8 +864,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 16/34] ml/cnxk: update device and model xstats functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resources for the xstats are now managed in
the cnxk layer. Introduced an internal xstats group.
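
A minimal sketch of enumerating device-mode xstats through the new wrappers; model-mode
stats work the same way with RTE_ML_DEV_XSTATS_MODEL and a model_id. The
rte_ml_dev_xstats_names_get()/rte_ml_dev_xstats_get() prototypes and the xstats map fields
are assumed from the mldev API, and show_device_xstats() is an illustrative name:

    #include <inttypes.h>
    #include <stdio.h>

    #include <rte_common.h>
    #include <rte_mldev.h>

    /* Sketch only: list device-scope xstats and read each one by id. */
    static void
    show_device_xstats(int16_t dev_id)
    {
    	struct rte_ml_dev_xstats_map map[64];
    	uint64_t value;
    	int n, i;

    	n = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1,
    					map, RTE_DIM(map));
    	if (n > (int)RTE_DIM(map))
    		n = RTE_DIM(map);

    	for (i = 0; i < n; i++) {
    		if (rte_ml_dev_xstats_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1,
    					  &map[i].id, &value, 1) == 1)
    			printf("%s: %" PRIu64 "\n", map[i].name, value);
    	}
    }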

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 531 +++----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 481 +++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 551 insertions(+), 507 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 27d255a830..776ad60401 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -198,107 +198,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -306,270 +220,94 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
 
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
+uint64_t
+cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			 enum cnxk_ml_xstats_type type)
 {
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
-	uint64_t value;
+	uint64_t value = 0;
 	uint32_t qp_id;
 
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
 	switch (type) {
 	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	default:
 		value = 0;
 	}
 
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
 	return value;
 }
 
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -654,7 +392,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -682,13 +419,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -717,9 +447,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -770,174 +497,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1211,7 +770,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af..4d76164dba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -298,17 +299,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
@@ -337,4 +327,8 @@ int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_nam
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
+/* xstats ops */
+uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd..3ce9338f1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 10bc580dfe..ff6384668f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -115,6 +115,285 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t value = 0;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		return 0;
+
+	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -294,6 +573,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -323,6 +609,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -521,6 +810,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -866,10 +1339,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679ca..5e02bb876c 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.42.0
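
A sketch of the xstats index layout built by cnxk_ml_xstats_init() in the
patch above (a derivation for the reader, not part of the patch itself):
entries are filled in a fixed order -- all device stats first, then one
block of RTE_DIM(layer_xstats) entries per (model, layer) pair -- so the
offsets recorded in the loop reduce to the closed form

    offset_for_layer[m][l] = RTE_DIM(device_xstats) +
            (m * ML_CNXK_MODEL_MAX_LAYERS + l) * RTE_DIM(layer_xstats);

which is also why count_per_layer[m][l] is constant at RTE_DIM(layer_xstats)
and count_mode_device equals RTE_DIM(device_xstats).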


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 17/34] ml/cnxk: update fast path functions
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support for
model-specific fast-path functions. The cnxk layer functions dispatch
to the model-specific fast-path functions.

Added support for model-specific poll handling functions and updated
the internal inference sync function, dropping the use of rte_ml_op
as an argument. The function arguments are updated so that the
function can be used as a callback by the TVM HW runtime.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)
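
A minimal sketch of the dispatch introduced below (not part of the patch):
each cnxk_ml_model now carries four fast-path hooks which the backend fills
at model load time, exactly as cn10k_ml_model_load() does in this patch. A
false return from enqueue_single tells the common burst loop that the job
command queue is full, so the loop stops early. A TVM-style backend is
expected to substitute its own handlers here.

	/* set by the backend during model load */
	model->enqueue_single = cn10k_ml_enqueue_single;
	model->result_update  = cn10k_ml_result_update;
	model->set_error_code = cn10k_ml_set_error_code;
	model->set_poll_addr  = cn10k_ml_set_poll_addr;

	/* condensed from cnxk_ml_enqueue_burst() below */
	model = cnxk_mldev->mldev->data->models[op->model_id];
	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
		goto jcmdq_full;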

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 776ad60401..8116c8dedb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -65,24 +65,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -177,7 +165,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -185,17 +173,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -311,30 +299,15 @@ cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *l
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -342,25 +315,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -425,13 +382,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -824,6 +776,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1219,26 +1177,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1246,6 +1186,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1253,9 +1194,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1322,119 +1263,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1471,41 +1341,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1518,7 +1395,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4d76164dba..3d18303ed3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -14,6 +14,7 @@ struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -309,13 +310,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3c..f618e5aa5f 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ff6384668f..d5bc08c1ae 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -15,6 +15,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1322,6 +1334,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb..d0c126f34b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 18/34] ml/cnxk: move error handling to cnxk layer
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Move error type structures to the cnxk layer. The cn10k layer now
handles only the firmware and hardware error sub-types.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)
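
A minimal sketch, not part of the patch: after this change the error class
(etype) belongs to the cnxk layer while the cn10k driver owns only its
firmware and driver sub-types (stype). Assuming the ML_CNXK_ETYPE_* values
are defined in cnxk_ml_dev.h as the diff below suggests, a driver-detected
failure is tagged with both through the helper added earlier in the series:

	/* mark a request as failed with a cn10k-specific sub-type */
	cn10k_ml_set_error_code(req, ML_CNXK_ETYPE_DRIVER,
				ML_CN10K_DRIVER_ERR_UNKNOWN);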

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8116c8dedb..65eaaf030d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,47 +22,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1241,19 +1221,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1294,7 +1274,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1311,30 +1291,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1372,7 +1351,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973..63d1c9e417 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f..382fca64be 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index d5bc08c1ae..f10bdcee90 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1432,7 +1432,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 19/34] ml/cnxk: support config and close of tvmdp library
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based
on ML device configuration options.

Updated the meson build to add Jansson, TVM runtime and the
TVMDP library as build dependencies.
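
The TVMDP configuration is reached through the standard mldev configure
path. A minimal application-side sketch is shown below; the field values
are illustrative only and the struct/field names follow the rte_mldev API:

    #include <rte_mldev.h>

    static int
    app_configure_mldev(int16_t dev_id)
    {
        struct rte_ml_dev_config conf = {
            .socket_id = 0,      /* NUMA node for device allocations */
            .nb_models = 16,     /* forwarded to tvmdp_configure() by the driver */
            .nb_queue_pairs = 1,
        };

        /* mvtvm_ml_dev_configure() runs as part of this call */
        return rte_ml_dev_configure(dev_id, &conf);
    }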

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       | 58 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 ++++
 drivers/ml/cnxk/cnxk_ml_ops.h    |  6 ++++
 drivers/ml/cnxk/meson.build      | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 41 ++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   | 19 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 26 ++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h | 15 ++++++++
 8 files changed, 231 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 1834b1f905..a629ceb796 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -46,6 +46,64 @@ or cross-compiled on an x86 platform.
 
 Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
 
+Compilation Prerequisites
+-------------------------
+
+This driver optionally requires external libraries to enable support for
+models compiled using the Apache TVM framework. The following dependencies
+are not part of DPDK and must be installed separately:
+
+- **Jansson**
+
+  This library is used to parse and read JSON files.
+
+- **TVM**
+
+  Apache TVM provides a runtime library (libtvm_runtime) used to execute
+  models on CPU cores or hardware accelerators.
+
+.. note::
+
+    The DPDK cnxk ML driver requires TVM version 0.10.0.
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.10.0 -b v0.10.0
+    cmake -S ./ -B build \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DMACHINE_NAME=aarch64-linux-gnu \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY
+    make -C build
+    make -C build install
+
+- **TVMDP**
+
+  Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
+  works as an interface between the TVM runtime and DPDK drivers. The TVMDP
+  library provides a simplified C interface to TVM's C++-based runtime.
+
+.. code-block:: console
+
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake \
+      -DBUILD_SHARED_LIBS=ON \
+      -DBUILD_TESTING=OFF
+    make -C build
+    make -C build install
+
+- **libarchive**
+
+  The Apache TVM framework generates compiled models as tar archives. This
+  library is used to decompress and read archive files in tar,
+  xz and other formats.
+
 
 Initialization
 --------------
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f10bdcee90..65d3e79ec2 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -564,6 +564,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -624,6 +628,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..b22a2b0d95 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,6 +12,12 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#else
+#include "mvtvm_ml_stubs.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 575f08f9c0..7570186177 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,32 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+if not cc.check_header('dlpack/dlpack.h')
+        message('drivers/ml/cnxk: dlpack.h not found')
+        enable_mvtvm = false
+endif
+
+tvmrt_lib = cc.find_library('tvm_runtime', required: false)
+if tvmrt_lib.found()
+        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
+else
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
@@ -34,6 +60,39 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
+
+driver_sdk_headers += files(
+        'mvtvm_ml_ops.h',
+)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += tvmrt_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+ext_deps += jansson_dep
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+
+driver_sdk_headers += files(
+        'mvtvm_ml_stubs.h',
+)
+
+sources += files(
+        'mvtvm_ml_stubs.c',
+)
+
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 0000000000..88c6d5a864
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 0000000000..305b4681ed
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <dlpack/dlpack.h>
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
new file mode 100644
index 0000000000..a31cd39cfa
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_stubs.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(conf);
+
+	return 0;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	RTE_SET_USED(cnxk_mldev);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
new file mode 100644
index 0000000000..11c56e5144
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_STUBS_H_
+#define _MVTVM_ML_STUBS_H_
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 20/34] ml/cnxk: add structures to support TVM model type
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.
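
As a rough illustration (not part of the patch), the union added to
struct cnxk_ml_model means only one of the type-specific members is valid
at a time, selected by the new type field; the helper name below is
hypothetical and assumes cnxk_ml_model.h is included:

    /* Illustrative helper only, not part of this series */
    static void *
    model_type_data(struct cnxk_ml_model *model)
    {
        if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
            return &model->glow;
    #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
        if (model->type == ML_CNXK_MODEL_TYPE_TVM)
            return &model->mvtvm;
    #endif
        return NULL;
    }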

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 66 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 52 ++++++++++++++++++++-----
 drivers/ml/cnxk/meson.build      |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 ++++++++++++++++++++++
 6 files changed, 161 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index dc315cce10..749ddeb344 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -435,6 +435,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65eaaf030d..a471e98fbf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,6 +725,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -746,6 +749,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -969,7 +973,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..f100eca203 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,48 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Unknown model type */
+	ML_CNXK_MODEL_TYPE_UNKNOWN,
+
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions */
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* Unknown layer type */
+	ML_CNXK_LAYER_TYPE_UNKNOWN = 0,
+
+	/* MRVL layer, for MLIP target */
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target */
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +99,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +132,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 65d3e79ec2..2bb6ec9f50 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1277,6 +1277,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1289,17 +1291,31 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, 0);
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1313,6 +1329,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1325,17 +1343,31 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 7570186177..12b73ee3be 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -66,6 +66,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 driver_sdk_headers += files(
         'mvtvm_ml_ops.h',
+        'mvtvm_ml_model.h',
 )
 
 sources += files(
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 0000000000..1f6b435be0
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 21/34] ml/cnxk: add support for identify model type
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to parse the model buffer to identify the
model type and sub-type. Added basic validity checks for
Glow model type buffers.
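
A conceptual sketch of the header CRC check added by this patch: the CRC32
is computed over the header excluding its trailing checksum field and
compared against the stored value. The helper name and prototype are
illustrative only:

    #include <stdbool.h>
    #include <rte_hash_crc.h>

    static bool
    glow_header_crc_ok(const void *hdr, uint32_t hdr_size, uint32_t stored_crc)
    {
        if (stored_crc == 0)
            return true; /* checksum not populated, nothing to verify */

        return rte_hash_crc(hdr, hdr_size - sizeof(uint32_t), 0) == stored_crc;
    }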

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 49 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  3 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  8 +++++
 drivers/ml/cnxk/meson.build      |  6 ++++
 drivers/ml/cnxk/mvtvm_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 8 files changed, 133 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..02f80410ec 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,60 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	enum cnxk_ml_model_type type;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+	type = mvtvm_ml_model_type_get(params);
+	if (type == ML_CNXK_MODEL_TYPE_TVM)
+		return ML_CNXK_MODEL_TYPE_TVM;
+	else if (type == ML_CNXK_MODEL_TYPE_INVALID)
+		return ML_CNXK_MODEL_TYPE_INVALID;
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f100eca203..a2fced46a2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -13,6 +13,8 @@
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
 #include "mvtvm_ml_model.h"
+#else
+#include "mvtvm_ml_stubs.h"
 #endif
 
 #include "cnxk_ml_io.h"
@@ -184,6 +186,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 2bb6ec9f50..aa809cabea 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1018,6 +1018,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1037,6 +1038,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1070,6 +1077,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 12b73ee3be..b3a62a7871 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -71,6 +76,7 @@ driver_sdk_headers += files(
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += tvmrt_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 0000000000..ab5f8baa67
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <archive.h>
+#include <archive_entry.h>
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return ML_CNXK_MODEL_TYPE_UNKNOWN;
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be0..b6162fceec 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,6 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a31cd39cfa..a7352840a6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -7,6 +7,15 @@
 #include "mvtvm_ml_stubs.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	RTE_SET_USED(params);
+
+	return ML_CNXK_MODEL_TYPE_UNKNOWN;
+}
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 11c56e5144..467a9d39e5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 22/34] ml/cnxk: add support to parse TVM model objects
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model
archive buffer, check that all expected objects are present
and copy the TVM model objects to internal buffers.
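
The parser follows the usual libarchive in-memory read pattern; a
standalone sketch of that pattern is below (it only prints entry names and
sizes, the function name is illustrative):

    #include <archive.h>
    #include <archive_entry.h>
    #include <stdio.h>

    static int
    list_model_archive(const void *buf, size_t size)
    {
        struct archive_entry *entry;
        struct archive *a;

        a = archive_read_new();
        archive_read_support_filter_all(a);
        archive_read_support_format_all(a);

        if (archive_read_open_memory(a, buf, size) != ARCHIVE_OK) {
            archive_read_free(a);
            return -1;
        }

        while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
            printf("%s : %lld bytes\n", archive_entry_pathname(entry),
                   (long long)archive_entry_size(entry));
            archive_read_data_skip(a);
        }

        archive_read_free(a);
        return 0;
    }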

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  5 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 57 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 62 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 11 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 7 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index aa809cabea..60b1c6b375 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1083,7 +1083,10 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	else
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
 	if (ret != 0)
 		goto error;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ab5f8baa67..4c9a080c05 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -53,3 +53,60 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 
 	return ML_CNXK_MODEL_TYPE_TVM;
 }
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b6162fceec..b11b66f495 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -44,5 +44,7 @@ struct mvtvm_ml_model_data {
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 88c6d5a864..e2413b6b15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -8,8 +8,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -39,3 +43,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 305b4681ed..6607537599 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -12,8 +12,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a7352840a6..7f3b3abb2e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -33,3 +33,14 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return 0;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(params);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 467a9d39e5..4bb1772ef4 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -8,9 +8,12 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 23/34] ml/cnxk: fetch layer info and load TVM model
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and
update internal structures based on the layer information.
Set callback functions for layer load and unload, and
enabled model loading using the TVMDP library. Added support
to fetch the full metadata after model load.
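
From the application's point of view, a TVM model archive (mod.so,
mod.json and mod.params packed as a tar) is loaded through the standard
mldev call. A minimal sketch, assuming the archive has already been read
into memory and using the rte_mldev public API:

    #include <rte_mldev.h>

    static int
    load_tvm_model(int16_t dev_id, void *buf, size_t size, uint16_t *model_id)
    {
        struct rte_ml_model_params params = {
            .addr = buf,
            .size = size,
        };

        return rte_ml_model_load(dev_id, &params, model_id);
    }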

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 11 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  2 +
 drivers/ml/cnxk/cn10k_ml_ops.c   |  7 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 25 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  4 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 81 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 10 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 8 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index b765b4ada9..9a80adf0fc 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -714,3 +714,14 @@ cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "\n");
 }
+
+int
+cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM)
+		return mvtvm_ml_model_get_layer_id(model, layer_name, layer_id);
+
+	*layer_id = 0;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45f2ed5fcf..6744175cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -461,5 +461,7 @@ void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+int cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a471e98fbf..4191ccc840 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -576,7 +576,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
@@ -584,7 +584,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int ret;
 
 	PLT_SET_USED(size);
-	PLT_SET_USED(layer_name);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -598,6 +597,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c9a080c05..8536fd8927 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -110,3 +110,28 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	uint16_t i;
+
+	for (i = 0; i < model->mvtvm.metadata.model.nb_layers; i++) {
+		if (strcmp(model->layer[i].name, layer_name) == 0)
+			break;
+	}
+
+	if (i == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[i].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer type, name: %s type: %d", layer_name, model->layer[i].type);
+		return -EINVAL;
+	}
+
+	*layer_id = i;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b11b66f495..6cb2639876 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
@@ -46,5 +48,7 @@ struct mvtvm_ml_model_data {
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e2413b6b15..1fe0a04301 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -49,9 +49,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -99,5 +103,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		strncpy(model->layer[layer_id].name,
+			model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 7f3b3abb2e..d621dbc897 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -17,6 +17,16 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 	return ML_CNXK_MODEL_TYPE_UNKNOWN;
 }
 
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_name);
+	RTE_SET_USED(layer_id);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 4bb1772ef4..23fdfdc4cd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,4 +16,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
+
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 24/34] ml/cnxk: update internal info for TVM model
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating of internal I/O info structures for TVM models.
Computed static fields related to the model I/O.
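
The per-tensor sizes cached here follow directly from the tensor shape and
element types. A minimal sketch of that computation is below; the helper
name and shape type are illustrative, while rte_ml_io_type_size_get() is
the mldev_utils.h helper used in the patch:

    #include <stdint.h>
    #include <mldev_utils.h>

    static int64_t
    tensor_size_bytes(const int64_t *shape, uint32_t ndim, enum rte_ml_io_type type)
    {
        int elem_size = rte_ml_io_type_size_get(type);
        int64_t nb_elements = 1;
        uint32_t i;

        if (elem_size <= 0)
            return -1; /* unknown or unsupported element type */

        for (i = 0; i < ndim; i++)
            nb_elements *= shape[i];

        return nb_elements * elem_size;
    }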

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 111 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |   9 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   1 +
 6 files changed, 130 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 60b1c6b375..94bb48168c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1304,6 +1304,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, 0);
+	else
+		info = mvtvm_ml_model_io_info_get(model, 0);
 
 	if (info == NULL)
 		return -EINVAL;
@@ -1356,6 +1358,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
+	else
+		info = mvtvm_ml_model_io_info_get(model, model->nb_layers - 1);
 
 	if (info == NULL)
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8536fd8927..14f4b258d8 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "cnxk_ml_model.h"
@@ -135,3 +137,112 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 
 	return 0;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case kDLInt:
+		return RTE_ML_IO_TYPE_INT32;
+	case kDLUInt:
+		return RTE_ML_IO_TYPE_UINT32;
+	case kDLFloat:
+		return RTE_ML_IO_TYPE_FP32;
+	case kDLBfloat:
+		return RTE_ML_IO_TYPE_BFLOAT16;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		strncpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		strncpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_set(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
+
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(layer_id);
+
+	return &model->mvtvm.info;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6cb2639876..e86581bc6a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -50,5 +50,7 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1fe0a04301..e248310cb3 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -175,6 +175,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_set(model);
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index d621dbc897..80a9a90b4e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -27,6 +27,15 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 	return -EINVAL;
 }
 
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_id);
+
+	return NULL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 23fdfdc4cd..29f721072a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -18,5 +18,6 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 25/34] ml/cnxk: enable model unload in tvmdp library
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled model unload through the external tvmdp library. Updated
the layer unload callback to support multiple layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
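Note: this path is reached through the public model unload call. A minimal
usage sketch, assuming a device id and a model id returned earlier by
rte_ml_model_load():

#include <stdio.h>
#include <rte_mldev.h>

/* Unload a model that is in the loaded (stopped) state. */
static int
app_model_unload(int16_t dev_id, uint16_t model_id)
{
	int ret;

	/* For TVM models this ends up in mvtvm_ml_model_unload(), which
	 * releases the model in the tvmdp library and frees the memzone.
	 */
	ret = rte_ml_model_unload(dev_id, model_id);
	if (ret != 0)
		printf("Model unload failed, model_id = %u, error = %d\n", model_id, ret);

	return ret;
}
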
 drivers/ml/cnxk/cn10k_ml_ops.c   |  8 +++++---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 +++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  1 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 +++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4191ccc840..e7208391fd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -780,11 +780,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	int ret;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -797,6 +795,10 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 94bb48168c..03f4783b3f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1167,7 +1167,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1185,7 +1185,10 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e248310cb3..9fd9e58de6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -185,3 +185,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 80a9a90b4e..a17a76e41f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -63,3 +63,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 29f721072a..3776fb5369 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -15,6 +15,7 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 26/34] ml/cnxk: support start and stop for TVM models
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. TVM model start
invokes layer start for all Glow layers that are part of the
model, and TVM model stop invokes layer stop for the same layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
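Note: only Glow (MRVL) layers are started or stopped on the accelerator;
LLVM layers execute on the CPU and need no device start. A sketch of the
layer walk using a plain for loop, equivalent to the goto based loop in the
patch and assuming the driver internal headers (cnxk_ml_dev.h,
cnxk_ml_model.h):

static int
tvm_model_start_sketch(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
{
	struct cnxk_ml_layer *layer;
	uint16_t layer_id;
	int ret;

	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
		layer = &model->layer[layer_id];

		/* Skip LLVM layers, they are not offloaded to the device. */
		if (layer->type != ML_CNXK_LAYER_TYPE_MRVL)
			continue;

		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
		if (ret != 0)
			return ret;
	}

	return 0;
}
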
 drivers/ml/cnxk/cn10k_ml_ops.c   | 16 ++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 52 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 18 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 6 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7208391fd..2d308802cf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -827,7 +827,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -838,8 +838,6 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -852,6 +850,10 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -1015,14 +1017,12 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -1035,6 +1035,10 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03f4783b3f..66cda513db 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1216,7 +1216,12 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+
+	return 0;
 }
 
 int
@@ -1236,7 +1241,12 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9fd9e58de6..1d0b3544a7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -213,3 +213,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 770794fe7d..55459f9f7f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -19,5 +19,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a17a76e41f..b8c2e6a1fc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -72,3 +72,21 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3776fb5369..1eb663b1d1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,6 +16,8 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 27/34] ml/cnxk: update internal TVM model info structure
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update the internal model info structure
for TVM models.

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
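Note: the rte_ml_model_info block of a model is one contiguous buffer
holding the info structure followed by the input and output rte_ml_io_info
arrays. A sketch of how the three pointers relate, mirroring the PLT_PTR_ADD
arithmetic in the patch (max_io stands for the driver constant
ML_CNXK_MODEL_MAX_INPUT_OUTPUT; the function name is illustrative):

#include <rte_mldev.h>

static void
model_info_pointers(void *base, uint32_t max_io, struct rte_ml_model_info **info,
		    struct rte_ml_io_info **input, struct rte_ml_io_info **output)
{
	*info = base;
	/* Input array starts right after the info structure. */
	*input = (struct rte_ml_io_info *)((char *)base + sizeof(struct rte_ml_model_info));
	/* Output array starts after max_io input entries. */
	*output = *input + max_io;
}
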
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 14f4b258d8..569147aca7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -11,6 +11,7 @@
 
 #include <roc_api.h>
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -246,3 +247,67 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 
 	return &model->mvtvm.info;
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index e86581bc6a..a1247ffbde 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -52,5 +53,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1d0b3544a7..f13ba76207 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -178,6 +178,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_set(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 28/34] ml/cnxk: support device dump for TVM models
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to print TVM model layer info in the device dump output.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
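Note: the layer print added here is reached through the device dump
operation. A minimal usage sketch, assuming the rte_ml_dev_dump() call of
the mldev library and a configured device id:

#include <stdio.h>
#include <rte_mldev.h>

/* Dump device, model and per-layer information to stdout. */
static void
app_mldev_dump(int16_t dev_id)
{
	if (rte_ml_dev_dump(dev_id, stdout) != 0)
		printf("Device dump failed, dev_id = %d\n", dev_id);
}
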
 drivers/ml/cnxk/cnxk_ml_model.c  |  7 +++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  8 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 02f80410ec..ed6a1ed866 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -68,6 +68,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -84,6 +86,9 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
 	}
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 569147aca7..4c12f584d5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -311,3 +312,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
 				&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index a1247ffbde..900ba44fa0 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -54,5 +55,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index b8c2e6a1fc..260a051b08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -36,6 +36,14 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 	return NULL;
 }
 
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(layer);
+	RTE_SET_USED(fp);
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 1eb663b1d1..d6d0edbcf1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
@@ -22,5 +23,6 @@ int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 29/34] ml/cnxk: enable reporting model runtime as xstats
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
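Note: with this change the per-model runtime latency counters are reported
under the model xstats mode. A usage sketch for listing them from an
application, assuming the rte_ml_dev_xstats_names_get() signature of the
mldev library and a valid model id:

#include <stdio.h>
#include <rte_common.h>
#include <rte_mldev.h>

/* Print the xstats names exposed for a given model. */
static void
app_list_model_xstats(int16_t dev_id, uint16_t model_id)
{
	struct rte_ml_dev_xstats_map map[64];
	int n, i;

	n = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_MODEL, model_id, map,
					RTE_DIM(map));
	if (n < 0) {
		printf("xstats names get failed, error = %d\n", n);
		return;
	}

	for (i = 0; i < n; i++)
		printf("id = %u, name = %s\n", map[i].id, map[i].name);
}
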
 drivers/ml/cnxk/cn10k_ml_ops.c   |   9 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 131 +++++++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  96 +++++++++++++++++++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   8 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  23 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   6 ++
 10 files changed, 289 insertions(+), 18 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d308802cf..0c67ce7b40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -197,6 +197,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 	}
 }
 
+void
+cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->glow.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
 #define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3d18303ed3..045e2e6cd2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -331,6 +331,8 @@ int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
+void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				  enum cnxk_ml_xstats_type type);
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66cda513db..fd2c46ac1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -138,7 +138,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -169,6 +170,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -195,7 +215,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -204,6 +225,36 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		strcpy(suffix, "cycles");
+	else
+		strcpy(suffix, "ns");
+
+	/* Update xstat name based on layer name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_xstat_model_name_set(cnxk_mldev, model, stat_id, i, suffix);
+		else
+			mvtvm_ml_model_xstat_name_set(cnxk_mldev, model, stat_id, i, suffix);
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -247,13 +298,22 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	if (model == NULL)
 		return 0;
 
-	if (layer_id >= 0)
+	if (layer_id >= 0) {
 		layer = &model->layer[layer_id];
-	else
-		return 0;
+		goto layer_xstats;
+	} else {
+		layer = NULL;
+		goto model_xstats;
+	}
 
+layer_xstats:
 	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+	goto exit_xstats;
 
+model_xstats:
+	value = mvtvm_ml_model_xstat_get(cnxk_mldev, model, type);
+
+exit_xstats:
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq != 0) /* return in ns */
 		value = (value * 1000ULL) / sclk_freq;
@@ -836,8 +896,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -854,7 +915,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -868,9 +939,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -931,9 +1013,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -951,7 +1034,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -963,11 +1053,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b22a2b0d95..ab32676b3e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -70,6 +70,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876c..a2c9adfe4a 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 900ba44fa0..66c3af18e1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index f13ba76207..832837034b 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -10,10 +10,83 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->mvtvm.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	uint64_t count = 0;
+	uint64_t value = 0;
+	uint32_t qp_id;
+
+	switch (type) {
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -53,6 +126,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -68,7 +142,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -181,6 +259,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 55459f9f7f..22e0340146 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -11,8 +11,11 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -22,4 +25,9 @@ int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 260a051b08..19af1d2703 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -8,6 +8,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_xstats.h"
 
 enum cnxk_ml_model_type
 mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
@@ -44,6 +45,28 @@ mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	RTE_SET_USED(fp);
 }
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(stat_id);
+	RTE_SET_USED(entry);
+	RTE_SET_USED(suffix);
+}
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(type);
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index d6d0edbcf1..3fd1f04c35 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
@@ -24,5 +26,9 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 30/34] ml/cnxk: implement I/O alloc and free callbacks
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for I/O buffer allocation and free
for Glow layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 87 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 +
 3 files changed, 92 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0c67ce7b40..7802425c87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1410,3 +1410,90 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t output_size;
+	uint64_t input_size;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 045e2e6cd2..9c41c1c0b0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -329,6 +329,9 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 832837034b..77c2b5bcdc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -232,6 +232,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 31/34] ml/cnxk: add generic ML malloc and free callback
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks.
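
These callbacks back named runtime allocations with DPDK memzones. A minimal
sketch of the allocate/release pairing follows; the memzone name
"ml_example_mz" and the 4 KB size are hypothetical values chosen only for
illustration.

static int
example_scratch_buffer(void)
{
        void *addr = NULL;
        int ret;

        /* Reserve a cache-line aligned, named scratch buffer. */
        ret = cn10k_ml_malloc("ml_example_mz", 4096, RTE_CACHE_LINE_SIZE, &addr);
        if (ret != 0)
                return ret;

        /* ... use addr as runtime working memory ... */

        /* Look up the memzone by name and free it. */
        return cn10k_ml_free("ml_example_mz");
}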

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7802425c87..01b0a44caa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1497,3 +1497,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 9c41c1c0b0..eb3e1c139c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -333,6 +333,9 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 77c2b5bcdc..b627355917 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -234,6 +234,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 32/34] ml/cnxk: support quantize and dequantize callback
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
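
These callbacks convert between the dequantized DLTensor views used by the
TVM runtime and the driver's quantized buffers. A minimal sketch of the round
trip around a hardware layer follows; it assumes the declarations from
mvtvm_ml_ops.h and dlpack, and the layer name "layer_0" is a hypothetical
placeholder.

static int
example_quantize_roundtrip(void *device, uint16_t model_id,
                           const DLTensor **inputs, const DLTensor **outputs,
                           void *qbuffer)
{
        int ret;

        /* Quantize the dequantized input tensors into the quantized buffer. */
        ret = mvtvm_ml_io_quantize(device, model_id, "layer_0", inputs, qbuffer);
        if (ret != 0)
                return ret;

        /* ... the MRVL layer would consume qbuffer on the accelerator here ... */

        /* Dequantize the layer outputs back into the DLTensor storage. */
        return mvtvm_ml_io_dequantize(device, model_id, "layer_0", qbuffer, outputs);
}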

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_ops.c | 129 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |   4 +
 2 files changed, 133 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index b627355917..776675843a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -2,11 +2,15 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <dlpack/dlpack.h>
+
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
@@ -236,6 +240,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -366,3 +372,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 22e0340146..4cabe30a82 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -24,6 +24,10 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 33/34] ml/cnxk: enable fast-path ops for TVM models
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-17 16:59   ` [PATCH v4 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  2023-10-18  1:56   ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. Models use
TVMDP library function calls to execute inference operations
for Hybrid and LLVM model sub-types.

For TVM MRVL model subtypes that have a single MRVL layer,
the inference requests are directly enqueued to hardware
by the driver.
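
A condensed view of the fast-path selection added to the model load path in
mvtvm_ml_ops.c by this patch is shown below; it is a fragment rather than a
standalone function, and it omits the matching result_update, set_error_code
and set_poll_addr assignments made by the full hunk.

        /* Single MRVL layer: enqueue directly to hardware. */
        if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
                model->enqueue_single = cn10k_ml_enqueue_single;
        /* Hybrid / LLVM sub-types: run through the TVMDP runtime. */
        else
                model->enqueue_single = mvtvm_ml_enqueue_single;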

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/rel_notes/release_23_11.rst |   4 +
 drivers/ml/cnxk/cn10k_ml_ops.c         |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h           |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h          |   5 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  20 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c         | 124 +++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |  43 +++++++++
 9 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 8701350b2e..ba4d162287 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -28,6 +28,10 @@ New Features
 
      Added support in mldev library for models with multiple inputs and outputs.
 
+   * **Added support for Marvell TVM models in ML CNXK driver.**
+
+     Added support for models compiled using TVM framework in ML CNXK driver.
+
 
 .. This section should contain new features added in this release.
    Sample format:
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index fd2c46ac1f..608e9fc4ca 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c12f584d5..1dfd0d176a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -198,6 +198,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -231,6 +241,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 66c3af18e1..7ffce38094 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -69,6 +69,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 776675843a..1e74b82a0a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -19,6 +19,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 void
 mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			      uint16_t stat_id, uint16_t entry, char *suffix)
@@ -242,6 +248,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -285,6 +292,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -495,3 +515,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 4cabe30a82..cb4b219743 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -16,6 +16,44 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* Start ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -29,6 +67,11 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v4 34/34] ml/cnxk: enable creation of mvtvm virtual device
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-10-17 16:59   ` Srikanth Yalavarthi
  2023-10-18  1:56   ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 16:59 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create an mvtvm virtual device on systems
without a PCI-based ML HW accelerator.
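
A hedged usage sketch combining the two vdev devargs added in this patch; the
dpdk-test-mldev application and the test name are illustrative only, and any
DPDK ML application can pass the same EAL arguments:

   dpdk-test-mldev --vdev ml_mvtvm,max_qps=4,cache_model_data=0 \
        -- --test=device_ops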

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       |  49 +++++++-
 drivers/ml/cnxk/cn10k_ml_dev.c   |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c    |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  82 +++++++++----
 drivers/ml/cnxk/meson.build      |   2 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   | 196 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  31 +++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   2 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   2 +
 13 files changed, 433 insertions(+), 24 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index a629ceb796..55138c4ced 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -128,6 +128,22 @@ Bind the ML PF device to the vfio_pci driver:
    usertools/dpdk-devbind.py -u 0000:00:10.0
    usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
 
+VDEV support
+------------
+
+On platforms which don't support ML hardware acceleration through a PCI device, the
+Marvell ML CNXK PMD can execute inference operations on a vdev, using ML models
+compiled with the Apache TVM framework.
+
+VDEV can be enabled by passing the following EAL arguments:
+
+.. code-block:: console
+
+   --vdev ml_mvtvm
+
+VDEV can also be used on platforms with an ML HW accelerator. However, use of the
+VDEV and the PCI HW accelerator is mutually exclusive.
+
 
 Runtime Config Options
 ----------------------
@@ -138,6 +154,8 @@ Runtime Config Options
   The parameter ``fw_path`` can be used by the user
   to load ML firmware from a custom path.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
@@ -153,6 +171,8 @@ Runtime Config Options
   When enabled, firmware would mask the DPE non-fatal hardware errors as warnings.
   The parameter ``enable_dpe_warnings`` is used fo this configuration.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,enable_dpe_warnings=0
@@ -169,11 +189,19 @@ Runtime Config Options
   Caching of model data improves the inferencing throughput / latency for the model.
   The parameter ``cache_model_data`` is used to enable data caching.
 
+  This option is supported on PCI HW accelerator and vdev.
+
   For example::
 
      -a 0000:00:10.0,cache_model_data=0
 
-  With the above configuration, model data caching is disabled.
+  With the above configuration, model data caching is disabled on HW accelerator.
+
+  For example::
+
+     --vdev ml_mvtvm,cache_model_data=0
+
+  With the above configuration, model data caching is disabled on vdev.
 
 
 **OCM allocation mode** (default ``lowest``)
@@ -189,6 +217,8 @@ Runtime Config Options
   ``largest``
     Allocate OCM for the model from the slot with largest amount of free space.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_alloc_mode=lowest
@@ -206,6 +236,8 @@ Runtime Config Options
   Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
   Default page size is 16 KB.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_page_size=8192
@@ -230,6 +262,8 @@ Runtime Config Options
     Enabling spinlock version would disable restrictions on the number of queue-pairs
     that can be supported by the driver.
 
+   This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,hw_queue_lock=1
@@ -238,6 +272,19 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
+**Maximum queue pairs** (default ``1``)
+
+  VDEV supports additional EAL arguments to configure the maximum number of
+  queue-pairs on the ML device through the option ``max_qps``.
+
+  This option is supported only on vdev.
+
+  For example::
+
+     --vdev ml_mvtvm,max_qps=4
+
+  With the above configuration, 4 queue-pairs are created on the vdev.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 91813e9d0a..caa13ba08c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -309,6 +309,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -355,6 +361,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef..cee405f3f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417..dc4512223c 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64be..491c4c4aea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 608e9fc4ca..517aa71931 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,7 +117,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -480,7 +481,12 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+
+	return 0;
 }
 
 static int
@@ -518,9 +524,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -618,10 +626,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
@@ -629,12 +639,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -695,8 +710,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close MVTVM ML Device");
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -748,10 +765,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -770,10 +789,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -800,7 +821,12 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+
+	return 0;
 }
 
 static int
@@ -813,6 +839,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1145,6 +1174,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1384,6 +1418,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index b3a62a7871..e4e3bc200d 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -70,11 +70,13 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 driver_sdk_headers += files(
+        'mvtvm_ml_dev.h',
         'mvtvm_ml_ops.h',
         'mvtvm_ml_model.h',
 )
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 0000000000..c93b5155b9
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize MVTVM vdev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 0000000000..6922c19337
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1e74b82a0a..bbefa8a356 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -97,6 +97,22 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return value;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -127,6 +143,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -237,6 +262,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index cb4b219743..0232c5ead5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -55,8 +55,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 19af1d2703..126a954c91 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -67,6 +67,15 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return 0;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(dev_info);
+
+	return -ENOTSUP;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -84,6 +93,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3fd1f04c35..4220a963f2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -14,8 +14,10 @@ struct cnxk_ml_model;
 struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v3 34/35] ml/cnxk: update dependency info in driver docs
  2023-09-28  4:12     ` Jerin Jacob
  2023-10-01  0:32       ` [EXT] " Srikanth Yalavarthi
@ 2023-10-17 17:03       ` Srikanth Yalavarthi
  1 sibling, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-17 17:03 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Prince Takkar, techboard, Srikanth Yalavarthi

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 28 September 2023 09:43
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>; techboard@dpdk.org; Srikanth
> Yalavarthi <syalavarthi@marvell.com>
> Subject: [EXT] Re: [PATCH v3 34/35] ml/cnxk: update dependency info in
> driver docs
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Thu, Sep 28, 2023 at 6:41 AM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > Added information related to external library dependencies for ml/cnxk
> > driver.
> >
> > Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> > ---
> >  doc/guides/mldevs/cnxk.rst | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> >
> > diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
> > index 197e1ed06f..afadc834e0 100644
> > --- a/doc/guides/mldevs/cnxk.rst
> > +++ b/doc/guides/mldevs/cnxk.rst
> > @@ -47,6 +47,34 @@ or cross-compiled on an x86 platform.
> >  Refer to :doc:`../platform/cnxk` for instructions to build your DPDK
> application.
> >
> >
> > +Compilation Prerequisites
> > +-------------------------
> > +
> > +This driver requires external libraries to optionally enable support
> > +for models compiled using Apache TVM framework. The following
> > +dependencies are not part of DPDK and must be installed separately:
> > +
> > +- **Jansson**
> > +
> > +  This library enables support to parse and read JSON files.
> > +
> > +- **libarchive**
> > +
> > +  Apache TVM framework generates compiled models as tar archives.
> > + This  library enables support to decompress and read archive files
> > + in tar,  xz and other formats.
> > +
> > +- **TVM**
> > +
> > +  Apache TVM provides a runtime library (libtvm_runtime) used to
> > + execute  models on CPU cores or hardware accelerators.
> > +
> > +- **TVMDP**
> > +
> > +  Marvell's TVM dataplane library which works as an interface between
> > + TVM  runtime and DPDK drivers. TVMDP library provides a simplified C
> > + interface  for TVM's runtime based on C++.
> 
> It seems that it depends on a proprietary library. Please fix the following for
> merging this series.
> 
> According to what was discussed in the Technical Board:
> https://urldefense.proofpoint.com/v2/url?u=http-
> 3A__mails.dpdk.org_archives_dev_2019-
> 2DJune_135847.html&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=SNPqUk
> Gl0n_Ms1iJa_6wD6LBwX8efL_NOyXvAX-
> iCMI&m=diNZcWaywZP478LbvQPYaK6-
> w1mq1giW2phwy5s7roDoNUFO6TSUdyOFHVZnutCI&s=8WbivGMgGdsgKaKT
> I1QKfTTnq56JJqnyxGUczzyYm3I&e=
> the dependency must be "freely available" to build it either source or binary
> form. (Prefer in source form)
> 
TVMDP library is now hosted on GitHub. Updated the documentation with required details and build steps.

> Also, Squash all doc updates to relevant patches.
Done. Squashed to corresponding patches.

^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [PATCH v4 00/34] Implementation of revised ml/cnxk driver
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2023-10-17 16:59   ` [PATCH v4 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
@ 2023-10-18  1:56   ` Jerin Jacob
  2023-10-18  6:55     ` [EXT] " Srikanth Yalavarthi
  34 siblings, 1 reply; 340+ messages in thread
From: Jerin Jacob @ 2023-10-18  1:56 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

On Tue, Oct 17, 2023 at 10:30 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> This patch series is an implementation of revised ml/cnxk driver
> to support models compiled with TVM compiler framework. TVM models
> use a hybrid mode for execution, with regions of the model executing
> on the ML accelerator and the rest executing on CPU cores.
>
> This series of commits reorganizes the ml/cnxk driver and adds support
> to execute multiple regions within a TVM model.
>

Found the following build error (possibly due to gcc 13):

ml/cnxk: enable OCM check for multilayer TVM model

[2389/2660] Compiling C object
drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o
FAILED: drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o
ccache gcc -Idrivers/libtmp_rte_ml_cnxk.a.p -Idrivers -I../drivers
-Idrivers/ml/cnxk -I../drivers/ml/cnxk -Ilib/mldev -I../lib/mldev -I.
-I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include
-Ilib/eal/linux/include -I../lib/eal/l
inux/include -Ilib/eal/x86/include -I../lib/eal/x86/include
-Ilib/eal/common -I../lib/eal/common -Ilib/eal -I../lib/eal
-Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics
-I../lib/metrics -Ilib/telemetry -I../lib/telemetry -I
lib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/mbuf
-I../lib/mbuf -Idrivers/common/cnxk -I../drivers/common/cnxk
-Idrivers/bus/pci -I../drivers/bus/pci -Ilib/net -I../lib/net
-Ilib/ethdev -I../lib/ethdev -Ilib/meter -I../lib/me
ter -Ilib/pci -I../lib/pci -I../drivers/bus/pci/linux -Ilib/security
-I../lib/security -Ilib/cryptodev -I../lib/cryptodev -Ilib/rcu
-I../lib/rcu -Ilib/hash -I../lib/hash -fdiagnostics-color=always
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch
-Wextra -Werror -std=c11 -O2 -g -include rte_config.h -Wcast-qual
-Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security
-Wmissing-declarations -Wmissing-prototypes -Wnested-externs
-Wold-style-definition -Wpointer-arith -Wsign-compare
 -Wstrict-prototypes -Wundef -Wwrite-strings
-Wno-address-of-packed-member -Wno-packed-not-aligned
-Wno-missing-field-initializers -Wno-zero-length-bounds -D_GNU_SOURCE
-fPIC -march=native -mrtm -DALLOW_EXPERIMENTAL_API
-DALLOW_INTERNAL_API
 -Wno-format-truncation -DCNXK_ML_DEV_DEBUG
-DRTE_LOG_DEFAULT_LOGTYPE=pmd.ml.cnxk -MD -MQ
drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o -MF
drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o.d -o
drivers/libtmp_rte_ml_cnxk.a.p/
ml_cnxk_cnxk_ml_ops.c.o -c ../drivers/ml/cnxk/cnxk_ml_ops.c
../drivers/ml/cnxk/cnxk_ml_ops.c: In function ‘cnxk_ml_model_load’:
../drivers/ml/cnxk/cnxk_ml_ops.c:527:18: error: ‘struct cnxk_ml_model’
has no member named ‘type’
  527 |         if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
      |                  ^~
../drivers/ml/cnxk/cnxk_ml_ops.c:527:28: error:
‘ML_CNXK_MODEL_TYPE_GLOW’ undeclared (first use in this function); did
you mean ‘ML_CNXK_MODEL_STATE_LOADED’?
  527 |         if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
      |                            ^~~~~~~~~~~~~~~~~~~~~~~
      |                            ML_CNXK_MODEL_STATE_LOADED
../drivers/ml/cnxk/cnxk_ml_ops.c:527:28: note: each undeclared
identifier is reported only once for each function it appears in
../drivers/ml/cnxk/cnxk_ml_ops.c:549:26: error: ‘struct cnxk_ml_model’
has no member named ‘type’
  549 |                 if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
      |                          ^~
../drivers/ml/cnxk/cnxk_ml_ops.c:568:26: error: ‘struct cnxk_ml_model’
has no member named ‘type’
  568 |                 if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
      |                          ^~
[2390/2660] Generating drivers/rte_bus_dpaa.sym_chk with a custom
command (wrapped by meson to capture output)
[2391/2660] Generating drivers/rte_bus_fslmc.sym_chk with a custom
command (wrapped by meson to capture output)
[2392/2660] Generating lib/pipeline.sym_chk with a custom command
(wrapped by meson to capture output)
[2393/2660] Generating lib/ethdev.sym_chk with a custom command
(wrapped by meson to capture output)
[2394/2660] Generating lib/eal.sym_chk with a custom command (wrapped
by meson to capture output)
[2395/2660] Generating drivers/rte_common_sfc_efx.sym_chk with a
custom command (wrapped by meson to capture output)
[2396/2660] Generating drivers/rte_common_cnxk.sym_chk with a custom
command (wrapped by meson to capture output)
ninja: build stopped: subcommand failed.

^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 00/34] Implementation of revised ml/cnxk driver
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (36 preceding siblings ...)
  2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-10-18  6:47 ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                     ` (33 more replies)
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (3 subsequent siblings)
  41 siblings, 34 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series is an implementation of revised ml/cnxk driver
to support models compiled with TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions within a TVM model.

v5:
  - Fix build failures for individual patches in the series
  - Finished build testing with devtools/test-meson-builds.sh script

v4:
  - Squashed release notes
  - Updated external build dependency info in documentation

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (2):
  ml/cnxk: enable OCM check for multilayer TVM model
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (30):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 doc/guides/mldevs/cnxk.rst             |  111 +-
 doc/guides/rel_notes/release_23_11.rst |    4 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h         |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c       |  401 ++--
 drivers/ml/cnxk/cn10k_ml_model.h       |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2828 ++++++++----------------
 drivers/ml/cnxk/cn10k_ml_ops.h         |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c          |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h          |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c           |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h           |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c        |   94 +
 drivers/ml/cnxk/cnxk_ml_model.h        |  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          | 1690 ++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h          |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c        |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h        |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h       |  152 ++
 drivers/ml/cnxk/meson.build            |   79 +
 drivers/ml/cnxk/mvtvm_ml_dev.c         |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h         |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  392 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c         |  652 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c       |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h       |   36 +
 30 files changed, 6173 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 01/34] ml/cnxk: drop support for register polling
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for the device argument "poll_mem" in the cnxk
ML driver. Support for using registers for polling is removed;
DDR addresses are now used for polling.
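
A minimal C sketch of the DDR-based completion polling that remains, assuming a simplified request layout (the driver itself uses plt_write64()/plt_read64() on req->compl_W1, as in the diff below):

    /* Sketch only: names and layout are illustrative, not the driver's
     * exact definitions. The poll word lives in DDR inside the request. */
    #include <stdbool.h>
    #include <stdint.h>

    #define POLL_JOB_START  0ULL
    #define POLL_JOB_FINISH 1ULL

    struct req_sketch {
        volatile uint64_t status; /* DDR word polled for completion */
        uint64_t compl_W1;        /* address of the poll word, shared with firmware */
    };

    static inline void set_poll_addr(struct req_sketch *req)
    {
        req->compl_W1 = (uint64_t)(uintptr_t)&req->status;
    }

    static inline void set_poll_ptr(struct req_sketch *req)
    {
        *(volatile uint64_t *)(uintptr_t)req->compl_W1 = POLL_JOB_START;
    }

    static inline bool poll_done(struct req_sketch *req)
    {
        return *(volatile uint64_t *)(uintptr_t)req->compl_W1 == POLL_JOB_FINISH;
    }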

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001..4aaeecff03 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d3..11531afd8c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6..005b093e45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 02/34] ml/cnxk: add generic cnxk device structure
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This is the
top-level device structure for the driver, encapsulating the
target / platform specific device structure.
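
A rough sketch of the resulting layering, inferred from the diff below; the member lists are abbreviated placeholders, not the full definitions:

    struct rte_ml_dev;                       /* DPDK mldev device (opaque here) */

    struct cn10k_ml_dev {
        /* platform specific state: ROC handle, firmware, OCM, ... */
        void *platform_state;
    };

    struct cnxk_ml_dev {
        struct rte_ml_dev *mldev;            /* back-pointer to the rte_ml_dev */
        struct cn10k_ml_dev cn10k_mldev;     /* encapsulated platform device */
        int state;                           /* generic ML_CNXK_DEV_STATE_* */
    };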

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 316 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  15 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  60 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 495 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 563 insertions(+), 449 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..3bc61443d8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -10,13 +10,14 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -58,9 +59,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -90,7 +88,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -127,7 +125,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -139,7 +137,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -151,7 +149,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -174,7 +172,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -186,7 +184,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -197,49 +195,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -248,47 +250,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -300,7 +302,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -308,7 +311,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -324,18 +327,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -351,7 +356,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -368,7 +373,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -383,8 +388,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -430,45 +435,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -480,11 +485,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -498,14 +503,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -515,7 +520,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -524,24 +529,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -549,9 +554,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -559,9 +564,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -570,39 +575,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -613,53 +619,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -671,11 +681,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -691,49 +701,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	void *fw_buffer = NULL;
@@ -741,8 +753,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -773,8 +786,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -787,22 +800,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
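
The firmware load and unload paths above now take the generic cnxk_ml_dev handle and derive the CN10K-specific device from it before touching hardware. A minimal sketch of the nesting these hunks assume (only the member relevant here is shown; the helper name cn10k_dev_get is illustrative, not part of the patch):

    struct cnxk_ml_dev {
    	/* ... generic state and counters ... */
    	struct cn10k_ml_dev cn10k_mldev;	/* CN10K-specific device */
    };

    static inline struct cn10k_ml_dev *
    cn10k_dev_get(struct cnxk_ml_dev *cnxk_mldev)
    {
    	return &cnxk_mldev->cn10k_mldev;
    }
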
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03..f9da1548c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
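
The configuration state and model counters dropped from cn10k_ml_dev here reappear at the generic cnxk layer; the ops hunks below reference them as cnxk_mldev->state and cnxk_mldev->nb_models_*. A sketch of the state values those hunks rely on (names taken from the replacement code; the actual cnxk_ml_dev.h layout may differ):

    enum cnxk_ml_dev_state {
    	ML_CNXK_DEV_STATE_PROBED = 0,
    	ML_CNXK_DEV_STATE_CONFIGURED,
    	ML_CNXK_DEV_STATE_STARTED,
    	ML_CNXK_DEV_STATE_CLOSED
    };
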
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8e..cc46ca2efd 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +462,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +471,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +495,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +507,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
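
For non-relocatable models the scratch requirement computed above is rounded up to the full tile, so the OCM allocator cannot place another model on the same tile. Illustrative arithmetic only (the numbers are examples, not the real tile geometry):

    uint16_t scratch_pages = 10;	/* model's own scratch requirement */
    uint16_t num_pages = 8192;		/* pages available on one tile, example */

    scratch_pages = PLT_MAX(PLT_U64_CAST(scratch_pages), PLT_U64_CAST(num_pages));
    /* scratch_pages == 8192: the remaining tile space stays unavailable to others */
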
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891..3128b28db7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
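
Storing the generic device pointer in the model needs only the forward declaration added at the top of this header, which avoids an include dependency on the cnxk device header. The pattern used above, condensed:

    struct cnxk_ml_dev;			/* forward declaration is enough */

    struct cn10k_ml_model {
    	struct cnxk_ml_dev *mldev;	/* generic device back-reference */
    	/* ... */
    };
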
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6fb0bb620e..8094a0fab1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,12 @@
 
 #include <rte_mldev_pmd.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +218,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +238,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +257,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +274,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +336,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +349,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +396,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +410,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +460,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,9 +501,8 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
-			wb_pages +=
-				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			wb_pages += rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
 		fprintf(fp,
 			"tile = %2u, scratch_pages = %4u,"
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c..def6d4c756 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +86,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +200,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +251,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +327,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +342,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +352,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +374,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +385,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +394,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +434,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
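
The xstats table initialized above is one flat array: device-level entries first, then a fixed-size block per model. cn10k_ml_xstats_model_name_update() below depends on the same layout when it recomputes a model's first stat id. Illustrative index arithmetic (the helper name is made up for the example):

    /* id layout:
     *   [0 .. D-1]                       device stats
     *   [D + m*M .. D + m*M + M - 1]     stats for model m
     * where D = RTE_DIM(device_stats) and M = RTE_DIM(model_stats)
     */
    static inline uint16_t
    model_xstat_id(uint16_t model_id, uint16_t stat)
    {
    	return RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats) + stat;
    }
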
@@ -503,28 +504,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +541,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +552,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +656,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +676,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +747,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +774,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +790,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +864,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +893,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +908,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +922,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1027,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1058,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
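
Taken together with dev_start, dev_stop and dev_close below, the configure path above enforces a simple state machine on the generic device: firmware is loaded only on the first configure from the probed state, reconfiguration is allowed only while configured, and a closed device cannot be brought back. Summarised as a comment (illustrative, no new behaviour):

    /*
     *  PROBED      --configure--> CONFIGURED   (loads firmware)
     *  CONFIGURED  --configure--> CONFIGURED   (reconfigure)
     *  CONFIGURED  --start------> STARTED
     *  STARTED     --stop-------> CONFIGURED
     *  CONFIGURED  --close------> CLOSED       (no further reconfigure)
     */
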
 
@@ -1077,8 +1091,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1101,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1141,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1164,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1184,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1279,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1305,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1327,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1369,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1396,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1445,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1460,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1480,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1506,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1528,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1550,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1552,8 +1587,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1609,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1626,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1659,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1716,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1731,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1747,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1756,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1772,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1784,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1853,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
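
Model start and stop above are slow-path jobs: the descriptor is handed to firmware through the ML scratch registers and the driver polls for completion within ML_CNXK_CMD_TIMEOUT, resetting the scratch queue if the job never dequeues. Inference jobs use the JCMDQ instead, through the function pointer selected at configure time. The two submission paths, side by side (sketch only, both calls appear in the hunks):

    /* slow path: model start/stop, firmware load, selftest */
    roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);

    /* fast path: inference, pointer set in cn10k_ml_dev_configure() */
    cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
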
 
@@ -1843,7 +1881,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1905,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1915,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1926,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1938,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1981,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2251,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2299,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2325,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2336,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2352,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2384,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2394,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2408,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2467,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2506,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 0000000000..2a5c17c973
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 0000000000..51315de622
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 94fa4283b1..03a2d4ecf2 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ driver_sdk_headers = files(
         'cn10k_ml_ops.h',
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
+        'cnxk_ml_dev.h',
 )
 
 sources = files(
@@ -19,6 +20,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 03/34] ml/cnxk: add generic model and layer structures
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures enable support for models with multiple layers.
A model is a collection of multiple independent layers with
flow dependencies between the layers.
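
A minimal sketch of how such a layered model could be represented is
shown below. The names used here (cnxk_ml_model_sketch,
cnxk_ml_layer_sketch, SKETCH_MAX_LAYERS) are illustrative assumptions
only and do not reflect the actual definitions added by this patch in
drivers/ml/cnxk/cnxk_ml_model.h.

/* Hypothetical, simplified sketch of a multi-layer model layout; the
 * names are assumptions for illustration, not the driver's definitions.
 */
#include <stdint.h>

#define SKETCH_MAX_LAYERS 32

struct cnxk_ml_model_sketch;

struct cnxk_ml_layer_sketch {
	/* Index of this layer within the model */
	uint16_t index;

	/* Layers that must complete before this one (flow dependency) */
	uint16_t nb_deps;
	uint16_t deps[SKETCH_MAX_LAYERS];

	/* Back-reference to the owning model */
	struct cnxk_ml_model_sketch *model;
};

struct cnxk_ml_model_sketch {
	/* Unique ID assigned when the model is loaded */
	uint16_t model_id;

	/* Number of independent layers in the model */
	uint16_t nb_layers;

	/* Per-layer descriptors, executed honouring flow dependencies */
	struct cnxk_ml_layer_sketch layer[SKETCH_MAX_LAYERS];
};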

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 245 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  50 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 488 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   3 +
 10 files changed, 653 insertions(+), 470 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index cc46ca2efd..d747bba151 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -311,19 +311,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -361,102 +359,136 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -514,23 +546,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -542,7 +574,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -550,56 +583,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db7..206a369ca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 8094a0fab1..d71c36eae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -6,10 +6,10 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -333,12 +333,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -353,6 +355,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -382,8 +385,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -393,12 +396,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -409,16 +414,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -432,11 +440,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd65..720f8caf76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index def6d4c756..e91cc4e859 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -202,7 +202,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -215,77 +215,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -295,29 +298,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -327,14 +332,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -345,7 +350,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -385,7 +390,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -445,7 +450,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -472,7 +477,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -521,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -543,7 +548,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -576,9 +581,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -588,9 +593,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -600,9 +606,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -611,7 +618,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -692,28 +699,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -749,7 +756,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -758,7 +765,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -803,7 +810,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -854,7 +861,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -875,7 +882,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -895,7 +902,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1001,11 +1008,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1093,7 +1100,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1111,11 +1118,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1294,7 +1301,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1386,7 +1393,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1447,7 +1454,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1588,7 +1595,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1643,9 +1650,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1659,62 +1666,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* Since the number of layers that the driver handles for glow models is
+	 * always 1, consider the entire model as a single-layer model. This
+	 * ignores the num_layers from metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1730,7 +1760,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1741,7 +1771,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1758,7 +1788,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1783,7 +1813,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1791,63 +1821,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1880,10 +1913,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1891,12 +1924,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1917,7 +1950,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1937,7 +1970,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1948,31 +1981,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2008,7 +2041,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2021,7 +2054,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2040,7 +2073,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2050,19 +2083,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2071,7 +2108,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2091,57 +2128,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2151,7 +2189,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2171,58 +2209,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2250,10 +2290,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2263,9 +2303,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2469,7 +2509,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2477,7 +2517,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 0000000000..29ec7ec511
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 32
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* name */
+	char name[RTE_ML_STR_MAX];
+
+	/* dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape of input */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized input size */
+	uint32_t sz_d;
+
+	/* Quantized input size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 0000000000..3d735ced3e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 0000000000..a2994dbb71
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 03a2d4ecf2..72e03b15b5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,8 @@ driver_sdk_headers = files(
         'cn10k_ml_model.h',
         'cn10k_ml_ocm.h',
         'cnxk_ml_dev.h',
+        'cnxk_ml_io.h',
+        'cnxk_ml_model.h',
 )
 
 sources = files(
@@ -21,6 +23,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 04/34] ml/cnxk: add generic cnxk request structure
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved common fields
from the cn10k structures to the cnxk structure. Moved
job-related structures and enumerations to the ops headers.
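
For readers following the refactor, the overall shape of the new wrapper
can be sketched as below. Only the cn10k_req union member and the status
pointer are visible in the diff; the timeout and op fields are illustrative
assumptions and may differ from the committed definition.

/* Minimal sketch of the generic request wrapper, assuming this layout;
 * not the committed definition.
 */
#include <rte_mldev.h>

#include <roc_api.h>

#include "cn10k_ml_ops.h"	/* provides struct cn10k_ml_req (assumed) */

struct cnxk_ml_req {
	/* Target-specific request data; only cn10k exists today */
	union {
		struct cn10k_ml_req cn10k_req;
	};

	/* Address of the status word polled by the common code;
	 * for cn10k this points at cn10k_req.status.
	 */
	volatile uint64_t *status;

	/* Illustrative assumptions, not taken from the diff */
	uint64_t timeout;	/* completion timeout in TSC cycles */
	struct rte_ml_op *op;	/* op associated with this request */
} __rte_cache_aligned;

With a layout along these lines, the poll helpers in cn10k_ml_ops.c read and
write through req->status, while job descriptor setup goes through
req->cn10k_req.jd, as the hunks below show.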

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  72 +++----
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 331 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   2 +
 9 files changed, 558 insertions(+), 492 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 3bc61443d8..fc6f78d414 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -14,9 +14,8 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -400,20 +399,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -458,29 +460,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -654,29 +657,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -766,11 +770,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -782,8 +786,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -791,7 +795,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		free(fw_buffer);
 	} else if (roc_env_is_asim()) {
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a..1852d4f6c9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d747bba151..5d37e9bf8a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -549,7 +550,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -558,7 +558,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -575,7 +574,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca7..74ada1531a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e91cc4e859..caee09829b 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,9 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -78,31 +77,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -122,14 +121,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -140,18 +139,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -159,7 +158,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -173,7 +172,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -185,8 +184,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -333,7 +333,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -341,79 +341,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -861,7 +870,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -904,7 +913,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1101,7 +1110,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1136,7 +1145,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1213,7 +1222,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1239,7 +1248,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1252,7 +1261,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1269,7 +1278,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1485,20 +1494,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1511,17 +1522,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1538,14 +1551,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1554,23 +1567,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1581,7 +1595,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1654,7 +1668,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1726,7 +1740,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1790,7 +1804,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1815,10 +1829,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1878,8 +1892,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1887,19 +1901,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1952,7 +1968,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1972,10 +1988,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2015,19 +2031,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2287,18 +2305,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2329,7 +2352,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2338,7 +2361,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2346,15 +2370,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2365,11 +2389,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2395,12 +2420,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2424,11 +2450,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2450,13 +2477,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2507,10 +2536,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2522,17 +2552,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2555,7 +2586,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45..fd5992e192 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range end */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 0000000000..f1872dcf7c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 0000000000..b953fb0f5f
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 72e03b15b5..73db458fcd 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -15,6 +15,7 @@ driver_sdk_headers = files(
         'cnxk_ml_dev.h',
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
+        'cnxk_ml_ops.h',
 )
 
 sources = files(
@@ -24,6 +25,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 05/34] ml/cnxk: add generic cnxk xstats structures
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic xstats structures and renamed the cn10k
xstats enumerations to use the cnxk prefix.
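
As an illustration (not part of this patch), the renamed entries can be
resolved to their per-type get functions through the generic fn_id, as
sketched below. The helper name resolve_xstat_fn() is hypothetical; the
two getters are the existing static functions in cn10k_ml_ops.c, so the
sketch assumes it is placed after their definitions:

static cnxk_ml_xstats_fn
resolve_xstat_fn(const struct cnxk_ml_xstats_entry *xs)
{
	/* fn_id now carries the generic CNXK_ML_XSTATS_FN_* values */
	switch (xs->fn_id) {
	case CNXK_ML_XSTATS_FN_DEVICE:
		return cn10k_ml_dev_xstat_get;
	case CNXK_ML_XSTATS_FN_MODEL:
		return cn10k_ml_model_xstat_get;
	default:
		return NULL;
	}
}

The resolved function would then be called as fn(dev, xs->obj_idx,
xs->type) to fetch the counter for that entry.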

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 drivers/ml/cnxk/meson.build      |   1 +
 5 files changed, 210 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index caee09829b..42a4389bbe 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -425,26 +426,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -459,10 +440,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -470,17 +451,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -489,24 +470,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -545,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -554,17 +535,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -590,9 +571,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -603,9 +584,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -616,16 +598,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -671,8 +654,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -708,26 +691,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -762,8 +745,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1342,10 +1325,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1357,10 +1340,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1384,11 +1367,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1423,10 +1406,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1664,7 +1647,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1738,24 +1721,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2308,7 +2291,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2326,31 +2309,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 0000000000..0d405679ca
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 73db458fcd..6385ac4548 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ driver_sdk_headers = files(
         'cnxk_ml_io.h',
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
+        'cnxk_ml_xstats.h',
 )
 
 sources = files(
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread
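
For readers following the xstats rework above: the avg/min/max latency xstats declared in cnxk_ml_xstats.h are derived from the per-queue-pair counters that cn10k_ml_result_update() accumulates. The standalone sketch below shows one way such a derivation can look; the structure layout and helper name are simplified stand-ins rather than the driver's actual code, and reset emulation is reduced to the dequeued/reset counter pair visible in the patch.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the per-queue-pair latency accumulator. */
struct layer_xstats {
	uint64_t hw_latency_tot;  /* running sum of HW latencies */
	uint64_t hw_latency_min;
	uint64_t hw_latency_max;
	uint64_t dequeued_count;  /* completions accounted so far */
	uint64_t hw_reset_count;  /* dequeued_count captured at last reset */
};

/* Average HW latency since the last reset; 0 when nothing completed yet. */
static uint64_t
avg_hw_latency(const struct layer_xstats *xs)
{
	uint64_t count = xs->dequeued_count - xs->hw_reset_count;

	return (count == 0) ? 0 : xs->hw_latency_tot / count;
}

int
main(void)
{
	struct layer_xstats xs = {
		.hw_latency_tot = 300, .hw_latency_min = 80,
		.hw_latency_max = 120, .dequeued_count = 3,
		.hw_reset_count = 0,
	};

	printf("avg = %" PRIu64 ", min = %" PRIu64 ", max = %" PRIu64 "\n",
	       avg_hw_latency(&xs), xs.hw_latency_min, xs.hw_latency_max);
	return 0;
}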

* [PATCH v5 06/34] ml/cnxk: rename cnxk ops function pointers struct
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure to use the cnxk prefix.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 91 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fc6f78d414..91813e9d0a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -345,7 +345,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 42a4389bbe..66b38fc1eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -119,7 +119,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -860,7 +860,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -888,7 +888,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1087,7 +1087,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1160,7 +1160,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1180,7 +1180,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1200,7 +1200,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1241,7 +1241,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1258,7 +1258,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1273,7 +1273,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1321,7 +1321,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1363,7 +1363,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1427,7 +1427,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1441,7 +1441,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1528,7 +1528,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2051,7 +2051,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2071,7 +2071,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2105,7 +2105,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2186,7 +2186,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2574,38 +2574,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e192..16480b9ad8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c..03402681c5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,41 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5f..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 07/34] ml/cnxk: update device handling functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement CNXK wrapper functions for dev_info_get,
dev_configure, dev_close, dev_start and dev_stop. The
wrapper functions allocate and release resources common
to the ML driver and invoke the device-specific functions.
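
The sketch below illustrates the shape of this wrapper pattern: generic argument checks and common state live in the cnxk-level function, which then delegates the hardware work to the SoC-specific routine. All names and structures here are simplified stand-ins for illustration, not the driver's real API.

#include <errno.h>
#include <stdio.h>

/* Simplified stand-ins for the generic and SoC-specific device structures. */
struct soc_dev { int started; };
struct generic_dev { struct soc_dev soc; int state; };

/* SoC-specific step: only touches "hardware" (would program ML_CFG here). */
static int
soc_dev_start(struct soc_dev *soc)
{
	soc->started = 1;
	return 0;
}

/* Generic wrapper: validates, delegates, then updates the common state. */
static int
generic_dev_start(struct generic_dev *dev)
{
	int ret;

	if (dev == NULL)
		return -EINVAL;

	ret = soc_dev_start(&dev->soc);
	if (ret != 0)
		return ret;

	dev->state = 1;	/* common state is owned by the wrapper */
	return 0;
}

int
main(void)
{
	struct generic_dev dev = { { 0 }, 0 };
	int ret = generic_dev_start(&dev);

	printf("start: %d, state: %d\n", ret, dev.state);
	return 0;
}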

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 66b38fc1eb..6d8f2c8777 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -101,7 +101,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -861,20 +861,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -889,143 +881,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1038,8 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1050,10 +915,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1067,77 +932,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1154,20 +967,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1175,19 +983,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1195,8 +999,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1217,7 +1019,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad8..d50b5bede7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de622..02605fa28f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03402681c5..07a4daabc5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,15 +5,291 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..2996928d7d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 08/34] ml/cnxk: update queue-pair handling functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device queue-pairs.
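
One detail in the setup path worth a worked example is the descriptor-count adjustment: since one ring slot is always unusable, the wrapper creates the queue with one more descriptor than requested, except when the request already equals the maximum. A minimal standalone sketch of just that calculation (the function name is illustrative, not part of the driver):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * One ring slot is always unusable, so create the queue with one extra
 * descriptor, except when the caller already asked for the maximum size.
 */
static uint32_t
ring_size(uint32_t requested, uint32_t max_desc)
{
	return (requested == max_desc) ? max_desc : requested + 1;
}

int
main(void)
{
	printf("512 -> %" PRIu32 "\n", ring_size(512, 1024));   /* 513 */
	printf("1024 -> %" PRIu32 "\n", ring_size(1024, 1024)); /* 1024 */
	return 0;
}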

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6d8f2c8777..e3c688a55f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -95,93 +95,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -189,13 +108,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1002,47 +914,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede7..2d0a49d5cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 07a4daabc5..aa56dd2276 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,7 +10,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -93,7 +193,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -283,6 +383,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -294,8 +439,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 09/34] ml/cnxk: update model load and unload functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to load and unload
ML models. The wrapper functions invoke the cn10k model
load and unload functions.
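
As with the earlier device ops, the intent is that generic bookkeeping (state tracking, slot management) stays in the cnxk wrapper while the firmware-facing work is delegated to the cn10k layer. A minimal sketch of that shape, using invented names and a bare state field in place of the driver's real model structures:

#include <errno.h>
#include <stdio.h>

enum model_state { MODEL_EMPTY, MODEL_LOADED };

struct model { enum model_state state; };

/* Stand-ins for the SoC-specific calls that do the firmware-facing work. */
static int soc_model_load(int slot)   { (void)slot; return 0; }
static int soc_model_unload(int slot) { (void)slot; return 0; }

static int
generic_model_load(struct model *m, int slot)
{
	int ret = soc_model_load(slot);	/* hardware-specific load */

	if (ret == 0)
		m->state = MODEL_LOADED;	/* generic state tracking */
	return ret;
}

static int
generic_model_unload(struct model *m, int slot)
{
	int ret;

	if (m->state != MODEL_LOADED)
		return -EINVAL;	/* generic state check in the wrapper */

	ret = soc_model_unload(slot);	/* hardware-specific unload */
	if (ret == 0)
		m->state = MODEL_EMPTY;
	return ret;
}

int
main(void)
{
	struct model m = { MODEL_EMPTY };
	int ret_load = generic_model_load(&m, 0);
	int ret_unload = generic_model_unload(&m, 0);

	printf("load: %d, unload: %d\n", ret_load, ret_unload);
	return 0;
}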

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  26 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 462 insertions(+), 277 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 5d37e9bf8a..69a60b9b90 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -316,42 +316,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -363,140 +352,146 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+			   struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output1[i].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output2[j].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
+struct cnxk_ml_io_info *
+cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	return &model->layer[layer_id].info;
+}
+
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -504,7 +499,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -516,7 +511,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -524,15 +519,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -540,28 +535,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -570,39 +562,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68..b891c9d627 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,13 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+				struct cn10k_ml_model_metadata *metadata);
+struct cnxk_ml_io_info *cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e3c688a55f..ad2effb904 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -15,6 +15,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -273,7 +276,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1261,85 +1264,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_set(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1358,99 +1447,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1748,7 +1800,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1762,19 +1813,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cd..677219dfdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28f..1590249abd 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index aa56dd2276..1d8b84269d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -137,6 +140,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -240,7 +244,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -271,6 +275,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -303,6 +324,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -312,7 +336,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -428,6 +452,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -451,8 +587,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..bc14f6e5b9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 10/34] ml/cnxk: update model start and stop functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrapper functions invoke the cn10k
model start and stop functions.
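
A minimal, self-contained sketch of the wrapper pattern described above is
shown below. The struct definitions and the cn10k_ml_layer_start() stub are
simplified placeholders for illustration only, not the driver's real
definitions; the actual implementation is in the diff that follows.

/*
 * Sketch of the cnxk -> cn10k wrapper pattern for model start.
 * All types and callees here are reduced stand-ins.
 */
#include <stdint.h>
#include <stdio.h>

enum model_state { STATE_LOADED, STATE_STARTED };

struct cnxk_ml_model {
	uint16_t model_id;
	enum model_state state;
};

struct cnxk_ml_dev {
	uint16_t nb_models_started;
};

/* Placeholder for the hardware-specific start of a single layer. */
static int
cn10k_ml_layer_start(struct cnxk_ml_dev *dev, uint16_t model_id)
{
	(void)dev;
	printf("starting layer 0 of model %u on cn10k\n", model_id);
	return 0;
}

/* cnxk-level wrapper: dispatch to the cn10k backend, then update the
 * device and model book-keeping that is common to all backends. */
static int
cnxk_ml_model_start(struct cnxk_ml_dev *dev, struct cnxk_ml_model *model)
{
	int ret;

	ret = cn10k_ml_layer_start(dev, model->model_id);
	if (ret != 0)
		return ret;

	dev->nb_models_started++;
	model->state = STATE_STARTED;

	return 0;
}

int
main(void)
{
	struct cnxk_ml_dev dev = {0};
	struct cnxk_ml_model model = {.model_id = 0, .state = STATE_LOADED};

	return cnxk_ml_model_start(&dev, &model);
}

The intent of this split, as in the diff below, is that model-level state and
device book-keeping stay in the generic cnxk layer while hardware job
submission remains in the cn10k layer, operating on individual layers.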

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d71c36eae6..2197e5e0ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -215,11 +215,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -238,7 +237,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -333,12 +331,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -351,10 +347,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -396,12 +390,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -416,10 +408,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -438,8 +428,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ad2effb904..c677861645 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -248,26 +248,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -291,7 +293,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -323,9 +325,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -714,10 +720,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -730,22 +734,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -761,15 +763,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1506,14 +1508,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1524,85 +1528,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1636,66 +1644,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1705,31 +1741,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1766,8 +1802,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1776,6 +1815,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2003,30 +2061,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2054,14 +2117,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2116,7 +2178,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2183,7 +2245,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2232,23 +2294,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2284,7 +2350,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf..a222a43d55 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1d8b84269d..b61ed45876 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -240,7 +240,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -332,7 +332,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -564,6 +564,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -589,8 +629,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9..d27ca0d0cb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 11/34] ml/cnxk: update model utility functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and
fetch model info.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
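
[Not part of the patch — a minimal application-side sketch of the path these
wrappers serve, assuming the generic rte_mldev API; dev_id, model_id and
new_weights are placeholders. Per the driver code below, a params update is
only accepted while the model is loaded and stopped, otherwise -EBUSY is
returned.]

#include <rte_mldev.h>

/* Illustrative sketch: rte_ml_model_params_update() reaches
 * cn10k_ml_model_params_update() through the new cnxk wrapper and is
 * rejected with -EBUSY unless the model is in the loaded state.
 */
static int
refresh_model_weights(int16_t dev_id, uint16_t model_id, void *new_weights)
{
	struct rte_ml_model_info info;
	int ret;

	ret = rte_ml_model_info_get(dev_id, model_id, &info);
	if (ret != 0)
		return ret;

	return rte_ml_model_params_update(dev_id, model_id, new_weights);
}
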
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c677861645..c0d6216485 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1835,45 +1835,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b61ed45876..9ce37fcfd1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -604,6 +604,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -631,8 +675,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 12/34] ml/cnxk: update data quantization functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
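
[Not part of the patch — a minimal sketch of driving the new per-tensor
helper, assuming a struct cnxk_ml_io descriptor already populated at model
load time with the fields used by the helper (dtype, qtype, scale,
nb_elements, sz_d); quantize_one_input is a hypothetical wrapper.]

#include <stdint.h>

#include "cnxk_ml_io.h"

/* Illustrative sketch: convert one float32 input tensor to the model's
 * quantized representation. When dtype == qtype the helper is a plain copy
 * of sz_d bytes; otherwise it dispatches on qtype to the
 * rte_ml_io_float32_to_* conversion routines.
 */
static int
quantize_one_input(struct cnxk_ml_io *io, float *src, uint8_t *dst)
{
	return cnxk_ml_io_quantize_single(io, (uint8_t *)src, dst);
}
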
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c0d6216485..ff190b7f86 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1856,170 +1856,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0d..780e2a9f9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 0000000000..c78009ab0c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 29ec7ec511..5de166c252 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 9ce37fcfd1..63842025fc 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -648,6 +650,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -679,6 +753,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 6385ac4548..9cc4ddec70 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -25,6 +25,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 13/34] ml/cnxk: update device debug functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrappers for the device dump and selftest debug
functions.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
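
[Not part of the patch — a minimal sketch of exercising the new wrappers
through the generic rte_mldev entry points; dev_id is a placeholder. With
this change the dump callback prints per-model and per-layer information
before the cn10k OCM state and firmware debug buffers.]

#include <stdio.h>

#include <rte_mldev.h>

/* Illustrative sketch: dump device state and run the firmware selftest. */
static int
debug_dump_and_selftest(int16_t dev_id)
{
	int ret;

	ret = rte_ml_dev_dump(dev_id, stdout);
	if (ret != 0)
		return ret;

	return rte_ml_dev_selftest(dev_id);
}
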
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   2 +
 12 files changed, 236 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 69a60b9b90..b765b4ada9 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -596,3 +597,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index b891c9d627..45f2ed5fcf 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -460,5 +460,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2197e5e0ed..dc315cce10 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -481,19 +481,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a..bf8944f8ee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ff190b7f86..0a3575879f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -18,11 +18,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -70,16 +65,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -113,140 +98,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1120,38 +971,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1207,17 +1045,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c..5fda98ae88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e..b069d4e3a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71..66d979dd3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 63842025fc..66b88ddae1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -409,6 +409,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -729,8 +764,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 0000000000..ca3670a9e8
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 0000000000..ed2ab21346
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 9cc4ddec70..575f08f9c0 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -17,6 +17,7 @@ driver_sdk_headers = files(
         'cnxk_ml_model.h',
         'cnxk_ml_ops.h',
         'cnxk_ml_xstats.h',
+        'cnxk_ml_utils.h',
 )
 
 sources = files(
@@ -28,6 +29,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 14/34] ml/cnxk: update device stats functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle the ML device stats get and
reset operations.
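
Purely for illustration (not part of this patch), the aggregated
counters are consumed through the generic rte_ml_dev_stats_get() /
rte_ml_dev_stats_reset() API. A minimal usage sketch, assuming the
standard rte_mldev stats API and a configured, started device:

    #include <inttypes.h>
    #include <stdio.h>
    #include <rte_mldev.h>

    static void
    dump_ml_dev_stats(int16_t dev_id)
    {
            struct rte_ml_dev_stats stats;

            /* Driver sums the per queue-pair counters into one struct */
            if (rte_ml_dev_stats_get(dev_id, &stats) != 0)
                    return;

            printf("enqueued %" PRIu64 ", dequeued %" PRIu64 "\n",
                   stats.enqueued_count, stats.dequeued_count);
            printf("enqueue errors %" PRIu64 ", dequeue errors %" PRIu64 "\n",
                   stats.enqueue_err_count, stats.dequeue_err_count);

            /* Clear the counters for the next measurement window */
            rte_ml_dev_stats_reset(dev_id);
    }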

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0a3575879f..27d255a830 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -770,38 +770,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66b88ddae1..c75317d6da 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -489,6 +489,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -772,8 +804,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 15/34] ml/cnxk: update device and model xstats functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resources for the xstats are now allocated and
handled in the cnxk layer. Introduced an internal xstats group.
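
As a rough illustration (not part of this patch), the counters
registered here are reported through the standard mldev xstats calls.
A minimal application-side sketch, assuming the mode-based rte_mldev
xstats API and an already loaded model:

    #include <inttypes.h>
    #include <stdio.h>
    #include <rte_mldev.h>

    #define MAX_XSTATS 64

    static void
    dump_model_xstats(int16_t dev_id, int32_t model_id)
    {
            struct rte_ml_dev_xstats_map map[MAX_XSTATS];
            uint64_t value;
            int nb, i;

            /* Fetch the id/name pairs exposed for this model */
            nb = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_MODEL,
                                             model_id, map, MAX_XSTATS);
            if (nb <= 0 || nb > MAX_XSTATS)
                    return;

            for (i = 0; i < nb; i++) {
                    /* Read one counter at a time by its id */
                    if (rte_ml_dev_xstats_get(dev_id, RTE_ML_DEV_XSTATS_MODEL,
                                              model_id, &map[i].id, &value,
                                              1) == 1)
                            printf("%s: %" PRIu64 "\n", map[i].name, value);
            }
    }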

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 531 +++----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 481 +++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 551 insertions(+), 507 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 27d255a830..776ad60401 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -198,107 +198,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -306,270 +220,94 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
 
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
+uint64_t
+cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			 enum cnxk_ml_xstats_type type)
 {
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
-	uint64_t value;
+	uint64_t value = 0;
 	uint32_t qp_id;
 
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
 	switch (type) {
 	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	default:
 		value = 0;
 	}
 
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
 	return value;
 }
 
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -654,7 +392,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -682,13 +419,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -717,9 +447,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -770,174 +497,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1211,7 +770,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af..4d76164dba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -298,17 +299,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
@@ -337,4 +327,8 @@ int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_nam
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
+/* xstats ops */
+uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd..3ce9338f1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c75317d6da..6a423d9eda 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -115,6 +115,285 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t value = 0;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		return 0;
+
+	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -294,6 +573,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -323,6 +609,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -521,6 +810,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -806,10 +1279,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679ca..5e02bb876c 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 16/34] ml/cnxk: update fast path functions
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support for
model-specific fast-path functions; the cnxk layer functions invoke
the model-specific ones.

Added support for model-specific poll handling functions and updated
the internal inference sync function, dropping the use of rte_ml_op
as an argument. Updated the function arguments so that the function
can be used as a callback by the TVM hardware runtime.
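
A simplified, hypothetical sketch of the resulting dispatch (types and
names are trimmed down for illustration only; the real code uses the
cnxk/cn10k structures from this patch): the common burst enqueue has
no hardware knowledge and only calls the per-model callback installed
at model load time, which prepares the job descriptor and rings the
hardware queue.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical trimmed-down types, for illustration only */
    struct sketch_op {
            uint16_t model_id;
    };

    struct sketch_model {
            /* Installed at model load: Glow (cn10k) or TVM specific handler */
            bool (*enqueue_single)(struct sketch_op *op, uint16_t layer_id,
                                   uint64_t head);
    };

    /* Common-layer burst enqueue: dispatch only, no hardware specifics */
    static uint16_t
    sketch_enqueue_burst(struct sketch_model *models[], struct sketch_op **ops,
                         uint16_t nb_ops)
    {
            uint64_t head = 0;
            uint16_t count;

            for (count = 0; count < nb_ops; count++) {
                    struct sketch_model *model = models[ops[count]->model_id];

                    /* Callback fills the descriptor and rings the queue;
                     * false means the hardware command queue is full.
                     */
                    if (!model->enqueue_single(ops[count], 0, head))
                            break;
                    head++;
            }

            return count;
    }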

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 776ad60401..8116c8dedb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -65,24 +65,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -177,7 +165,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -185,17 +173,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -311,30 +299,15 @@ cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *l
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -342,25 +315,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -425,13 +382,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -824,6 +776,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1219,26 +1177,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1246,6 +1186,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1253,9 +1194,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1322,119 +1263,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1471,41 +1341,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1518,7 +1395,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4d76164dba..3d18303ed3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -14,6 +14,7 @@ struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -309,13 +310,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3c..f618e5aa5f 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 6a423d9eda..6a44a69508 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -15,6 +15,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1262,6 +1274,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb..d0c126f34b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 17/34] ml/cnxk: move error handling to cnxk layer
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Move the error type structures to the cnxk layer. The cn10k layer
now handles only the firmware and hardware error sub-types.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)
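
The application-facing behaviour is unchanged by this move: a failed op
still carries a combined error string built from ml_etype_db[] and the
sub-type tables. A minimal application-side sketch of reading it is shown
below; it uses only the public rte_mldev API, and dev_id/op are assumed
to come from an earlier enqueue/dequeue.

#include <inttypes.h>
#include <stdio.h>

#include <rte_mldev.h>

/* Illustrative only: print the error string the driver composed for an
 * op that completed with an error. */
static void
print_op_error(int16_t dev_id, struct rte_ml_op *op)
{
        struct rte_ml_op_error error;

        if (op->status == RTE_ML_OP_STATUS_ERROR &&
            rte_ml_op_error_get(dev_id, op, &error) == 0)
                printf("op failed: 0x%" PRIx64 " (%s)\n",
                       error.errcode, error.message);
}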

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8116c8dedb..65eaaf030d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,47 +22,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1241,19 +1221,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1294,7 +1274,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1311,30 +1291,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1372,7 +1351,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973..63d1c9e417 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f..382fca64be 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 6a44a69508..8339f8342b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1372,7 +1372,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 18/34] ml/cnxk: support config and close of tvmdp library
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based on the
ML device configuration options.

Updated the meson build to add Jansson, the TVM runtime and the TVMDP
library as build dependencies.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       | 58 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 ++++
 drivers/ml/cnxk/cnxk_ml_ops.h    |  6 ++++
 drivers/ml/cnxk/meson.build      | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 41 ++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   | 19 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 26 ++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h | 15 ++++++++
 8 files changed, 231 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h
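
For context, the TVMDP calls added in mvtvm_ml_ops.c reduce to the
configure/close pairing sketched below. The prototypes are inferred from
this patch alone and may not reflect the complete tvmdp.h API.

#include <tvmdp.h>

#include <rte_cycles.h>

/* Sketch of the TVMDP lifecycle driven from dev_configure/dev_close. */
static int
tvmdp_lifecycle(uint16_t nb_models)
{
        int ret;

        ret = tvmdp_configure(nb_models, rte_get_tsc_cycles);
        if (ret != 0)
                return ret;

        /* ... load and run TVM models through the driver ... */

        return tvmdp_close();
}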

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 1834b1f905..a629ceb796 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -46,6 +46,64 @@ or cross-compiled on an x86 platform.
 
 Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
 
+Compilation Prerequisites
+-------------------------
+
+This driver requires external libraries to optionally enable support for
+models compiled using Apache TVM framework. The following dependencies are
+not part of DPDK and must be installed separately:
+
+- **Jansson**
+
+  This library is used to parse and read JSON files.
+
+- **TVM**
+
+  Apache TVM provides a runtime library (libtvm_runtime) used to execute
+  models on CPU cores or hardware accelerators.
+
+.. note::
+
+    DPDK CNXK ML driver requires TVM version 0.10.0
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.10.0 -b v0.10.0
+    cmake -S ./ -B build \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DMACHINE_NAME=aarch64-linux-gnu \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY
+    make -C build
+    make -C build install
+
+- **TVMDP**
+
+  Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
+  works as an interface between TVM runtime and DPDK drivers. TVMDP library
+  provides a simplified C interface for TVM's runtime based on C++.
+
+.. code-block:: console
+
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake \
+      -DBUILD_SHARED_LIBS=ON \
+      -DBUILD_TESTING=OFF
+    make -C build
+    make -C build install
+
+- **libarchive**
+
+  The Apache TVM framework generates compiled models as tar archives.
+  This library is used to decompress and read archive files in tar,
+  xz and other formats.
+
 
 Initialization
 --------------
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 8339f8342b..c3639320a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -564,6 +564,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -624,6 +628,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..b22a2b0d95 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,6 +12,12 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#else
+#include "mvtvm_ml_stubs.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 575f08f9c0..7570186177 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,32 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+if not cc.check_header('dlpack/dlpack.h')
+        message('drivers/ml/cnxk: dlpack.h not found')
+        enable_mvtvm = false
+endif
+
+tvmrt_lib = cc.find_library('tvm_runtime', required: false)
+if tvmrt_lib.found()
+        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
+else
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 driver_sdk_headers = files(
         'cn10k_ml_dev.h',
         'cn10k_ml_ops.h',
@@ -34,6 +60,39 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
+
+driver_sdk_headers += files(
+        'mvtvm_ml_ops.h',
+)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += tvmrt_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+ext_deps += jansson_dep
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+
+driver_sdk_headers += files(
+        'mvtvm_ml_stubs.h',
+)
+
+sources += files(
+        'mvtvm_ml_stubs.c',
+)
+
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 0000000000..88c6d5a864
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 0000000000..305b4681ed
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <dlpack/dlpack.h>
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
new file mode 100644
index 0000000000..a31cd39cfa
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_stubs.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(conf);
+
+	return 0;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	RTE_SET_USED(cnxk_mldev);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
new file mode 100644
index 0000000000..11c56e5144
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_STUBS_H_
+#define _MVTVM_ML_STUBS_H_
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 19/34] ml/cnxk: add structures to support TVM model type
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 66 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 52 ++++++++++++++++++++-----
 drivers/ml/cnxk/meson.build      |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 ++++++++++++++++++++++
 6 files changed, 161 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
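
A note on the new union: only one member is valid at a time, selected by
model->type, and the mvtvm member exists only when the driver is built
with RTE_MLDEV_CNXK_ENABLE_MVTVM. A minimal driver-internal access sketch,
using the names from cnxk_ml_model.h:

#include "cnxk_ml_model.h"

/* Illustrative only: pick the per-framework metadata block. */
static void *
model_metadata_ptr(struct cnxk_ml_model *model)
{
        if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
                return &model->glow.metadata;
#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
        if (model->type == ML_CNXK_MODEL_TYPE_TVM)
                return &model->mvtvm.metadata;
#endif
        return NULL;
}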

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index dc315cce10..749ddeb344 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -435,6 +435,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65eaaf030d..a471e98fbf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,6 +725,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -746,6 +749,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -969,7 +973,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..f100eca203 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,48 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Unknown model type */
+	ML_CNXK_MODEL_TYPE_UNKNOWN,
+
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions*/
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* Unknown layer type */
+	ML_CNXK_LAYER_TYPE_UNKNOWN = 0,
+
+	/* MRVL layer, for MLIP target*/
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target*/
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +99,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +132,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c3639320a5..ea6f59a70f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1217,6 +1217,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1229,17 +1231,31 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, 0);
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1253,6 +1269,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1265,17 +1283,31 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 7570186177..12b73ee3be 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -66,6 +66,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 driver_sdk_headers += files(
         'mvtvm_ml_ops.h',
+        'mvtvm_ml_model.h',
 )
 
 sources += files(
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 0000000000..1f6b435be0
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 20/34] ml/cnxk: add support for identify model type
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to parse the model buffer and identify the model type
and sub-type. Add basic validity checks for Glow model buffers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 49 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  3 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  8 +++++
 drivers/ml/cnxk/meson.build      |  6 ++++
 drivers/ml/cnxk/mvtvm_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 8 files changed, 133 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
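
From the application side the detection is transparent: the same load call
covers both Glow and TVM buffers, and the driver resolves the type from the
buffer contents. A minimal sketch using the public rte_mldev API, where
buf/len are assumed to hold a model file already read into memory:

#include <rte_mldev.h>

/* Illustrative only: cnxk_ml_model_get_type() decides Glow vs. TVM. */
static int
load_any_model(int16_t dev_id, void *buf, size_t len, uint16_t *model_id)
{
        struct rte_ml_model_params params = {
                .addr = buf,
                .size = len,
        };

        return rte_ml_model_load(dev_id, &params, model_id);
}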

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..02f80410ec 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,60 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	enum cnxk_ml_model_type type;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+	type = mvtvm_ml_model_type_get(params);
+	if (type == ML_CNXK_MODEL_TYPE_TVM)
+		return ML_CNXK_MODEL_TYPE_TVM;
+	else if (type == ML_CNXK_MODEL_TYPE_INVALID)
+		return ML_CNXK_MODEL_TYPE_INVALID;
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f100eca203..a2fced46a2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -13,6 +13,8 @@
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
 #include "mvtvm_ml_model.h"
+#else
+#include "mvtvm_ml_stubs.h"
 #endif
 
 #include "cnxk_ml_io.h"
@@ -184,6 +186,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ea6f59a70f..c140408023 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1018,6 +1018,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1033,6 +1034,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1066,6 +1073,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 12b73ee3be..b3a62a7871 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -71,6 +76,7 @@ driver_sdk_headers += files(
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += tvmrt_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 0000000000..ab5f8baa67
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <archive.h>
+#include <archive_entry.h>
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return ML_CNXK_MODEL_TYPE_UNKNOWN;
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be0..b6162fceec 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,6 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a31cd39cfa..a7352840a6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -7,6 +7,15 @@
 #include "mvtvm_ml_stubs.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	RTE_SET_USED(params);
+
+	return ML_CNXK_MODEL_TYPE_UNKNOWN;
+}
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 11c56e5144..467a9d39e5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 21/34] ml/cnxk: add support to parse TVM model objects
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model archive
buffer, check for all expected objects and copy the TVM model
objects to internal buffers.
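
For context, the archive is handed to the driver as a single contiguous
buffer through the generic mldev API; a minimal application-side sketch
is below, assuming a tar archive on disk (the helper name, dev_id and
the path handling are illustrative, not part of this patch):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#include <rte_mldev.h>

/* Hand a TVM model archive (tar with mod.so, mod.json and mod.params)
 * to the driver through the generic mldev API. The driver detects the
 * model type and parses the objects internally. */
static int
load_tvm_archive(int16_t dev_id, const char *path, uint16_t *model_id)
{
	struct rte_ml_model_params params;
	FILE *fp;
	long sz;
	int ret;

	fp = fopen(path, "rb");
	if (fp == NULL)
		return -errno;

	fseek(fp, 0, SEEK_END);
	sz = ftell(fp);
	fseek(fp, 0, SEEK_SET);

	params.size = sz;
	params.addr = malloc(sz);
	if (params.addr == NULL || fread(params.addr, 1, sz, fp) != (size_t)sz) {
		fclose(fp);
		free(params.addr);
		return -EIO;
	}
	fclose(fp);

	/* The driver copies the objects into its own memzone during load,
	 * so the archive buffer can be released afterwards. */
	ret = rte_ml_model_load(dev_id, &params, model_id);
	free(params.addr);

	return ret;
}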

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  5 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 57 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 62 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 11 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 7 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c140408023..b18271545d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1079,7 +1079,10 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	else
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
 	if (ret != 0)
 		goto error;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ab5f8baa67..4c9a080c05 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -53,3 +53,60 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 
 	return ML_CNXK_MODEL_TYPE_TVM;
 }
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b6162fceec..b11b66f495 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -44,5 +44,7 @@ struct mvtvm_ml_model_data {
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 88c6d5a864..e2413b6b15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -8,8 +8,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -39,3 +43,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 305b4681ed..6607537599 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -12,8 +12,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a7352840a6..7f3b3abb2e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -33,3 +33,14 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return 0;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(params);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 467a9d39e5..4bb1772ef4 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -8,9 +8,12 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 22/34] ml/cnxk: fetch layer info and load TVM model
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and update
internal structures based on the layer information. Set callback
functions for layer load and unload and enabled model loading
using the TVMDP library. Added support to fetch the full
metadata after model load.
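
The subtype selection that drives the callback wiring reduces to a
small classification over the per-layer type counts parsed from the
stage-1 metadata; a condensed sketch follows (the standalone helper is
hypothetical, enum names are the driver's):

#include "cnxk_ml_model.h"

/* Condensed view of the subtype selection in mvtvm_ml_model_load():
 * a single Marvell layer and no LLVM layers is a pure MRVL model,
 * LLVM layers only means CPU-only execution, anything else is hybrid. */
static enum cnxk_ml_model_subtype
tvm_model_subtype(uint16_t nb_mrvl_layers, uint16_t nb_llvm_layers)
{
	if (nb_llvm_layers == 0 && nb_mrvl_layers == 1)
		return ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
	if (nb_llvm_layers > 0 && nb_mrvl_layers == 0)
		return ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
	return ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
}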

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 11 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  2 +
 drivers/ml/cnxk/cn10k_ml_ops.c   |  7 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 25 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  4 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 81 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 10 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 8 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index b765b4ada9..9a80adf0fc 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -714,3 +714,14 @@ cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "\n");
 }
+
+int
+cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM)
+		return mvtvm_ml_model_get_layer_id(model, layer_name, layer_id);
+
+	*layer_id = 0;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45f2ed5fcf..6744175cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -461,5 +461,7 @@ void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+int cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a471e98fbf..4191ccc840 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -576,7 +576,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
@@ -584,7 +584,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int ret;
 
 	PLT_SET_USED(size);
-	PLT_SET_USED(layer_name);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -598,6 +597,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c9a080c05..8536fd8927 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -110,3 +110,28 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	uint16_t i;
+
+	for (i = 0; i < model->mvtvm.metadata.model.nb_layers; i++) {
+		if (strcmp(model->layer[i].name, layer_name) == 0)
+			break;
+	}
+
+	if (i == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[i].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer type, name: %s type: %d", layer_name, model->layer[i].type);
+		return -EINVAL;
+	}
+
+	*layer_id = i;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b11b66f495..6cb2639876 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
@@ -46,5 +48,7 @@ struct mvtvm_ml_model_data {
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e2413b6b15..1fe0a04301 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -49,9 +49,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -99,5 +103,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		strncpy(model->layer[layer_id].name,
+			model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 7f3b3abb2e..d621dbc897 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -17,6 +17,16 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 	return ML_CNXK_MODEL_TYPE_UNKNOWN;
 }
 
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_name);
+	RTE_SET_USED(layer_id);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 4bb1772ef4..23fdfdc4cd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,4 +16,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
+
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 23/34] ml/cnxk: update internal info for TVM model
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating of internal I/O info structures for TVM models.
Computed static fields related to the model I/O.
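
As a worked example of the computed fields, for a hypothetical
1x3x224x224 input the element count and byte sizes come out as below
(types and sizes are illustrative only):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical 1x3x224x224 input: nb_elements is the product of the
 * shape, sz_d/sz_q scale it by the dequantized/quantized element
 * sizes, mirroring mvtvm_ml_model_io_info_set(). */
int
main(void)
{
	const uint32_t shape[4] = {1, 3, 224, 224};
	uint32_t nb_elements = 1;
	int i;

	for (i = 0; i < 4; i++)
		nb_elements *= shape[i];                  /* 150528 elements */

	printf("sz_d = %u bytes\n", nb_elements * 4);     /* FP32: 602112 */
	printf("sz_q = %u bytes\n", nb_elements * 2);     /* BFLOAT16: 301056 */

	return 0;
}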

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 111 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |   9 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   1 +
 6 files changed, 130 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b18271545d..90b23d9c1c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1244,6 +1244,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, 0);
+	else
+		info = mvtvm_ml_model_io_info_get(model, 0);
 
 	if (info == NULL)
 		return -EINVAL;
@@ -1296,6 +1298,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
+	else
+		info = mvtvm_ml_model_io_info_get(model, model->nb_layers - 1);
 
 	if (info == NULL)
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8536fd8927..14f4b258d8 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "cnxk_ml_model.h"
@@ -135,3 +137,112 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 
 	return 0;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case kDLInt:
+		return RTE_ML_IO_TYPE_INT32;
+	case kDLUInt:
+		return RTE_ML_IO_TYPE_UINT32;
+	case kDLFloat:
+		return RTE_ML_IO_TYPE_FP32;
+	case kDLBfloat:
+		return RTE_ML_IO_TYPE_BFLOAT16;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		strncpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		strncpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_set(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
+
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(layer_id);
+
+	return &model->mvtvm.info;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6cb2639876..e86581bc6a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -50,5 +50,7 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1fe0a04301..e248310cb3 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -175,6 +175,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_set(model);
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index d621dbc897..80a9a90b4e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -27,6 +27,15 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 	return -EINVAL;
 }
 
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_id);
+
+	return NULL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 23fdfdc4cd..29f721072a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -18,5 +18,6 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 24/34] ml/cnxk: enable model unload in tvmdp library
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled unloading models using the external TVMDP library.
Updated the layer unload callback to support multiple layers.
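
From the application side this path is reached through the generic
unload call; a minimal sketch, with dev_id/model_id as placeholders
(the model must be stopped first, otherwise the unload is rejected):

#include <rte_mldev.h>

/* Stop the model, then unload it; on unload the driver calls
 * tvmdp_model_unload() and frees the model memzone. Sketch only. */
static int
teardown_model(int16_t dev_id, uint16_t model_id)
{
	int ret;

	ret = rte_ml_model_stop(dev_id, model_id);
	if (ret != 0)
		return ret;

	return rte_ml_model_unload(dev_id, model_id);
}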

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  8 +++++---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 +++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  1 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 +++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4191ccc840..e7208391fd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -780,11 +780,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	int ret;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -797,6 +795,10 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 90b23d9c1c..cd95a3c7ad 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1107,7 +1107,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1125,7 +1125,10 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e248310cb3..9fd9e58de6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -185,3 +185,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 80a9a90b4e..a17a76e41f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -63,3 +63,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 29f721072a..3776fb5369 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -15,6 +15,7 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 25/34] ml/cnxk: enable OCM check for multilayer TVM model
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enabled the OCM size requirement check for multi-layer TVM
models. Computed the OCM scratch and WB requirements for all
layers during the load stage.
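
The check amounts to: WB pages add up across all Marvell layers while
scratch pages are reused, so only the per-layer maximum counts against
the OCM capacity. A small sketch with a hypothetical helper:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the check added at load time: the sum
 * of per-layer WB pages plus the largest per-layer scratch requirement
 * must fit in the OCM page count. */
static bool
ocm_fits(const uint16_t *wb_pages, const uint16_t *scratch_pages,
	 uint16_t nb_mrvl_layers, uint16_t ocm_num_pages)
{
	uint32_t total_wb = 0;
	uint16_t max_scratch = 0;
	uint16_t i;

	for (i = 0; i < nb_mrvl_layers; i++) {
		total_wb += wb_pages[i];
		if (scratch_pages[i] > max_scratch)
			max_scratch = scratch_pages[i];
	}

	return (total_wb + max_scratch) <= ocm_num_pages;
}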

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c | 60 +++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index cd95a3c7ad..03f4783b3f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1023,8 +1023,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	uint16_t max_scratch_pages;
+	struct cn10k_ml_ocm *ocm;
 	uint64_t model_info_size;
+	uint16_t total_wb_pages;
 	uint16_t lcl_model_id;
+	uint16_t layer_id;
 	uint64_t mz_size;
 	bool found;
 	int ret;
@@ -1086,6 +1090,62 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 	if (ret != 0)
 		goto error;
 
+	max_scratch_pages = 0;
+	total_wb_pages = 0;
+	layer_id = 0;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+		total_wb_pages = total_wb_pages + model->layer[layer_id].glow.ocm_map.wb_pages;
+		max_scratch_pages = PLT_MAX(max_scratch_pages,
+					    model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	} else {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+				total_wb_pages = total_wb_pages +
+						 model->layer[layer_id].glow.ocm_map.wb_pages;
+				max_scratch_pages =
+					PLT_MAX(max_scratch_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+			}
+		}
+#endif
+	}
+
+	if ((total_wb_pages + max_scratch_pages) > ocm->num_pages) {
+		plt_err("model_id = %u: total_wb_pages (%u) + scratch_pages (%u) >  %u\n",
+			lcl_model_id, total_wb_pages, max_scratch_pages, ocm->num_pages);
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			plt_ml_dbg("layer_id = %u: wb_pages = %u, scratch_pages = %u\n", layer_id,
+				   model->layer[layer_id].glow.ocm_map.wb_pages,
+				   model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		} else {
+			for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers;
+			     layer_id++) {
+				if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+					plt_ml_dbg(
+						"layer_id = %u: wb_pages = %u, scratch_pages = %u\n",
+						layer_id,
+						model->layer[layer_id].glow.ocm_map.wb_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+				}
+			}
+#endif
+		}
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else {
+			mvtvm_ml_model_unload(cnxk_mldev, model);
+			return -ENOMEM;
+		}
+#endif
+	}
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	cnxk_mldev->nb_models_loaded++;
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 26/34] ml/cnxk: support start and stop for TVM models
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. TVM model start
invokes layer start for all Glow layers that are part of the
model; TVM model stop invokes layer stop for the same layers.
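
Read as a plain loop over the model layers, the start walk is
equivalent to the sketch below (the helper name is hypothetical; the
structures and cn10k_ml_layer_start() are the driver's, and stop
mirrors it with cn10k_ml_layer_stop()):

#include "cnxk_ml_dev.h"
#include "cnxk_ml_model.h"
#include "cn10k_ml_ops.h"

/* Only Marvell (Glow) layers are started on the accelerator; LLVM
 * layers execute on CPU and need no start call. */
static int
start_mrvl_layers(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
{
	struct cnxk_ml_layer *layer;
	uint16_t layer_id;
	int ret;

	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
		layer = &model->layer[layer_id];
		if (layer->type != ML_CNXK_LAYER_TYPE_MRVL)
			continue;

		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
		if (ret != 0)
			return ret;
	}

	return 0;
}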

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 16 ++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 52 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 18 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 6 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7208391fd..2d308802cf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -827,7 +827,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -838,8 +838,6 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -852,6 +850,10 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -1015,14 +1017,12 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -1035,6 +1035,10 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03f4783b3f..66cda513db 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1216,7 +1216,12 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+
+	return 0;
 }
 
 int
@@ -1236,7 +1241,12 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9fd9e58de6..1d0b3544a7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -213,3 +213,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 770794fe7d..55459f9f7f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -19,5 +19,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a17a76e41f..b8c2e6a1fc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -72,3 +72,21 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3776fb5369..1eb663b1d1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,6 +16,8 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 27/34] ml/cnxk: update internal TVM model info structure
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update internal model info structure
for TVM models.
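
This structure backs the generic model query; a small application-side
sketch of reading it back, with dev_id/model_id as placeholders:

#include <inttypes.h>
#include <stdio.h>

#include <rte_mldev.h>

/* Print the I/O summary filled in by mvtvm_ml_model_info_set(). */
static void
print_model_io(int16_t dev_id, uint16_t model_id)
{
	struct rte_ml_model_info info;
	uint32_t i;

	if (rte_ml_model_info_get(dev_id, model_id, &info) != 0)
		return;

	printf("%s v%s: %u inputs, %u outputs, layout %d\n", info.name,
	       info.version, info.nb_inputs, info.nb_outputs,
	       (int)info.io_layout);

	for (i = 0; i < info.nb_inputs; i++)
		printf("  input[%u] %s: %" PRIu64 " elements\n", i,
		       info.input_info[i].name, info.input_info[i].nb_elements);
}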

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 14f4b258d8..569147aca7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -11,6 +11,7 @@
 
 #include <roc_api.h>
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -246,3 +247,67 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 
 	return &model->mvtvm.info;
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index e86581bc6a..a1247ffbde 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -52,5 +53,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1d0b3544a7..f13ba76207 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -178,6 +178,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_set(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 28/34] ml/cnxk: support device dump for TVM models
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to print TVM model layer info.
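
The per-layer print is reached through the standard device dump; a
trivial sketch, with dev_id as a placeholder:

#include <stdio.h>

#include <rte_mldev.h>

/* Dump device and model debug info, including the per-layer TVM info
 * printed by mvtvm_ml_layer_print(). */
static void
dump_dev(int16_t dev_id)
{
	rte_ml_dev_dump(dev_id, stdout);
}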

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  7 +++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  8 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 02f80410ec..ed6a1ed866 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -68,6 +68,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -84,6 +86,9 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
 	}
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 569147aca7..4c12f584d5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -311,3 +312,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
 				&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index a1247ffbde..900ba44fa0 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -54,5 +55,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index b8c2e6a1fc..260a051b08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -36,6 +36,14 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 	return NULL;
 }
 
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(layer);
+	RTE_SET_USED(fp);
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 1eb663b1d1..d6d0edbcf1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
@@ -22,5 +23,6 @@ int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 29/34] ml/cnxk: enable reporting model runtime as xstats
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.
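
As a rough illustration of how the per-queue-pair runtime counters introduced
below are folded into a single model-level xstat value (the structure and field
names here are simplified stand-ins, not the driver's actual types):

#include <stdint.h>

/* Illustrative only: simplified per-queue-pair runtime stats */
struct rt_xstats {
	uint64_t latency_tot;  /* sum of runtime latencies */
	uint64_t latency_min;  /* minimum observed latency */
	uint64_t latency_max;  /* maximum observed latency */
	uint64_t dequeued;     /* total jobs dequeued */
	uint64_t reset_count;  /* dequeued count at last stats reset */
};

/* Average runtime latency across all queue pairs of a model */
static uint64_t
avg_rt_latency(const struct rt_xstats *qp_stats, uint16_t nb_qps)
{
	uint64_t value = 0, count = 0;
	uint16_t qp;

	for (qp = 0; qp < nb_qps; qp++) {
		value += qp_stats[qp].latency_tot;
		count += qp_stats[qp].dequeued - qp_stats[qp].reset_count;
	}

	return count != 0 ? value / count : 0;
}

The min and max variants follow the same shape, reducing with PLT_MIN/PLT_MAX
across queue pairs instead of accumulating a sum.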

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   9 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 131 +++++++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  96 +++++++++++++++++++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   8 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  23 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   6 ++
 10 files changed, 289 insertions(+), 18 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d308802cf..0c67ce7b40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -197,6 +197,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 	}
 }
 
+void
+cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->glow.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
 #define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3d18303ed3..045e2e6cd2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -331,6 +331,8 @@ int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
+void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				  enum cnxk_ml_xstats_type type);
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66cda513db..fd2c46ac1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -138,7 +138,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -169,6 +170,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -195,7 +215,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -204,6 +225,36 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		strcpy(suffix, "cycles");
+	else
+		strcpy(suffix, "ns");
+
+	/* Update xstat name based on model name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_xstat_model_name_set(cnxk_mldev, model, stat_id, i, suffix);
+		else
+			mvtvm_ml_model_xstat_name_set(cnxk_mldev, model, stat_id, i, suffix);
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -247,13 +298,22 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	if (model == NULL)
 		return 0;
 
-	if (layer_id >= 0)
+	if (layer_id >= 0) {
 		layer = &model->layer[layer_id];
-	else
-		return 0;
+		goto layer_xstats;
+	} else {
+		layer = NULL;
+		goto model_xstats;
+	}
 
+layer_xstats:
 	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+	goto exit_xstats;
 
+model_xstats:
+	value = mvtvm_ml_model_xstat_get(cnxk_mldev, model, type);
+
+exit_xstats:
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq != 0) /* return in ns */
 		value = (value * 1000ULL) / sclk_freq;
@@ -836,8 +896,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -854,7 +915,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -868,9 +939,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -931,9 +1013,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -951,7 +1034,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -963,11 +1053,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b22a2b0d95..ab32676b3e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -70,6 +70,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876c..a2c9adfe4a 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 900ba44fa0..66c3af18e1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index f13ba76207..832837034b 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -10,10 +10,83 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->mvtvm.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	uint64_t count = 0;
+	uint64_t value = 0;
+	uint32_t qp_id;
+
+	switch (type) {
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -53,6 +126,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -68,7 +142,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -181,6 +259,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 55459f9f7f..22e0340146 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -11,8 +11,11 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -22,4 +25,9 @@ int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 260a051b08..19af1d2703 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -8,6 +8,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_xstats.h"
 
 enum cnxk_ml_model_type
 mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
@@ -44,6 +45,28 @@ mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	RTE_SET_USED(fp);
 }
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(stat_id);
+	RTE_SET_USED(entry);
+	RTE_SET_USED(suffix);
+}
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(type);
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index d6d0edbcf1..3fd1f04c35 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
@@ -24,5 +26,9 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 30/34] ml/cnxk: implement I/O alloc and free callbacks
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:47   ` [PATCH v5 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for I/O allocation and free
for Glow layers.
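
A minimal sketch of the buffer layout the new io_alloc callback sets up: a
single reservation holds the quantized input and output regions back to back,
each rounded up to the device alignment. The plain posix_memalign call below is
a placeholder for the memzone reservation, and the alignment argument stands in
for ML_CN10K_ALIGN_SIZE:

#include <stdint.h>
#include <stdlib.h>

#define ALIGN_CEIL(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Carve input and output quantized buffers from one aligned allocation */
static int
io_buffers_alloc(size_t input_sz_q, size_t output_sz_q, size_t align,
		 void **input_qbuffer, void **output_qbuffer, void **base)
{
	size_t in_sz = ALIGN_CEIL(input_sz_q, align);
	size_t out_sz = ALIGN_CEIL(output_sz_q, align);

	if (posix_memalign(base, align, in_sz + out_sz) != 0)
		return -1;

	*input_qbuffer = *base;
	*output_qbuffer = (uint8_t *)*base + in_sz;

	return 0;
}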

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 87 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 +
 3 files changed, 92 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0c67ce7b40..7802425c87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1410,3 +1410,90 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t output_size;
+	uint64_t input_size;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 045e2e6cd2..9c41c1c0b0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -329,6 +329,9 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 832837034b..77c2b5bcdc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -232,6 +232,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 31/34] ml/cnxk: add generic ML malloc and free callback
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-10-18  6:47   ` Srikanth Yalavarthi
  2023-10-18  6:48   ` [PATCH v5 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:47 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks.
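
The pair added here follows the memzone model: a buffer is reserved under a
name and released later by looking the same name up. A toy stand-in for that
pattern (plain malloc instead of memzones, fixed-size table, purely
illustrative):

#include <stdlib.h>
#include <string.h>

#define NB_BUFS 8

static struct { char name[32]; void *addr; } bufs[NB_BUFS];

/* Reserve a buffer under a name; it is released later via the same name */
static int
toy_ml_malloc(const char *name, size_t size, void **addr)
{
	int i;

	for (i = 0; i < NB_BUFS; i++) {
		if (bufs[i].addr == NULL) {
			bufs[i].addr = malloc(size);
			if (bufs[i].addr == NULL)
				return -1;
			strncpy(bufs[i].name, name, sizeof(bufs[i].name) - 1);
			*addr = bufs[i].addr;
			return 0;
		}
	}

	return -1;
}

static int
toy_ml_free(const char *name)
{
	int i;

	for (i = 0; i < NB_BUFS; i++) {
		if (bufs[i].addr != NULL && strcmp(bufs[i].name, name) == 0) {
			free(bufs[i].addr);
			bufs[i].addr = NULL;
			return 0;
		}
	}

	return -1;
}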

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7802425c87..01b0a44caa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1497,3 +1497,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 9c41c1c0b0..eb3e1c139c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -333,6 +333,9 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 77c2b5bcdc..b627355917 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -234,6 +234,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 32/34] ml/cnxk: support quantize and dequantize callback
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-10-18  6:47   ` [PATCH v5 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-10-18  6:48   ` Srikanth Yalavarthi
  2023-10-18  6:48   ` [PATCH v5 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
  2023-10-18  6:48   ` [PATCH v5 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:48 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
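
Per element, the conversion behind these callbacks is a scale-based mapping
between the float (dequantized) and integer (quantized) representations. A toy
per-element version, assuming an int8 target and a simple symmetric scale
convention (quantized = value / scale); the driver's actual conversion goes
through the mldev utils helpers and the per-I/O metadata:

#include <math.h>
#include <stdint.h>

/* Toy symmetric quantization for a single float <-> int8 element */
static int8_t
quantize_i8(float value, float scale)
{
	float q = roundf(value / scale);

	if (q > 127.0f)
		q = 127.0f;
	if (q < -128.0f)
		q = -128.0f;

	return (int8_t)q;
}

static float
dequantize_i8(int8_t value, float scale)
{
	return (float)value * scale;
}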

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_ops.c | 129 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |   4 +
 2 files changed, 133 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index b627355917..776675843a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -2,11 +2,15 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <dlpack/dlpack.h>
+
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
@@ -236,6 +240,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -366,3 +372,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 22e0340146..4cabe30a82 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -24,6 +24,10 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 33/34] ml/cnxk: enable fast-path ops for TVM models
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-10-18  6:48   ` [PATCH v5 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-10-18  6:48   ` Srikanth Yalavarthi
  2023-10-18  6:48   ` [PATCH v5 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:48 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. Models use
TVMDP library function calls to execute inference
operations for Hybrid and LLVM model subtypes.

For TVM MRVL model subtypes that have a single MRVL layer,
the inference requests are directly enqueued to hardware
by the driver.
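
The per-model fast-path hooks installed at load time amount to a small dispatch
table chosen once per model; a condensed sketch of that selection (types and
handlers here are simplified placeholders, not the driver's actual functions):

#include <stdbool.h>

struct op;

/* Simplified per-model fast-path hooks, selected once at model load */
struct model_fp {
	bool (*enqueue_single)(struct op *op);
	void (*result_update)(void *request);
};

static bool hw_enqueue(struct op *op)   { (void)op; return true; }
static void hw_result(void *request)    { (void)request; }
static bool tvm_enqueue(struct op *op)  { (void)op; return true; }
static void tvm_result(void *request)   { (void)request; }

/* Single-layer MRVL subtypes go straight to the hardware queue;
 * everything else is routed through the TVM runtime path. */
static void
model_fp_select(struct model_fp *fp, bool is_tvm_mrvl_subtype)
{
	if (is_tvm_mrvl_subtype) {
		fp->enqueue_single = hw_enqueue;
		fp->result_update = hw_result;
	} else {
		fp->enqueue_single = tvm_enqueue;
		fp->result_update = tvm_result;
	}
}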

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/rel_notes/release_23_11.rst |   4 +
 drivers/ml/cnxk/cn10k_ml_ops.c         |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h           |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h          |   5 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  20 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c         | 124 +++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |  43 +++++++++
 9 files changed, 212 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 8701350b2e..ba4d162287 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -28,6 +28,10 @@ New Features
 
      Added support in mldev library for models with multiple inputs and outputs.
 
+   * **Added support for Marvell TVM models in ML CNXK driver.**
+
+     Added support for models compiled using TVM framework in ML CNXK driver.
+
 
 .. This section should contain new features added in this release.
    Sample format:
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index fd2c46ac1f..608e9fc4ca 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c12f584d5..1dfd0d176a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -198,6 +198,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -231,6 +241,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 66c3af18e1..7ffce38094 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -69,6 +69,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 776675843a..1e74b82a0a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -19,6 +19,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 void
 mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			      uint16_t stat_id, uint16_t entry, char *suffix)
@@ -242,6 +248,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -285,6 +292,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -495,3 +515,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 4cabe30a82..cb4b219743 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -16,6 +16,44 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* End ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -29,6 +67,11 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v5 34/34] ml/cnxk: enable creation of mvtvm virtual device
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-10-18  6:48   ` [PATCH v5 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-10-18  6:48   ` Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:48 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create an mvtvm virtual device on systems
without a PCI-based ML HW accelerator.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       |  49 +++++++-
 drivers/ml/cnxk/cn10k_ml_dev.c   |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c    |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  82 +++++++++----
 drivers/ml/cnxk/meson.build      |   2 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   | 196 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  31 +++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   2 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   2 +
 13 files changed, 433 insertions(+), 24 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index a629ceb796..55138c4ced 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -128,6 +128,22 @@ Bind the ML PF device to the vfio_pci driver:
    usertools/dpdk-devbind.py -u 0000:00:10.0
    usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
 
+VDEV support
+------------
+
+On platforms which don't support ML hardware acceleration through a PCI device, the
+Marvell ML CNXK PMD can execute inference operations on a vdev with ML models
+compiled using the Apache TVM framework.
+
+VDEV can be enabled by passing the EAL arguments
+
+.. code-block:: console
+
+   --vdev ml_mvtvm
+
+VDEV can also be used on platforms with an ML HW accelerator. However, use of the VDEV
+and the PCI HW accelerator is mutually exclusive.
+
 
 Runtime Config Options
 ----------------------
@@ -138,6 +154,8 @@ Runtime Config Options
   The parameter ``fw_path`` can be used by the user
   to load ML firmware from a custom path.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
@@ -153,6 +171,8 @@ Runtime Config Options
   When enabled, firmware would mask the DPE non-fatal hardware errors as warnings.
   The parameter ``enable_dpe_warnings`` is used fo this configuration.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,enable_dpe_warnings=0
@@ -169,11 +189,19 @@ Runtime Config Options
   Caching of model data improves the inferencing throughput / latency for the model.
   The parameter ``cache_model_data`` is used to enable data caching.
 
+  This option is supported on PCI HW accelerator and vdev.
+
   For example::
 
      -a 0000:00:10.0,cache_model_data=0
 
-  With the above configuration, model data caching is disabled.
+  With the above configuration, model data caching is disabled on HW accelerator.
+
+  For example::
+
+     --vdev ml_mvtvm,cache_model_data=0
+
+  With the above configuration, model data caching is disabled on vdev.
 
 
 **OCM allocation mode** (default ``lowest``)
@@ -189,6 +217,8 @@ Runtime Config Options
   ``largest``
     Allocate OCM for the model from the slot with largest amount of free space.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_alloc_mode=lowest
@@ -206,6 +236,8 @@ Runtime Config Options
   Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
   Default page size is 16 KB.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_page_size=8192
@@ -230,6 +262,8 @@ Runtime Config Options
     Enabling spinlock version would disable restrictions on the number of queue-pairs
     that can be supported by the driver.
 
+   This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,hw_queue_lock=1
@@ -238,6 +272,19 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
+**Maximum queue pairs** (default ``1``)
+
+  VDEV supports additional EAL arguments to configure the maximum number of
+  queue-pairs on the ML device through the option ``max_qps``.
+
+  This option is supported only on vdev.
+
+  For example::
+
+     --vdev ml_mvtvm,max_qps=4
+
+  With the above configuration, 4 queue-pairs are created on the vdev.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 91813e9d0a..caa13ba08c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -309,6 +309,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -355,6 +361,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef..cee405f3f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417..dc4512223c 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64be..491c4c4aea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 608e9fc4ca..517aa71931 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,7 +117,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -480,7 +481,12 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+
+	return 0;
 }
 
 static int
@@ -518,9 +524,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -618,10 +626,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
@@ -629,12 +639,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -695,8 +710,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close MVTVM ML Device");
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -748,10 +765,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -770,10 +789,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -800,7 +821,12 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+
+	return 0;
 }
 
 static int
@@ -813,6 +839,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1145,6 +1174,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1384,6 +1418,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index b3a62a7871..e4e3bc200d 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -70,11 +70,13 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 driver_sdk_headers += files(
+        'mvtvm_ml_dev.h',
         'mvtvm_ml_ops.h',
         'mvtvm_ml_model.h',
 )
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 0000000000..c93b5155b9
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize MVTVM vdev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 0000000000..6922c19337
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1e74b82a0a..bbefa8a356 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -97,6 +97,22 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return value;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -127,6 +143,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -237,6 +262,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index cb4b219743..0232c5ead5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -55,8 +55,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 19af1d2703..126a954c91 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -67,6 +67,15 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return 0;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(dev_info);
+
+	return -ENOTSUP;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -84,6 +93,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3fd1f04c35..4220a963f2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -14,8 +14,10 @@ struct cnxk_ml_model;
 struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.42.0



* RE: [EXT] Re: [PATCH v4 00/34] Implementation of revised ml/cnxk driver
  2023-10-18  1:56   ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
@ 2023-10-18  6:55     ` Srikanth Yalavarthi
  0 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18  6:55 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Prince Takkar, Srikanth Yalavarthi

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 18 October 2023 07:26
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>
> Subject: [EXT] Re: [PATCH v4 00/34] Implementation of revised ml/cnxk
> driver
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Tue, Oct 17, 2023 at 10:30 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > This patch series is an implementation of revised ml/cnxk driver to
> > support models compiled with TVM compiler framework. TVM models use
> a
> > hybrid mode for execution, with regions of the model executing on the
> > ML accelerator and the rest executing on CPU cores.
> >
> > This series of commits reorganizes the ml/cnxk driver and adds support
> > to execute multiple regions with-in a TVM model.
> >
> 
> Found following build error (may be due to gcc13)

The issue is due to the order of the patches in the series. This patch is incorrectly ordered and should be applied after
"ml/cnxk: add structures to support TVM model type"

I have fixed this and tested building all patches. No issues observed now.

Submitted v5 with required changes

> 
> ml/cnxk: enable OCM check for multilayer TVM model
> 
> [2389/2660] Compiling C object
> drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o
> FAILED: drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o
> ccache gcc -Idrivers/libtmp_rte_ml_cnxk.a.p -Idrivers -I../drivers -
> Idrivers/ml/cnxk -I../drivers/ml/cnxk -Ilib/mldev -I../lib/mldev -I.
> -I.. -Iconfig -I../config -Ilib/eal/include -I../lib/eal/include -
> Ilib/eal/linux/include -I../lib/eal/l inux/include -Ilib/eal/x86/include -
> I../lib/eal/x86/include -Ilib/eal/common -I../lib/eal/common -Ilib/eal -
> I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log -Ilib/metrics -
> I../lib/metrics -Ilib/telemetry -I../lib/telemetry -I lib/mempool -
> I../lib/mempool -Ilib/ring -I../lib/ring -Ilib/mbuf -I../lib/mbuf -
> Idrivers/common/cnxk -I../drivers/common/cnxk -Idrivers/bus/pci -
> I../drivers/bus/pci -Ilib/net -I../lib/net -Ilib/ethdev -I../lib/ethdev -Ilib/meter
> -I../lib/me ter -Ilib/pci -I../lib/pci -I../drivers/bus/pci/linux -Ilib/security -
> I../lib/security -Ilib/cryptodev -I../lib/cryptodev -Ilib/rcu -I../lib/rcu -Ilib/hash
> -I../lib/hash -fdiagnostics-color=always
> -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -Werror -std=c11 -O2 -
> g -include rte_config.h -Wcast-qual -Wdeprecated -Wformat -Wformat-
> nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -
> Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare  -
> Wstrict-prototypes -Wundef -Wwrite-strings -Wno-address-of-packed-
> member -Wno-packed-not-aligned -Wno-missing-field-initializers -Wno-
> zero-length-bounds -D_GNU_SOURCE -fPIC -march=native -mrtm -
> DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API  -Wno-format-
> truncation -DCNXK_ML_DEV_DEBUG -
> DRTE_LOG_DEFAULT_LOGTYPE=pmd.ml.cnxk -MD -MQ
> drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o -MF
> drivers/libtmp_rte_ml_cnxk.a.p/ml_cnxk_cnxk_ml_ops.c.o.d -o
> drivers/libtmp_rte_ml_cnxk.a.p/ ml_cnxk_cnxk_ml_ops.c.o -c
> ../drivers/ml/cnxk/cnxk_ml_ops.c
> ../drivers/ml/cnxk/cnxk_ml_ops.c: In function ‘cnxk_ml_model_load’:
> ../drivers/ml/cnxk/cnxk_ml_ops.c:527:18: error: ‘struct cnxk_ml_model’
> has no member named ‘type’
>   527 |         if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
>       |                  ^~
> ../drivers/ml/cnxk/cnxk_ml_ops.c:527:28: error:
> ‘ML_CNXK_MODEL_TYPE_GLOW’ undeclared (first use in this function); did
> you mean ‘ML_CNXK_MODEL_STATE_LOADED’?
>   527 |         if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
>       |                            ^~~~~~~~~~~~~~~~~~~~~~~
>       |                            ML_CNXK_MODEL_STATE_LOADED
> ../drivers/ml/cnxk/cnxk_ml_ops.c:527:28: note: each undeclared identifier is
> reported only once for each function it appears in
> ../drivers/ml/cnxk/cnxk_ml_ops.c:549:26: error: ‘struct cnxk_ml_model’
> has no member named ‘type’
>   549 |                 if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
>       |                          ^~
> ../drivers/ml/cnxk/cnxk_ml_ops.c:568:26: error: ‘struct cnxk_ml_model’
> has no member named ‘type’
>   568 |                 if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
>       |                          ^~
> [2390/2660] Generating drivers/rte_bus_dpaa.sym_chk with a custom
> command (wrapped by meson to capture output) [2391/2660] Generating
> drivers/rte_bus_fslmc.sym_chk with a custom command (wrapped by
> meson to capture output) [2392/2660] Generating lib/pipeline.sym_chk with
> a custom command (wrapped by meson to capture output) [2393/2660]
> Generating lib/ethdev.sym_chk with a custom command (wrapped by
> meson to capture output) [2394/2660] Generating lib/eal.sym_chk with a
> custom command (wrapped by meson to capture output) [2395/2660]
> Generating drivers/rte_common_sfc_efx.sym_chk with a custom command
> (wrapped by meson to capture output) [2396/2660] Generating
> drivers/rte_common_cnxk.sym_chk with a custom command (wrapped by
> meson to capture output)
> ninja: build stopped: subcommand failed.


* [PATCH v6 00/34] Implementation of revised ml/cnxk driver
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (37 preceding siblings ...)
  2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
@ 2023-10-18 13:53 ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                     ` (34 more replies)
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                   ` (2 subsequent siblings)
  41 siblings, 35 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series is an implementation of the revised ml/cnxk driver
to support models compiled with the TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions within a TVM model.

v6:
  - Added depends info for series. This series depends on patch-132887
  - Fix merge conflicts with dpdk-23.11-rc1
  - Fix issues with ml/cnxk driver release notes
  - Added build dependency information for dlpack headers

v5:
  - Fix build failures for individual patches in the series
  - Finished build testing with devtools/test-meson-builds.sh script

v4:
  - Squashed release notes
  - Updated external build dependency info in documentation

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (2):
  ml/cnxk: enable OCM check for multilayer TVM model
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (30):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 doc/guides/mldevs/cnxk.rst             |  131 +-
 doc/guides/rel_notes/release_23_11.rst |    3 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h         |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c       |  401 ++--
 drivers/ml/cnxk/cn10k_ml_model.h       |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2828 ++++++++----------------
 drivers/ml/cnxk/cn10k_ml_ops.h         |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c          |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h          |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c           |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h           |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c        |   94 +
 drivers/ml/cnxk/cnxk_ml_model.h        |  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          | 1690 ++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h          |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c        |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h        |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h       |  152 ++
 drivers/ml/cnxk/meson.build            |   73 +
 drivers/ml/cnxk/mvtvm_ml_dev.c         |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h         |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  392 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c         |  652 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c       |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h       |   36 +
 30 files changed, 6186 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

-- 
2.42.0



* [PATCH v6 01/34] ml/cnxk: drop support for register polling
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for the device argument "poll_mem" in the cnxk
ML driver. Support for polling through ML registers is removed;
DDR addresses are now used for polling.
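
For readers following the change, below is a minimal sketch of the
DDR-address polling pattern that is retained (hypothetical names and
marker values; the driver itself uses ML_CN10K_POLL_JOB_START/FINISH
together with plt_write64()/plt_read64()):

#include <stdbool.h>
#include <stdint.h>

#define POLL_JOB_START  0x0ULL  /* assumed marker values */
#define POLL_JOB_FINISH 0x1ULL

struct ml_req {
	volatile uint64_t status; /* completion word, lives in DDR */
	uint64_t compl_addr;      /* address polled in the fast path */
};

static void
req_enqueue_prepare(struct ml_req *req)
{
	/* Point the completion word at DDR, not at an ML scratch register. */
	req->compl_addr = (uint64_t)(uintptr_t)&req->status;
	*(volatile uint64_t *)(uintptr_t)req->compl_addr = POLL_JOB_START;
	/* ... job command is submitted to hardware after this point ... */
}

static bool
req_is_done(const struct ml_req *req)
{
	/* Firmware writes the finish value to the DDR word on completion. */
	return *(volatile uint64_t *)(uintptr_t)req->compl_addr == POLL_JOB_FINISH;
}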

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
Depends-on: patch-132887 ("ml/cnxk: don't export internal headers")

 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001..4aaeecff03 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d3..11531afd8c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6..005b093e45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.42.0



* [PATCH v6 02/34] ml/cnxk: add generic cnxk device structure
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This structure is
the top-level device structure for the driver and encapsulates
the target / platform-specific device structure.
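
As a rough illustration (simplified fields, not the actual driver
definitions), the layering introduced by this change looks like:

struct cn10k_ml_dev {                    /* platform-specific device data */
	int ocm_page_size;
	/* ... ROC handle, firmware state, OCM info ... */
};

struct cnxk_ml_dev {                     /* generic top-level device data */
	int state;                       /* probed / configured / started */
	struct cn10k_ml_dev cn10k_mldev; /* embedded platform structure */
};

static int
cnxk_ml_ocm_page_size_get(const struct cnxk_ml_dev *cnxk_mldev)
{
	/* Common code reaches platform data through the embedded member. */
	return cnxk_mldev->cn10k_mldev.ocm_page_size;
}

Driver-common code keeps a struct cnxk_ml_dev in dev_private and passes it
down, while CN10K-specific code operates on the embedded cn10k_mldev member.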

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 316 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  15 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  60 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 495 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   1 +
 9 files changed, 562 insertions(+), 449 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..3bc61443d8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -10,13 +10,14 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -58,9 +59,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -90,7 +88,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -127,7 +125,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -139,7 +137,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -151,7 +149,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -174,7 +172,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -186,7 +184,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -197,49 +195,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -248,47 +250,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -300,7 +302,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -308,7 +311,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -324,18 +327,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -351,7 +356,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -368,7 +373,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -383,8 +388,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -430,45 +435,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -480,11 +485,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -498,14 +503,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -515,7 +520,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -524,24 +529,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -549,9 +554,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -559,9 +564,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -570,39 +575,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -613,53 +619,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -671,11 +681,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -691,49 +701,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	void *fw_buffer = NULL;
@@ -741,8 +753,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -773,8 +786,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -787,22 +800,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
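
For context, a minimal sketch of the device-structure layering the renames above rely on, inferred only from the accesses visible in these hunks (cnxk_mldev->cn10k_mldev, cnxk_mldev->mldev, and the relocated state and model counters). The helper cnxk_ml_roc_get() and the member ordering are illustrative assumptions; the authoritative definition is in drivers/ml/cnxk/cnxk_ml_dev.h.

/* Sketch only -- not part of the patch. Field names are taken from their
 * uses in cn10k_ml_dev.c and cn10k_ml_ops.c; ordering and the enum tag
 * are assumptions.
 */
struct cnxk_ml_dev {
	struct rte_ml_dev *mldev;           /* back-pointer, set in probe */
	enum cnxk_ml_dev_state state;       /* ML_CNXK_DEV_STATE_* */
	uint16_t nb_models_loaded;
	uint16_t nb_models_unloaded;
	uint16_t nb_models_started;
	uint16_t nb_models_stopped;
	struct cn10k_ml_dev cn10k_mldev;    /* ROC handle, firmware, OCM, xstats */
};

/* Hypothetical helper showing the access pattern used throughout the
 * reworked code paths: dev_private now holds the generic cnxk device,
 * and cn10k-specific state is reached through the embedded member.
 */
static inline struct roc_ml *
cnxk_ml_roc_get(struct rte_ml_dev *dev)
{
	struct cnxk_ml_dev *cnxk_mldev = dev->data->dev_private;

	return &cnxk_mldev->cn10k_mldev.roc;
}
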
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03..f9da1548c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
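
The ML_CN10K_CMD_TIMEOUT, poll-job and device-state definitions dropped above reappear as cnxk-level symbols in the .c hunks (ML_CNXK_CMD_TIMEOUT, ML_CNXK_POLL_JOB_START/FINISH, ML_CNXK_DEV_STATE_*). A hedged sketch of what the common header presumably provides, with values assumed to match the removed cn10k definitions:

/* Assumed cnxk-level equivalents of the removed cn10k definitions; the
 * actual header (cnxk_ml_dev.h) is not shown in this hunk.
 */
#define ML_CNXK_CMD_TIMEOUT	5	/* seconds, as ML_CN10K_CMD_TIMEOUT was */

#define ML_CNXK_POLL_JOB_START	0
#define ML_CNXK_POLL_JOB_FINISH	1

enum cnxk_ml_dev_state {
	ML_CNXK_DEV_STATE_PROBED = 0,
	ML_CNXK_DEV_STATE_CONFIGURED,
	ML_CNXK_DEV_STATE_STARTED,
	ML_CNXK_DEV_STATE_CLOSED,
};
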
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8e..cc46ca2efd 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +462,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +471,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +495,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +507,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
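
A small illustrative example (hypothetical page counts, not driver code) of why clamping scratch_pages to the per-tile page count reserves the whole tile for a non-relocatable model, as the comment in the hunk above states:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical values: once scratch_pages is raised to num_pages, any
 * later model on the same tile fails a wb_pages + scratch_pages check
 * against num_pages, so the tile stays exclusive to this model.
 */
int
main(void)
{
	uint16_t num_pages = 128;	/* pages per OCM tile (hypothetical) */
	uint16_t scratch_pages = 4;	/* model's actual scratch need */

	if (scratch_pages < num_pages)
		scratch_pages = num_pages;	/* clamp, as in the driver */

	/* A second model needing even a single WB page no longer fits. */
	printf("fits = %d\n", (1 + scratch_pages) <= num_pages ? 1 : 0);
	return 0;
}
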
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891..3128b28db7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6fb0bb620e..8094a0fab1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,12 @@
 
 #include <rte_mldev_pmd.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +218,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +238,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +257,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +274,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +336,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +349,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +396,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +410,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +460,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,9 +501,8 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
-			wb_pages +=
-				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			wb_pages += rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
 		fprintf(fp,
 			"tile = %2u, scratch_pages = %4u,"
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c..def6d4c756 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +86,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +200,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +251,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +327,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +342,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +352,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +374,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +385,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +394,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +434,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
@@ -503,28 +504,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +541,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +552,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +656,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +676,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +747,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +774,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +790,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +864,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +893,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +908,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +922,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1027,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1058,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 
@@ -1077,8 +1091,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1101,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1141,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1164,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1184,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1279,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1305,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1327,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1369,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1396,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1445,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1460,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1480,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1506,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1528,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1550,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1552,8 +1587,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1609,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1626,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1659,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1716,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1731,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1747,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1756,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1772,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1784,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1853,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
@@ -1843,7 +1881,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1905,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1915,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1926,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1938,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1981,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2251,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2299,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2325,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2336,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2352,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2384,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2394,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2408,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2467,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2506,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 0000000000..2a5c17c973
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 0000000000..51315de622
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5bf17d8ae3..e006fdfe0e 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 03/34] ml/cnxk: add generic model and layer structures
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures enable support for models with multiple layers.
A model is a collection of multiple independent layers with
flow dependencies between the layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
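
As a rough illustration of the model/layer relationship described
above, the sketch below shows a model as a collection of layers with
a simple flow dependency between them. All names and fields here are
hypothetical and chosen only for illustration; the driver's actual
definitions are the cnxk_ml_model and cnxk_ml_layer structures added
in the diff that follows.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_LAYERS 8

/* Hypothetical layer descriptor: an independently executable unit
 * with its own I/O sizes and a single producer dependency.
 */
struct example_layer {
	uint16_t index;     /* position of the layer within the model */
	uint32_t input_sz;  /* quantized input size in bytes */
	uint32_t output_sz; /* quantized output size in bytes */
	int16_t depends_on; /* index of the producer layer, -1 if none */
};

/* Hypothetical model: a collection of layers plus bookkeeping. */
struct example_model {
	uint16_t model_id;
	uint16_t nb_layers;
	struct example_layer layer[MAX_LAYERS];
};

/* Walk the layers in index order, honouring the flow dependency
 * recorded in depends_on.
 */
static void
example_model_run(const struct example_model *model)
{
	uint16_t i;

	for (i = 0; i < model->nb_layers; i++) {
		const struct example_layer *l = &model->layer[i];

		if (l->depends_on >= 0)
			printf("layer %u consumes output of layer %d\n",
			       l->index, l->depends_on);
		printf("execute layer %u: in=%u bytes, out=%u bytes\n",
		       l->index, l->input_sz, l->output_sz);
	}
}

int
main(void)
{
	struct example_model m = {
		.model_id = 0,
		.nb_layers = 2,
		.layer = {
			{.index = 0, .input_sz = 1024, .output_sz = 256, .depends_on = -1},
			{.index = 1, .input_sz = 256, .output_sz = 64, .depends_on = 0},
		},
	};

	example_model_run(&m);
	return 0;
}
```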
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 245 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  50 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 488 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   1 +
 10 files changed, 651 insertions(+), 470 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index cc46ca2efd..d747bba151 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -311,19 +311,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -361,102 +359,136 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+				MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			strncpy(layer->info.output[i].name,
+				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -514,23 +546,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -542,7 +574,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -550,56 +583,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db7..206a369ca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 8094a0fab1..d71c36eae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -6,10 +6,10 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -333,12 +333,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -353,6 +355,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -382,8 +385,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -393,12 +396,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -409,16 +414,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -432,11 +440,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd65..720f8caf76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index def6d4c756..e91cc4e859 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -202,7 +202,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -215,77 +215,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -295,29 +298,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -327,14 +332,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -345,7 +350,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -385,7 +390,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -445,7 +450,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -472,7 +477,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -521,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -543,7 +548,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -576,9 +581,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -588,9 +593,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -600,9 +606,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -611,7 +618,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -692,28 +699,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -749,7 +756,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -758,7 +765,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -803,7 +810,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -854,7 +861,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -875,7 +882,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -895,7 +902,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1001,11 +1008,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1093,7 +1100,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1111,11 +1118,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1294,7 +1301,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1386,7 +1393,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1447,7 +1454,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1588,7 +1595,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1643,9 +1650,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1659,62 +1666,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* The driver always handles glow models as a single layer, so treat
+	 * the entire model as a model with one layer. This ignores the
+	 * num_layers value from the metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1730,7 +1760,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1741,7 +1771,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1758,7 +1788,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1783,7 +1813,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1791,63 +1821,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1880,10 +1913,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1891,12 +1924,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1917,7 +1950,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1937,7 +1970,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1948,31 +1981,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2008,7 +2041,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2021,7 +2054,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2040,7 +2073,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2050,19 +2083,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2071,7 +2108,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2091,57 +2128,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2151,7 +2189,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2171,58 +2209,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2250,10 +2290,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2263,9 +2303,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2469,7 +2509,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2477,7 +2517,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 0000000000..29ec7ec511
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 32
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* name */
+	char name[RTE_ML_STR_MAX];
+
+	/* dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape of input */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized input size */
+	uint32_t sz_d;
+
+	/* Quantized input size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 0000000000..3d735ced3e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 0000000000..a2994dbb71
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index e006fdfe0e..a70956cceb 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 04/34] ml/cnxk: add generic cnxk request structure
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved common fields
from the cn10k structures to the cnxk structure. Moved job-related
structures and enumerations to the ops headers.
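
For reference, below is a minimal sketch of how such a generic request
wrapper could look, inferred only from the accesses introduced in this
patch (for example fw->req->cn10k_req.jd and fw->req->cn10k_req.status).
It is not the literal definition from cnxk_ml_ops.h, and the fields other
than cn10k_req are assumptions added for illustration.

    /* Illustrative sketch of a generic cnxk ML request that embeds the
     * cn10k-specific request state. Relies on the existing cn10k_ml_req
     * and rte_ml_op types from the driver and rte_mldev headers.
     */
    struct cnxk_ml_req {
            union {
                    /* CN10K request context: job descriptor, result, status */
                    struct cn10k_ml_req cn10k_req;
            };

            /* Completion status word polled by the driver (assumed field) */
            uint64_t status;

            /* Originating operation, returned on completion (assumed field) */
            struct rte_ml_op *op;
    };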

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  72 +++----
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 331 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   1 +
 9 files changed, 557 insertions(+), 492 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 3bc61443d8..fc6f78d414 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -14,9 +14,8 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -400,20 +399,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -458,29 +460,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -654,29 +657,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -766,11 +770,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -782,8 +786,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -791,7 +795,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		free(fw_buffer);
 	} else if (roc_env_is_asim()) {
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a..1852d4f6c9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d747bba151..5d37e9bf8a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -549,7 +550,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -558,7 +558,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -575,7 +574,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca7..74ada1531a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e91cc4e859..caee09829b 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,9 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -78,31 +77,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -122,14 +121,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -140,18 +139,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -159,7 +158,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -173,7 +172,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -185,8 +184,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -333,7 +333,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -341,79 +341,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -861,7 +870,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -904,7 +913,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1101,7 +1110,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1136,7 +1145,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1213,7 +1222,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1239,7 +1248,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1252,7 +1261,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1269,7 +1278,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1485,20 +1494,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1511,17 +1522,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1538,14 +1551,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1554,23 +1567,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1581,7 +1595,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1654,7 +1668,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1726,7 +1740,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1790,7 +1804,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1815,10 +1829,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1878,8 +1892,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1887,19 +1901,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1952,7 +1968,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1972,10 +1988,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2015,19 +2031,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2287,18 +2305,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2329,7 +2352,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2338,7 +2361,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2346,15 +2370,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2365,11 +2389,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2395,12 +2420,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2424,11 +2450,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2450,13 +2477,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2507,10 +2536,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2522,17 +2552,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2555,7 +2586,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45..fd5992e192 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries 0 means linear input mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model  */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 0000000000..f1872dcf7c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 0000000000..b953fb0f5f
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index a70956cceb..d652543912 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -14,6 +14,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 05/34] ml/cnxk: add generic cnxk xstats structures
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic xstats structures and renamed the cn10k xstats
enumerations to use the cnxk prefix.
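
As a rough orientation for the hunks below, here is a minimal sketch of the
renamed, generic entry layout that the new cnxk_ml_xstats.h provides. It is
reconstructed from the cn10k definitions removed from cn10k_ml_dev.h and from
the cnxk-prefixed usage visible in this patch; names such as
cnxk_ml_xstats_type are assumed to follow the same rename and may not match
the file exactly.

    /* Sketch only: approximate layout of the generic xstats structures */
    enum cnxk_ml_xstats_fn_type {
            CNXK_ML_XSTATS_FN_DEVICE, /* Device-level stat */
            CNXK_ML_XSTATS_FN_MODEL,  /* Model-level stat */
    };

    struct cnxk_ml_xstats_entry {
            struct rte_ml_dev_xstats_map map;  /* Name-ID map */
            enum rte_ml_dev_xstats_mode mode;  /* Device or model mode */
            enum cnxk_ml_xstats_type type;     /* Stat type (assumed rename) */
            enum cnxk_ml_xstats_fn_type fn_id; /* Stat function type */
            uint16_t obj_idx;                  /* Model ID for model stats */
            uint8_t reset_allowed;             /* Stat may be reset */
            uint64_t reset_value;              /* Offset emulating resets */
    };

    struct cnxk_ml_xstats {
            struct cnxk_ml_xstats_entry *entries;          /* All entries */
            uint16_t count_per_model[ML_CNXK_MAX_MODELS];  /* Per-model counts */
            uint16_t offset_for_model[ML_CNXK_MAX_MODELS]; /* Per-model offsets */
            uint16_t count_mode_device;                    /* Device-mode count */
            uint16_t count_mode_model;                     /* Model-mode count */
            uint16_t count;                                /* Total entries */
    };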

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 4 files changed, 209 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index caee09829b..42a4389bbe 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -425,26 +426,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -459,10 +440,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -470,17 +451,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -489,24 +470,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -545,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -554,17 +535,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -590,9 +571,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -603,9 +584,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -616,16 +598,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -671,8 +654,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -708,26 +691,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -762,8 +745,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1342,10 +1325,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1357,10 +1340,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1384,11 +1367,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1423,10 +1406,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1664,7 +1647,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1738,24 +1721,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2308,7 +2291,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2326,31 +2309,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 0000000000..0d405679ca
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 06/34] ml/cnxk: rename cnxk ops function pointers struct
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure to use the cnxk prefix.
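
A minimal sketch of the resulting split, distilled from the diff below
(names are taken from the patch; the surrounding driver context is
omitted and only illustrative):

  /* cnxk_ml_ops.c: one ops table shared by all cnxk ML devices */
  struct rte_ml_dev_ops cnxk_ml_ops = {
          .dev_info_get = cn10k_ml_dev_info_get,
          /* ... remaining callbacks, still the cn10k implementations ... */
  };

  /* cn10k_ml_dev.c: SoC-specific probe now selects the common table */
  dev->dev_ops = &cnxk_ml_ops;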

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 91 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fc6f78d414..91813e9d0a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -345,7 +345,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 42a4389bbe..66b38fc1eb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -119,7 +119,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -860,7 +860,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -888,7 +888,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1087,7 +1087,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1160,7 +1160,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1180,7 +1180,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1200,7 +1200,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1241,7 +1241,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1258,7 +1258,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1273,7 +1273,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1321,7 +1321,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1363,7 +1363,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1427,7 +1427,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1441,7 +1441,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1528,7 +1528,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2051,7 +2051,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2071,7 +2071,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2105,7 +2105,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2186,7 +2186,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2574,38 +2574,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e192..16480b9ad8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c..03402681c5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,41 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5f..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 07/34] ml/cnxk: update device handling functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement CNXK wrapper functions for dev_info_get,
dev_configure, dev_close, dev_start and dev_stop. The
wrappers allocate and release resources common to the
ML driver and invoke the device-specific functions.
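
A minimal sketch of the wrapper pattern, distilled from the diff below
using cnxk_ml_dev_start() as the example (names follow the patch;
headers and unrelated code are omitted):

  static int
  cnxk_ml_dev_start(struct rte_ml_dev *dev)
  {
          struct cnxk_ml_dev *cnxk_mldev;
          int ret;

          if (dev == NULL)
                  return -EINVAL;

          cnxk_mldev = dev->data->dev_private;

          /* Delegate hardware-specific start to the CN10K backend */
          ret = cn10k_ml_dev_start(cnxk_mldev);
          if (ret != 0) {
                  plt_err("Failed to start CN10K ML Device");
                  return ret;
          }

          /* Common state tracking stays in the cnxk layer */
          cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;

          return 0;
  }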

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 66b38fc1eb..6d8f2c8777 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -101,7 +101,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -861,20 +861,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -889,143 +881,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1038,8 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1050,10 +915,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1067,77 +932,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1154,20 +967,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1175,19 +983,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1195,8 +999,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1217,7 +1019,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad8..d50b5bede7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de622..02605fa28f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03402681c5..07a4daabc5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,15 +5,291 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..2996928d7d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 08/34] ml/cnxk: update queue-pair handling functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device queue-pair setup and release.
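The wrappers keep the existing descriptor accounting: one ring slot is
reserved, so the queue is created one entry larger than requested unless
the request already equals the device maximum. A minimal standalone
sketch of that sizing rule follows; the names are illustrative only and
are not driver symbols.

#include <assert.h>
#include <stdint.h>

/* Ring size actually created for a requested descriptor count.
 * One slot is reserved, so the ring is one larger than requested,
 * except when the request equals the device maximum.
 */
static uint32_t
ring_size_for(uint32_t requested, uint32_t max_desc)
{
	return (requested == max_desc) ? max_desc : requested + 1;
}

int
main(void)
{
	assert(ring_size_for(15, 1024) == 16);     /* 15 usable descriptors */
	assert(ring_size_for(1024, 1024) == 1024); /* capped at the maximum */
	return 0;
}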

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 6d8f2c8777..e3c688a55f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -95,93 +95,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -189,13 +108,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1002,47 +914,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede7..2d0a49d5cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 07a4daabc5..aa56dd2276 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,7 +10,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -93,7 +193,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -283,6 +383,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -294,8 +439,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 09/34] ml/cnxk: update model load and unload functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to load and unload
ML models. The wrappers invoke the cn10k model load and
unload functions.
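The split is that the generic cnxk layer owns the model slot, memzone
and bookkeeping, while the cn10k backend loads the actual layer
(layer 0 for Glow models). Below is a toy sketch of that delegation
pattern, using placeholder names rather than the driver's symbols.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: "generic" stands in for the cnxk wrapper,
 * "backend" for the cn10k layer load.
 */
struct toy_model {
	uint16_t id;
	int loaded;
};

static int
backend_layer_load(struct toy_model *m)
{
	/* stands in for the hardware-specific load: reserve memory,
	 * copy metadata, mark the layer loaded
	 */
	m->loaded = 1;
	return 0;
}

static int
generic_model_load(struct toy_model *models, uint16_t nb_models, uint16_t *model_id)
{
	uint16_t i;

	/* find a free slot, then hand the buffer to the backend */
	for (i = 0; i < nb_models; i++) {
		if (!models[i].loaded) {
			models[i].id = i;
			*model_id = i;
			return backend_layer_load(&models[i]);
		}
	}
	return -1; /* no slots available */
}

int
main(void)
{
	struct toy_model models[4] = {0};
	uint16_t id;

	if (generic_model_load(models, 4, &id) == 0)
		printf("model loaded with id %u\n", id);
	return 0;
}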

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  26 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 462 insertions(+), 277 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 5d37e9bf8a..69a60b9b90 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -316,42 +316,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -363,140 +352,146 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+			   struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.input[i].name, (char *)metadata->input1[i].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.input[i].name, (char *)metadata->input2[j].input_name,
+			strncpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
 				MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output1[i].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			strncpy(layer->info.output[i].name,
-				(char *)metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			strncpy(io_info->output[i].name, (char *)metadata->output2[j].output_name,
+				MRVL_ML_OUTPUT_NAME_LEN);
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
+struct cnxk_ml_io_info *
+cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	return &model->layer[layer_id].info;
+}
+
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -504,7 +499,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -516,7 +511,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -524,15 +519,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -540,28 +535,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -570,39 +562,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68..b891c9d627 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,13 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+				struct cn10k_ml_model_metadata *metadata);
+struct cnxk_ml_io_info *cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e3c688a55f..ad2effb904 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -15,6 +15,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -273,7 +276,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1261,85 +1264,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_set(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1358,99 +1447,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1748,7 +1800,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1762,19 +1813,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cd..677219dfdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28f..1590249abd 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index aa56dd2276..1d8b84269d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -137,6 +140,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -240,7 +244,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -271,6 +275,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -303,6 +324,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -312,7 +336,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -428,6 +452,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -451,8 +587,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..bc14f6e5b9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 10/34] ml/cnxk: update model start and stop functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrappers invoke the cn10k model start
and stop functions.
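
After this change the public start path runs rte_ml_model_start() ->
cnxk_ml_model_start() -> cn10k_ml_model_start() -> cn10k_ml_layer_start().
A condensed sketch of the new cnxk-level wrapper, mirroring the version
added to cnxk_ml_ops.c below (logging trimmed):

static int
cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
{
	struct cnxk_ml_dev *cnxk_mldev;
	struct cnxk_ml_model *model;

	if (dev == NULL)
		return -EINVAL;

	/* Translate the generic device handle to the driver-private device */
	cnxk_mldev = dev->data->dev_private;

	model = dev->data->models[model_id];
	if (model == NULL)
		return -EINVAL;

	/* cn10k_ml_model_start() starts layer 0 via cn10k_ml_layer_start()
	 * and updates the model state and counters on success.
	 */
	return cn10k_ml_model_start(cnxk_mldev, model);
}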

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d71c36eae6..2197e5e0ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -215,11 +215,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -238,7 +237,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -333,12 +331,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -351,10 +347,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -396,12 +390,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -416,10 +408,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -438,8 +428,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ad2effb904..c677861645 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -248,26 +248,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -291,7 +293,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -323,9 +325,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -714,10 +720,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -730,22 +734,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -761,15 +763,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1506,14 +1508,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1524,85 +1528,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1636,66 +1644,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1705,31 +1741,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1766,8 +1802,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1776,6 +1815,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2003,30 +2061,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2054,14 +2117,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2116,7 +2178,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2183,7 +2245,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2232,23 +2294,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2284,7 +2350,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf..a222a43d55 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1d8b84269d..b61ed45876 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -240,7 +240,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -332,7 +332,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -564,6 +564,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -589,8 +629,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9..d27ca0d0cb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 11/34] ml/cnxk: update model utility functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and
fetch model info.
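
The info_get path now serves the info block cached at model load time and
re-points the embedded arrays before returning it to the application. A
condensed sketch of the wrapper added to cnxk_ml_ops.c below (logging trimmed):

static int
cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
		       struct rte_ml_model_info *model_info)
{
	struct rte_ml_model_info *info;
	struct cnxk_ml_model *model;

	if (dev == NULL || model_info == NULL)
		return -EINVAL;

	model = dev->data->models[model_id];
	if (model == NULL)
		return -EINVAL;

	/* Copy the info block cached at load time and re-point the
	 * input/output arrays, which live in the same memzone as the model.
	 */
	info = (struct rte_ml_model_info *)model->info;
	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
	model_info->input_info = info->input_info;
	model_info->output_info = info->output_info;

	return 0;
}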

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c677861645..c0d6216485 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1835,45 +1835,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b61ed45876..9ce37fcfd1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -604,6 +604,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -631,8 +675,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 12/34] ml/cnxk: update data quantization functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.
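
Quantization is split into a per-tensor helper (cnxk_ml_io_quantize_single)
and a wrapper that walks the inputs of the first layer, advancing the
dequantized and quantized buffer pointers by each tensor's size. A condensed
sketch of the wrapper added to cnxk_ml_ops.c below (logging trimmed):

static int
cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
		    struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer)
{
	struct cnxk_ml_io_info *info;
	struct cnxk_ml_model *model;
	uint8_t *lcl_dbuffer;
	uint8_t *lcl_qbuffer;
	uint32_t i;
	int ret;

	if (dev == NULL || dbuffer == NULL || qbuffer == NULL)
		return -EINVAL;

	model = dev->data->models[model_id];
	if (model == NULL)
		return -EINVAL;

	/* Quantize against the inputs of the first layer */
	info = &model->layer[0].info;
	lcl_dbuffer = dbuffer[0]->addr;
	lcl_qbuffer = qbuffer[0]->addr;

	for (i = 0; i < info->nb_inputs; i++) {
		/* Per-tensor float32 -> int8/uint8/int16/uint16/fp16 conversion */
		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
		if (ret < 0)
			return ret;

		lcl_dbuffer += info->input[i].sz_d;
		lcl_qbuffer += info->input[i].sz_q;
	}

	return 0;
}

Dequantize mirrors this loop over the outputs of the last layer
(model->layer[model->nb_layers - 1].info).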

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c0d6216485..ff190b7f86 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1856,170 +1856,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0d..780e2a9f9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 0000000000..c78009ab0c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 29ec7ec511..5de166c252 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 9ce37fcfd1..63842025fc 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -648,6 +650,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -679,6 +753,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index d652543912..79154c8698 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 13/34] ml/cnxk: update device debug functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:53   ` [PATCH v6 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrappers for the device dump and selftest
debug functions.
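
The hunks below move the per-layer print into cn10k_ml_layer_print(); the
cnxk-level dump wrapper itself is not shown in this excerpt. A hypothetical
sketch of that wrapper, assuming it simply iterates loaded models and their
layers (function name and loop bounds inferred from the surrounding diffs):

static int
cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
{
	struct cnxk_ml_dev *cnxk_mldev = dev->data->dev_private;
	struct cnxk_ml_model *model;
	uint16_t model_id;
	uint16_t layer_id;

	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
		model = dev->data->models[model_id];
		if (model == NULL)
			continue;

		/* Print each layer via the cn10k helper shown below */
		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
			cn10k_ml_layer_print(cnxk_mldev, &model->layer[layer_id], fp);
	}

	return 0;
}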

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   1 +
 12 files changed, 235 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 69a60b9b90..b765b4ada9 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -596,3 +597,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index b891c9d627..45f2ed5fcf 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -460,5 +460,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2197e5e0ed..dc315cce10 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -481,19 +481,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a..bf8944f8ee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ff190b7f86..0a3575879f 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -18,11 +18,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -70,16 +65,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -113,140 +98,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1120,38 +971,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1207,17 +1045,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c..5fda98ae88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e..b069d4e3a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71..66d979dd3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 63842025fc..66b88ddae1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -409,6 +409,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -729,8 +764,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 0000000000..ca3670a9e8
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 0000000000..ed2ab21346
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 79154c8698..5d27a87d91 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 14/34] ml/cnxk: update device stats functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-10-18 13:53   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device stats.
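
Not part of this patch, a short sketch of the application-side view,
assuming the standard rte_ml_dev_stats_get() signature and uint64_t
counters in struct rte_ml_dev_stats; the call now lands in
cnxk_ml_dev_stats_get(), which sums the per-queue-pair counters in the
cnxk layer.

    #include <inttypes.h>
    #include <stdio.h>
    #include <string.h>
    #include <rte_mldev.h>

    /* Illustrative only: print the aggregate enqueue/dequeue counters
     * that the cnxk wrapper accumulates across all queue pairs.
     */
    static void
    ml_print_stats(int16_t dev_id)
    {
            struct rte_ml_dev_stats stats;

            memset(&stats, 0, sizeof(stats));
            if (rte_ml_dev_stats_get(dev_id, &stats) != 0)
                    return;

            printf("enq %" PRIu64 " deq %" PRIu64 " enq_err %" PRIu64 " deq_err %" PRIu64 "\n",
                   stats.enqueued_count, stats.dequeued_count,
                   stats.enqueue_err_count, stats.dequeue_err_count);
    }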

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0a3575879f..27d255a830 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -770,38 +770,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66b88ddae1..c75317d6da 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -489,6 +489,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -772,8 +804,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 15/34] ml/cnxk: update device and model xstats functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-10-18 13:53   ` [PATCH v6 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resource handling for the xstats is now done
in the cnxk layer. Introduced an internal xstats group.
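
Not part of this patch, a sketch of how the relocated xstats are read
through the public API, assuming the standard rte_ml_dev_xstats_names_get()
and rte_ml_dev_xstats_get() signatures; MAX_XSTATS is an arbitrary bound
chosen for the example. Device-mode stats are shown here; the per-layer
entries added by this patch are reached the same way with
RTE_ML_DEV_XSTATS_MODEL and a model_id.

    #include <inttypes.h>
    #include <stdio.h>
    #include <rte_mldev.h>

    #define MAX_XSTATS 64 /* illustrative bound, not from the patch */

    /* Illustrative only: enumerate device-mode xstats by name, then
     * fetch their values by id (nb_models_loaded, nb_models_started, ...).
     */
    static void
    ml_print_device_xstats(int16_t dev_id)
    {
            struct rte_ml_dev_xstats_map map[MAX_XSTATS];
            uint64_t values[MAX_XSTATS];
            uint16_t ids[MAX_XSTATS];
            int n, i;

            n = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, 0,
                                            map, MAX_XSTATS);
            if (n <= 0 || n > MAX_XSTATS)
                    return; /* error, or array too small to be filled */

            for (i = 0; i < n; i++)
                    ids[i] = map[i].id;

            n = rte_ml_dev_xstats_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, 0,
                                      ids, values, n);
            for (i = 0; i < n; i++)
                    printf("%s: %" PRIu64 "\n", map[i].name, values[i]);
    }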

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 531 +++----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 481 +++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 551 insertions(+), 507 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 27d255a830..776ad60401 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -198,107 +198,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -306,270 +220,94 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
 
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
+uint64_t
+cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			 enum cnxk_ml_xstats_type type)
 {
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
-	uint64_t value;
+	uint64_t value = 0;
 	uint32_t qp_id;
 
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
 	switch (type) {
 	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	default:
 		value = 0;
 	}
 
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
 	return value;
 }
 
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -654,7 +392,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -682,13 +419,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -717,9 +447,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -770,174 +497,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		strncpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1211,7 +770,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af..4d76164dba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -298,17 +299,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
@@ -337,4 +327,8 @@ int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_nam
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
+/* xstats ops */
+uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd..3ce9338f1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c75317d6da..6a423d9eda 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -115,6 +115,285 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t value = 0;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		return 0;
+
+	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -294,6 +573,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -323,6 +609,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -521,6 +810,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -806,10 +1279,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679ca..5e02bb876c 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 16/34] ml/cnxk: update fast path functions
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support
for model-specific fast-path functions. The cnxk layer functions
invoke the model-specific fast-path functions.

Added support for model-specific poll handling functions and
updated the internal inference sync function, dropping the use
of rte_ml_op as an argument. Updated the function arguments so
that the function can be used as a callback by the TVM HW
runtime.
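
A condensed sketch of the dispatch pattern introduced here (it assumes the
cnxk_ml_* structures defined elsewhere in this series and omits the
free-descriptor accounting; it is an illustration, not the exact patch code):

/* Common cnxk enqueue loop; the per-model callback hides whether the model
 * is a Glow model (cn10k_ml_enqueue_single) or, later in this series, a
 * TVM model.
 */
static uint16_t
sketch_enqueue_burst(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp,
		     struct rte_ml_op **ops, uint16_t nb_ops)
{
	struct cnxk_ml_model *model;
	uint64_t head = qp->queue.head;
	uint16_t count = 0;

	while (count < nb_ops) {
		model = cnxk_mldev->mldev->data->models[ops[count]->model_id];
		if (!model->enqueue_single(cnxk_mldev, ops[count], 0, qp, head))
			break; /* job command queue full */

		head = (head + 1) % qp->nb_desc;
		count++;
	}

	qp->queue.head = head;
	qp->stats.enqueued_count += count;

	return count;
}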

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 776ad60401..8116c8dedb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -65,24 +65,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -177,7 +165,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -185,17 +173,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -311,30 +299,15 @@ cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *l
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -342,25 +315,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -425,13 +382,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -824,6 +776,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1219,26 +1177,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1246,6 +1186,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1253,9 +1194,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1322,119 +1263,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1471,41 +1341,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1518,7 +1395,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4d76164dba..3d18303ed3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -14,6 +14,7 @@ struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -309,13 +310,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3c..f618e5aa5f 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 6a423d9eda..6a44a69508 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -15,6 +15,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1262,6 +1274,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb..d0c126f34b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 17/34] ml/cnxk: move error handling to cnxk layer
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Moved error type structures to the cnxk layer. The cn10k layer
now handles only firmware and hardware error sub-types.
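
A minimal sketch of how the split databases are combined when an error is
reported (it assumes the enums and tables added by this patch and leaves out
the sub-type bounds checks done in the real code):

static void
sketch_error_message(const union cn10k_ml_error_code *code,
		     struct rte_ml_op_error *error)
{
	/* Generic error type string comes from the cnxk layer table ... */
	const char *etype = ml_etype_db[code->s.etype].str;

	/* ... while the sub-type strings stay cn10k specific. */
	if (code->s.etype == ML_CNXK_ETYPE_DRIVER)
		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s", etype,
			 ml_stype_db_driver[code->s.stype].str);
	else if (code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL)
		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s", etype,
			 ml_stype_db_hw_nf[code->s.stype].str);
	else
		snprintf(error->message, RTE_ML_STR_MAX, "%s", etype);

	error->errcode = code->u64;
}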

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8116c8dedb..65eaaf030d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,47 +22,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1241,19 +1221,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1294,7 +1274,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1311,30 +1291,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1372,7 +1351,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973..63d1c9e417 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f..382fca64be 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 6a44a69508..8339f8342b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1372,7 +1372,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 18:34     ` Jerin Jacob
  2023-10-18 13:54   ` [PATCH v6 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  34 siblings, 1 reply; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based
on the ML device configuration options.

Updated the meson build to add Jansson, the TVM runtime and the
TVMDP library as build dependencies.
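
For reference, a sketch of the TVMDP hooks wired up in this patch; the real
implementations live in mvtvm_ml_ops.c, and when TVM support is disabled at
build time no-op stubs from mvtvm_ml_stubs.c are compiled in instead:

static int
sketch_tvmdp_lifecycle(struct cnxk_ml_dev *cnxk_mldev)
{
	int ret;

	/* Called from cnxk_ml_dev_configure(): size TVMDP for the number of
	 * models and hand it a cycle counter for internal timing. */
	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models,
			      rte_get_tsc_cycles);
	if (ret != 0)
		return ret;

	/* Called from cnxk_ml_dev_close(): release TVMDP state. */
	return tvmdp_close();
}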

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       | 78 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 +++
 drivers/ml/cnxk/cnxk_ml_ops.h    |  6 +++
 drivers/ml/cnxk/meson.build      | 59 ++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 41 +++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   | 19 ++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 26 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h | 15 ++++++
 8 files changed, 251 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 1834b1f905..ef2b5d4581 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -46,6 +46,84 @@ or cross-compiled on an x86 platform.
 
 Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
 
+Compilation Prerequisites
+-------------------------
+
+This driver requires external libraries to optionally enable support for
+models compiled using the Apache TVM framework. The following dependencies are
+not part of DPDK and must be installed separately:
+
+- **Jansson**
+
+  This library enables support to parse and read JSON files.
+
+- **DLPack**
+
+  This library provides headers for open in-memory tensor structures.
+
+.. note::
+
+    DPDK CNXK ML driver requires DLPack version 0.7
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dlpack.git
+    cd dlpack
+    git checkout v0.7 -b v0.7
+    cmake -S ./ -B build \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DBUILD_MOCK=OFF
+    make -C build
+    make -C build install
+
+- **TVM**
+
+  Apache TVM provides a runtime library (libtvm_runtime) used to execute
+  models on CPU cores or hardware accelerators.
+
+.. note::
+
+    DPDK CNXK ML driver requires TVM version 0.10.0
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.10.0 -b v0.10.0
+    cmake -S ./ -B build \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DMACHINE_NAME=aarch64-linux-gnu \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY
+    make -C build
+    make -C build install
+
+- **TVMDP**
+
+  Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
+  works as an interface between TVM runtime and DPDK drivers. TVMDP library
+  provides a simplified C interface for TVM's runtime based on C++.
+
+.. code-block:: console
+
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake \
+      -DBUILD_SHARED_LIBS=ON \
+      -DBUILD_TESTING=OFF
+    make -C build
+    make -C build install
+
+- **libarchive**
+
+  The Apache TVM framework generates compiled models as tar archives. This
+  library enables support to decompress and read archive files in tar,
+  xz and other formats.
+
 
 Initialization
 --------------
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 8339f8342b..c3639320a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -564,6 +564,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -624,6 +628,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..b22a2b0d95 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,6 +12,12 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#else
+#include "mvtvm_ml_stubs.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5d27a87d91..607e1c72e9 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,32 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+if not cc.check_header('dlpack/dlpack.h')
+        message('drivers/ml/cnxk: dlpack.h not found')
+        enable_mvtvm = false
+endif
+
+tvmrt_lib = cc.find_library('tvm_runtime', required: false)
+if tvmrt_lib.found()
+        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
+else
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
@@ -21,6 +47,39 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
+
+driver_sdk_headers += files(
+        'mvtvm_ml_ops.h',
+)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += tvmrt_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+ext_deps += jansson_dep
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+
+driver_sdk_headers += files(
+        'mvtvm_ml_stubs.h',
+)
+
+sources += files(
+        'mvtvm_ml_stubs.c',
+)
+
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 0000000000..88c6d5a864
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 0000000000..305b4681ed
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <dlpack/dlpack.h>
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
new file mode 100644
index 0000000000..a31cd39cfa
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_stubs.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(conf);
+
+	return 0;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	RTE_SET_USED(cnxk_mldev);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
new file mode 100644
index 0000000000..11c56e5144
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_STUBS_H_
+#define _MVTVM_ML_STUBS_H_
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 19/34] ml/cnxk: add structures to support TVM model type
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.
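
An illustrative sketch (not part of the patch) of how common code can use
the new fields; the enum and struct names are the ones added to
cnxk_ml_model.h in this patch, only the helper name is made up:

static bool
layer_runs_on_mlip(const struct cnxk_ml_model *model, uint16_t layer_id)
{
	/* Glow models always target the ML accelerator (MLIP) */
	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
		return true;

	/* TVM models mix MRVL (MLIP) and LLVM (ARM64 CPU) layers */
	return model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL;
}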

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 66 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 52 ++++++++++++++++++++-----
 drivers/ml/cnxk/meson.build      |  1 +
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 ++++++++++++++++++++++
 6 files changed, 161 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index dc315cce10..749ddeb344 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -435,6 +435,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65eaaf030d..a471e98fbf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,6 +725,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -746,6 +749,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -969,7 +973,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..f100eca203 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,48 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Unknown model type */
+	ML_CNXK_MODEL_TYPE_UNKNOWN,
+
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions, or more than one MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* Unknown layer type */
+	ML_CNXK_LAYER_TYPE_UNKNOWN = 0,
+
+	/* MRVL layer, for MLIP target */
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target */
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +99,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +132,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c3639320a5..ea6f59a70f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1217,6 +1217,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1229,17 +1231,31 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, 0);
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1253,6 +1269,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1265,17 +1283,31 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 607e1c72e9..ff9f11e111 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -53,6 +53,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 driver_sdk_headers += files(
         'mvtvm_ml_ops.h',
+        'mvtvm_ml_model.h',
 )
 
 sources += files(
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 0000000000..1f6b435be0
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 20/34] ml/cnxk: add support for identify model type
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to parse the model buffer to identify the model type
and model sub-type. Added basic validity checks for Glow model buffers.
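
A minimal standalone sketch of the CRC check performed below, using DPDK's
rte_hash_crc(); the function and parameter names here are illustrative, the
real layout is struct cn10k_ml_model_metadata_header:

#include <stdbool.h>
#include <stdint.h>

#include <rte_hash_crc.h>

static bool
glow_header_crc_ok(const void *buf, uint32_t header_size, uint32_t expected)
{
	/* CRC32C covers the header minus its trailing 32-bit CRC word */
	uint32_t crc = rte_hash_crc(buf, header_size - sizeof(uint32_t), 0);

	return crc == expected;
}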

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 49 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  3 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  8 +++++
 drivers/ml/cnxk/meson.build      |  6 ++++
 drivers/ml/cnxk/mvtvm_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 8 files changed, 133 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..02f80410ec 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,60 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	enum cnxk_ml_model_type type;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+	type = mvtvm_ml_model_type_get(params);
+	if (type == ML_CNXK_MODEL_TYPE_TVM)
+		return ML_CNXK_MODEL_TYPE_TVM;
+	else if (type == ML_CNXK_MODEL_TYPE_INVALID)
+		return ML_CNXK_MODEL_TYPE_INVALID;
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f100eca203..a2fced46a2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -13,6 +13,8 @@
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
 #include "mvtvm_ml_model.h"
+#else
+#include "mvtvm_ml_stubs.h"
 #endif
 
 #include "cnxk_ml_io.h"
@@ -184,6 +186,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ea6f59a70f..c140408023 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1018,6 +1018,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1033,6 +1034,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1066,6 +1073,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index ff9f11e111..a20615186c 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -58,6 +63,7 @@ driver_sdk_headers += files(
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += tvmrt_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 0000000000..ab5f8baa67
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <archive.h>
+#include <archive_entry.h>
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return ML_CNXK_MODEL_TYPE_UNKNOWN;
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be0..b6162fceec 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,6 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a31cd39cfa..a7352840a6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -7,6 +7,15 @@
 #include "mvtvm_ml_stubs.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	RTE_SET_USED(params);
+
+	return ML_CNXK_MODEL_TYPE_UNKNOWN;
+}
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 11c56e5144..467a9d39e5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 21/34] ml/cnxk: add support to parse TVM model objects
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model archive
buffer, check that all expected objects are present and copy the
objects to internal buffers.
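
For context, a hedged application-side sketch; dev_id, archive_buf and
archive_len are placeholders. The TVM model is passed as one archive buffer
holding mod.so, mod.json and mod.params, and the driver below extracts the
three objects from it:

struct rte_ml_model_params params = {
	.addr = archive_buf,	/* buffer holding the model archive */
	.size = archive_len,	/* archive size in bytes */
};
uint16_t model_id;
int ret;

ret = rte_ml_model_load(dev_id, &params, &model_id);
if (ret != 0)
	printf("model load failed, ret = %d\n", ret);	/* e.g. missing object */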

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  5 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 57 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 62 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 11 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 7 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c140408023..b18271545d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1079,7 +1079,10 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	else
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
 	if (ret != 0)
 		goto error;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ab5f8baa67..4c9a080c05 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -53,3 +53,60 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 
 	return ML_CNXK_MODEL_TYPE_TVM;
 }
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b6162fceec..b11b66f495 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -44,5 +44,7 @@ struct mvtvm_ml_model_data {
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 88c6d5a864..e2413b6b15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -8,8 +8,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -39,3 +43,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 305b4681ed..6607537599 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -12,8 +12,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a7352840a6..7f3b3abb2e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -33,3 +33,14 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return 0;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(params);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 467a9d39e5..4bb1772ef4 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -8,9 +8,12 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 22/34] ml/cnxk: fetch layer info and load TVM model
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and update
internal structures based on the layer information. Set callback
functions for layer load and unload and enabled model loading using
the TVMDP library. Added support to fetch the full metadata after
model load.
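
Condensed from the load path below: for models containing MRVL layers, the
driver hands its Glow layer load/unload entry points to TVMDP, which calls
back into them while initializing each MRVL sub-graph:

callback = &model->mvtvm.cb;
callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;

ret = tvmdp_model_load(cnxk_mldev, model->model_id,
		       (void *)(&model->mvtvm.object), callback);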

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 11 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  2 +
 drivers/ml/cnxk/cn10k_ml_ops.c   |  7 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 25 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  4 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 81 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 10 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 8 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index b765b4ada9..9a80adf0fc 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -714,3 +714,14 @@ cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "\n");
 }
+
+int
+cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM)
+		return mvtvm_ml_model_get_layer_id(model, layer_name, layer_id);
+
+	*layer_id = 0;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45f2ed5fcf..6744175cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -461,5 +461,7 @@ void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+int cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a471e98fbf..4191ccc840 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -576,7 +576,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
@@ -584,7 +584,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int ret;
 
 	PLT_SET_USED(size);
-	PLT_SET_USED(layer_name);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -598,6 +597,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c9a080c05..8536fd8927 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -110,3 +110,28 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	uint16_t i;
+
+	for (i = 0; i < model->mvtvm.metadata.model.nb_layers; i++) {
+		if (strcmp(model->layer[i].name, layer_name) == 0)
+			break;
+	}
+
+	if (i == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[i].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer type, name: %s type: %d", layer_name, model->layer[i].type);
+		return -EINVAL;
+	}
+
+	*layer_id = i;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b11b66f495..6cb2639876 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
@@ -46,5 +48,7 @@ struct mvtvm_ml_model_data {
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e2413b6b15..1fe0a04301 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -49,9 +49,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -99,5 +103,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		strncpy(model->layer[layer_id].name,
+			model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 7f3b3abb2e..d621dbc897 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -17,6 +17,16 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 	return ML_CNXK_MODEL_TYPE_UNKNOWN;
 }
 
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_name);
+	RTE_SET_USED(layer_id);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 4bb1772ef4..23fdfdc4cd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,4 +16,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
+
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 23/34] ml/cnxk: update internal info for TVM model
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating of the internal I/O info structures for TVM models.
Static fields related to the model I/O are computed at load time.
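
A worked example of the static size computation done below, assuming a
hypothetical input of shape [1, 3, 224, 224] with an FP32 dequantized type
and an FP16 quantized type:

uint32_t shape[] = {1, 3, 224, 224};
uint32_t nb_elements = 1;
uint32_t i;

for (i = 0; i < RTE_DIM(shape); i++)
	nb_elements *= shape[i];	/* 1 * 3 * 224 * 224 = 150528 */

/* sz_d = nb_elements * 4 bytes (FP32) = 602112
 * sz_q = nb_elements * 2 bytes (FP16) = 301056
 */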

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 111 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |   9 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   1 +
 6 files changed, 130 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b18271545d..90b23d9c1c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1244,6 +1244,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, 0);
+	else
+		info = mvtvm_ml_model_io_info_get(model, 0);
 
 	if (info == NULL)
 		return -EINVAL;
@@ -1296,6 +1298,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
+	else
+		info = mvtvm_ml_model_io_info_get(model, model->nb_layers - 1);
 
 	if (info == NULL)
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8536fd8927..14f4b258d8 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "cnxk_ml_model.h"
@@ -135,3 +137,112 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 
 	return 0;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case kDLInt:
+		return RTE_ML_IO_TYPE_INT32;
+	case kDLUInt:
+		return RTE_ML_IO_TYPE_UINT32;
+	case kDLFloat:
+		return RTE_ML_IO_TYPE_FP32;
+	case kDLBfloat:
+		return RTE_ML_IO_TYPE_BFLOAT16;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		strncpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		strncpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_set(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
+
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(layer_id);
+
+	return &model->mvtvm.info;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6cb2639876..e86581bc6a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -50,5 +50,7 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1fe0a04301..e248310cb3 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -175,6 +175,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_set(model);
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index d621dbc897..80a9a90b4e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -27,6 +27,15 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 	return -EINVAL;
 }
 
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_id);
+
+	return NULL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 23fdfdc4cd..29f721072a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -18,5 +18,6 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 24/34] ml/cnxk: enable model unload in tvmdp library
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled model unload through the external TVMDP library. Updated the
layer unload callback to support multiple layers.
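
From the application side this path is reached through the generic mldev
calls (dev_id and model_id are placeholders); the driver below tears down
the TVMDP state and then frees the model memzone:

/* A started model must be stopped before it can be unloaded */
ret = rte_ml_model_stop(dev_id, model_id);
if (ret == 0)
	ret = rte_ml_model_unload(dev_id, model_id);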

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  8 +++++---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 +++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  1 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 +++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4191ccc840..e7208391fd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -780,11 +780,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	int ret;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -797,6 +795,10 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 90b23d9c1c..cd95a3c7ad 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1107,7 +1107,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1125,7 +1125,10 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e248310cb3..9fd9e58de6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -185,3 +185,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 80a9a90b4e..a17a76e41f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -63,3 +63,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 29f721072a..3776fb5369 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -15,6 +15,7 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 25/34] ml/cnxk: enable OCM check for multilayer TVM model
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enabled a check of the OCM size requirement for multi-layer TVM
models. The OCM scratch and WB page requirements are computed across
all layers during the load stage.
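
A standalone sketch of the admission rule enforced below: WB (weights and
bias) pages accumulate across all MRVL layers, while OCM scratch is shared,
so only the largest scratch requirement counts. Function and parameter
names are illustrative:

static bool
ocm_fits(uint16_t num_pages, const uint16_t *wb_pages,
	 const uint16_t *scratch_pages, uint16_t nb_layers)
{
	uint32_t total_wb = 0;
	uint16_t max_scratch = 0;
	uint16_t i;

	for (i = 0; i < nb_layers; i++) {
		total_wb += wb_pages[i];
		if (scratch_pages[i] > max_scratch)
			max_scratch = scratch_pages[i];
	}

	return (total_wb + max_scratch) <= num_pages;
}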

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c | 60 +++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index cd95a3c7ad..03f4783b3f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1023,8 +1023,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	uint16_t max_scratch_pages;
+	struct cn10k_ml_ocm *ocm;
 	uint64_t model_info_size;
+	uint16_t total_wb_pages;
 	uint16_t lcl_model_id;
+	uint16_t layer_id;
 	uint64_t mz_size;
 	bool found;
 	int ret;
@@ -1086,6 +1090,62 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 	if (ret != 0)
 		goto error;
 
+	max_scratch_pages = 0;
+	total_wb_pages = 0;
+	layer_id = 0;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+		total_wb_pages = total_wb_pages + model->layer[layer_id].glow.ocm_map.wb_pages;
+		max_scratch_pages = PLT_MAX(max_scratch_pages,
+					    model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	} else {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+				total_wb_pages = total_wb_pages +
+						 model->layer[layer_id].glow.ocm_map.wb_pages;
+				max_scratch_pages =
+					PLT_MAX(max_scratch_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+			}
+		}
+#endif
+	}
+
+	if ((total_wb_pages + max_scratch_pages) > ocm->num_pages) {
+		plt_err("model_id = %u: total_wb_pages (%u) + scratch_pages (%u) >  %u\n",
+			lcl_model_id, total_wb_pages, max_scratch_pages, ocm->num_pages);
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			plt_ml_dbg("layer_id = %u: wb_pages = %u, scratch_pages = %u\n", layer_id,
+				   model->layer[layer_id].glow.ocm_map.wb_pages,
+				   model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		} else {
+			for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers;
+			     layer_id++) {
+				if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+					plt_ml_dbg(
+						"layer_id = %u: wb_pages = %u, scratch_pages = %u\n",
+						layer_id,
+						model->layer[layer_id].glow.ocm_map.wb_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+				}
+			}
+#endif
+		}
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else
+			mvtvm_ml_model_unload(cnxk_mldev, model);
+#endif
+
+		return -ENOMEM;
+	}
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	cnxk_mldev->nb_models_loaded++;
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 26/34] ml/cnxk: support start and stop for TVM models
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. Starting a TVM
model invokes layer start for each Glow layer that is part
of the model; stopping the model invokes layer stop for the
same layers.
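
Conceptually, model start walks the layer list and starts only the
Marvell (Glow) layers, since LLVM layers execute on the CPU and need
no hardware start; model stop mirrors this. A minimal sketch of that
walk, with illustrative types and a stub in place of the Glow layer
start call:

  enum layer_type { LAYER_TYPE_MRVL, LAYER_TYPE_LLVM };

  struct layer {
          enum layer_type type;
          const char *name;
  };

  /* Stub standing in for the per-layer hardware start call. */
  static int
  hw_layer_start(const char *name)
  {
          (void)name;
          return 0;
  }

  static int
  tvm_model_start(const struct layer *layers, int nb_layers)
  {
          int i, ret;

          for (i = 0; i < nb_layers; i++) {
                  if (layers[i].type != LAYER_TYPE_MRVL)
                          continue; /* LLVM layers run on CPU cores */

                  ret = hw_layer_start(layers[i].name);
                  if (ret != 0)
                          return ret; /* abort on the first failure */
          }

          return 0;
  }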

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 16 ++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 52 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 18 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 6 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7208391fd..2d308802cf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -827,7 +827,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -838,8 +838,6 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -852,6 +850,10 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -1015,14 +1017,12 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -1035,6 +1035,10 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03f4783b3f..66cda513db 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1216,7 +1216,12 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+
+	return 0;
 }
 
 int
@@ -1236,7 +1241,12 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9fd9e58de6..1d0b3544a7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -213,3 +213,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 770794fe7d..55459f9f7f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -19,5 +19,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a17a76e41f..b8c2e6a1fc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -72,3 +72,21 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3776fb5369..1eb663b1d1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,6 +16,8 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 27/34] ml/cnxk: update internal TVM model info structure
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update the internal model info structure
for TVM models.
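
The internal info area is one contiguous buffer: the model info
header first, then a fixed-size array of input descriptors, then the
output descriptors, which is why the input and output pointers are
derived by offsetting from the start of the buffer. A simplified
sketch of that layout with stand-in types (the driver uses
rte_ml_model_info and rte_ml_io_info, and its own I/O limit):

  #include <stdint.h>
  #include <string.h>

  #define MAX_IO 32 /* stand-in for the per-model I/O limit */

  struct io_desc {
          char name[64];
          uint32_t nb_dims;
  };

  struct model_info {
          char name[64];
          uint32_t nb_inputs;
          uint32_t nb_outputs;
  };

  /* Carve one buffer into header, input array and output array. */
  static void
  info_layout(void *buf, struct model_info **info,
              struct io_desc **inputs, struct io_desc **outputs)
  {
          *info = (struct model_info *)buf;
          *inputs = (struct io_desc *)((uint8_t *)buf +
                                       sizeof(struct model_info));
          *outputs = *inputs + MAX_IO;

          memset(*info, 0, sizeof(struct model_info));
  }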

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 14f4b258d8..569147aca7 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -11,6 +11,7 @@
 
 #include <roc_api.h>
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -246,3 +247,67 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 
 	return &model->mvtvm.info;
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index e86581bc6a..a1247ffbde 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -52,5 +53,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1d0b3544a7..f13ba76207 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -178,6 +178,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_set(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 28/34] ml/cnxk: support device dump for TVM models
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to print TVM model layer info.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  7 +++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  8 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 02f80410ec..ed6a1ed866 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -68,6 +68,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -84,6 +86,9 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
 	}
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 569147aca7..4c12f584d5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -311,3 +312,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
 				&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index a1247ffbde..900ba44fa0 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -54,5 +55,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index b8c2e6a1fc..260a051b08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -36,6 +36,14 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 	return NULL;
 }
 
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(layer);
+	RTE_SET_USED(fp);
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 1eb663b1d1..d6d0edbcf1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
@@ -22,5 +23,6 @@ int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 29/34] ml/cnxk: enable reporting model runtime as xstats
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.
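
The model-level latency xstats are aggregated across queue pairs: the
average divides the accumulated runtime latency by the number of
inferences dequeued since the last reset, while min and max are
folded across queue pairs. A standalone sketch of the average
computation with illustrative field names:

  #include <stdint.h>

  struct qp_xstats {
          uint64_t latency_tot; /* accumulated runtime latency */
          uint64_t dequeued;    /* inferences dequeued on this queue pair */
          uint64_t reset_count; /* dequeued count captured at last reset */
  };

  static uint64_t
  avg_rt_latency(const struct qp_xstats *qp, uint16_t nb_qp)
  {
          uint64_t value = 0;
          uint64_t count = 0;
          uint16_t i;

          for (i = 0; i < nb_qp; i++) {
                  value += qp[i].latency_tot;
                  count += qp[i].dequeued - qp[i].reset_count;
          }

          return count != 0 ? value / count : 0;
  }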

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   9 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 131 +++++++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  96 +++++++++++++++++++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   8 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  23 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   6 ++
 10 files changed, 289 insertions(+), 18 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d308802cf..0c67ce7b40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -197,6 +197,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 	}
 }
 
+void
+cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->glow.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
 #define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3d18303ed3..045e2e6cd2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -331,6 +331,8 @@ int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
+void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				  enum cnxk_ml_xstats_type type);
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66cda513db..fd2c46ac1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -138,7 +138,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -169,6 +170,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -195,7 +215,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -204,6 +225,36 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		strcpy(suffix, "cycles");
+	else
+		strcpy(suffix, "ns");
+
+	/* Update xstat name based on layer name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_xstat_model_name_set(cnxk_mldev, model, stat_id, i, suffix);
+		else
+			mvtvm_ml_model_xstat_name_set(cnxk_mldev, model, stat_id, i, suffix);
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -247,13 +298,22 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	if (model == NULL)
 		return 0;
 
-	if (layer_id >= 0)
+	if (layer_id >= 0) {
 		layer = &model->layer[layer_id];
-	else
-		return 0;
+		goto layer_xstats;
+	} else {
+		layer = NULL;
+		goto model_xstats;
+	}
 
+layer_xstats:
 	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+	goto exit_xstats;
 
+model_xstats:
+	value = mvtvm_ml_model_xstat_get(cnxk_mldev, model, type);
+
+exit_xstats:
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq != 0) /* return in ns */
 		value = (value * 1000ULL) / sclk_freq;
@@ -836,8 +896,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -854,7 +915,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -868,9 +939,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -931,9 +1013,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -951,7 +1034,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -963,11 +1053,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b22a2b0d95..ab32676b3e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -70,6 +70,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876c..a2c9adfe4a 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 900ba44fa0..66c3af18e1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index f13ba76207..832837034b 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -10,10 +10,83 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->mvtvm.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	uint64_t count = 0;
+	uint64_t value = 0;
+	uint32_t qp_id;
+
+	switch (type) {
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -53,6 +126,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -68,7 +142,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -181,6 +259,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 55459f9f7f..22e0340146 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -11,8 +11,11 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -22,4 +25,9 @@ int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 260a051b08..19af1d2703 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -8,6 +8,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_xstats.h"
 
 enum cnxk_ml_model_type
 mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
@@ -44,6 +45,28 @@ mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	RTE_SET_USED(fp);
 }
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(stat_id);
+	RTE_SET_USED(entry);
+	RTE_SET_USED(suffix);
+}
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(type);
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index d6d0edbcf1..3fd1f04c35 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
@@ -24,5 +26,9 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 30/34] ml/cnxk: implement I/O alloc and free callbacks
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for I/O buffer allocation and
free for Glow layers.
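
The quantized input and output buffers for a layer are carved from a
single reserved region whose name encodes the model and layer id, so
the matching free callback can look the region up again. A simplified
sketch of the sizing and naming, using plain malloc in place of the
driver's named memzone API (names and the 128-byte alignment are
illustrative):

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define ALIGN_CEIL(x, a) ((((x) + (a) - 1) / (a)) * (a))

  static int
  io_alloc(uint16_t model_id, uint16_t layer_id, uint64_t in_sz,
           uint64_t out_sz, void **in_buf, void **out_buf)
  {
          char name[64];
          uint8_t *base;

          in_sz = ALIGN_CEIL(in_sz, 128);
          out_sz = ALIGN_CEIL(out_sz, 128);

          /* The name keys the region for the matching free callback. */
          snprintf(name, sizeof(name), "ml_io_%u_%u", model_id, layer_id);

          base = malloc(in_sz + out_sz); /* driver: named memzone reserve */
          if (base == NULL)
                  return -1;

          *in_buf = base;
          *out_buf = base + in_sz;

          return 0;
  }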

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 87 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 +
 3 files changed, 92 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0c67ce7b40..7802425c87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1410,3 +1410,90 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t output_size;
+	uint64_t input_size;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 045e2e6cd2..9c41c1c0b0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -329,6 +329,9 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 832837034b..77c2b5bcdc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -232,6 +232,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 31/34] ml/cnxk: add generic ML malloc and free callback
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7802425c87..01b0a44caa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1497,3 +1497,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 9c41c1c0b0..eb3e1c139c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -333,6 +333,9 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 77c2b5bcdc..b627355917 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -234,6 +234,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v6 32/34] ml/cnxk: support quantize and dequantize callback
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
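
At quantize time, each dequantized input tensor handed over by the
TVM runtime is converted and packed back-to-back into one quantized
buffer; dequantize does the reverse for the outputs. A minimal sketch
of the packing loop, with a stand-in tensor type (the real callback
receives DLTensor pointers) and memcpy in place of the per-input type
conversion:

  #include <stdint.h>
  #include <string.h>

  /* Stand-in for a dequantized tensor handle. */
  struct tensor {
          void *data;
          uint64_t byte_offset;
  };

  static int
  quantize_all(const struct tensor **deq, const uint64_t *q_sz,
               uint32_t nb_inputs, void *qbuffer)
  {
          uint8_t *q = qbuffer;
          uint32_t i;

          for (i = 0; i < nb_inputs; i++) {
                  const uint8_t *d = (const uint8_t *)deq[i]->data +
                                     deq[i]->byte_offset;

                  memcpy(q, d, q_sz[i]); /* real path converts the type */
                  q += q_sz[i];          /* advance by quantized size */
          }

          return 0;
  }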

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_ops.c | 129 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |   4 +
 2 files changed, 133 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index b627355917..776675843a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -2,11 +2,15 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <dlpack/dlpack.h>
+
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
@@ -236,6 +240,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -366,3 +372,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 22e0340146..4cabe30a82 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -24,6 +24,10 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
-- 
2.42.0



* [PATCH v6 33/34] ml/cnxk: enable fast-path ops for TVM models
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 13:54   ` [PATCH v6 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  2023-10-18 14:20   ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. Models use
TVMDP library function calls to execute inference
operations for the Hybrid and LLVM model sub-types.

For TVM MRVL model subtypes that have a single MRVL layer,
the driver enqueues the inference requests directly to the
hardware.
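
For reference only, not part of this patch: a minimal sketch of how a DLPack
DLTensor describing one I/O buffer could be populated before it is handed to
the TVMDP runtime. The helper name, buffer and float32 dtype are illustrative
assumptions; the driver fills the per-request tensors from model metadata as
shown in the diff below.

#include <stddef.h>
#include <stdint.h>
#include <dlpack/dlpack.h>

static DLTensor
make_cpu_f32_tensor(void *buf, int64_t *shape, int32_t ndim)
{
	DLTensor t;

	t.data = buf;                  /* caller-provided data buffer */
	t.device.device_type = kDLCPU; /* tensor resides in CPU memory */
	t.device.device_id = 0;
	t.ndim = ndim;
	t.dtype.code = kDLFloat;       /* float32: kDLFloat, 32 bits, 1 lane */
	t.dtype.bits = 32;
	t.dtype.lanes = 1;
	t.shape = shape;
	t.strides = NULL;              /* compact, row-major layout */
	t.byte_offset = 0;

	return t;
}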

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/rel_notes/release_23_11.rst |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c         |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h           |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h          |   5 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  20 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c         | 124 +++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |  43 +++++++++
 9 files changed, 211 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..5fcf2a1897 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -243,6 +243,9 @@ New Features
   Added dispatcher library which purpose is to help decouple different
   parts (modules) of an eventdev-based application.
 
+* **Updated Marvell cnxk mldev driver.**
+
+  * Added support for models compiled using TVM framework.
 
 Removed Items
 -------------
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index fd2c46ac1f..608e9fc4ca 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c12f584d5..1dfd0d176a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -198,6 +198,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -231,6 +241,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 66c3af18e1..7ffce38094 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -69,6 +69,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 776675843a..1e74b82a0a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -19,6 +19,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 void
 mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			      uint16_t stat_id, uint16_t entry, char *suffix)
@@ -242,6 +248,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -285,6 +292,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -495,3 +515,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 4cabe30a82..cb4b219743 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -16,6 +16,44 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* Start ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -29,6 +67,11 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
-- 
2.42.0



* [PATCH v6 34/34] ml/cnxk: enable creation of mvtvm virtual device
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-10-18 13:54   ` Srikanth Yalavarthi
  2023-10-18 14:20   ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-18 13:54 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create an mvtvm virtual device on systems
without a PCI-based ML HW accelerator.
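
For reference only, not part of this patch: a minimal sketch of bringing up the
vdev from an application, assuming the driver is built with TVM support.
Passing the vdev argument programmatically is equivalent to the --vdev EAL
command-line option; error handling is trimmed.

#include <stdio.h>

#include <rte_common.h>
#include <rte_eal.h>
#include <rte_mldev.h>

int
main(int argc, char **argv)
{
	/* Equivalent to: <app> --vdev ml_mvtvm,max_qps=4 */
	char *eal_argv[] = {argv[0], "--vdev", "ml_mvtvm,max_qps=4"};

	(void)argc;

	if (rte_eal_init(RTE_DIM(eal_argv), eal_argv) < 0)
		return -1;

	/* The mvtvm vdev should now be visible as an ML device */
	printf("ml devices probed: %u\n", (unsigned int)rte_ml_dev_count());

	return rte_eal_cleanup();
}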

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       |  49 +++++++-
 drivers/ml/cnxk/cn10k_ml_dev.c   |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c    |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  82 +++++++++----
 drivers/ml/cnxk/meson.build      |   2 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   | 196 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  31 +++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   2 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   2 +
 13 files changed, 433 insertions(+), 24 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index ef2b5d4581..1d7f63993b 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -148,6 +148,22 @@ Bind the ML PF device to the vfio_pci driver:
    usertools/dpdk-devbind.py -u 0000:00:10.0
    usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
 
+VDEV support
+------------
+
+On platforms without ML hardware acceleration through a PCI device, the Marvell
+ML CNXK PMD can execute inference operations on a vdev, using ML models
+compiled with the Apache TVM framework.
+
+VDEV can be enabled by passing the following EAL argument:
+
+.. code-block:: console
+
+   --vdev ml_mvtvm
+
+VDEV can also be used on platforms with an ML HW accelerator. However, use of VDEV
+and PCI HW accelerator is mutually exclusive.
+
 
 Runtime Config Options
 ----------------------
@@ -158,6 +174,8 @@ Runtime Config Options
   The parameter ``fw_path`` can be used by the user
   to load ML firmware from a custom path.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
@@ -173,6 +191,8 @@ Runtime Config Options
   When enabled, firmware would mask the DPE non-fatal hardware errors as warnings.
   The parameter ``enable_dpe_warnings`` is used for this configuration.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,enable_dpe_warnings=0
@@ -189,11 +209,19 @@ Runtime Config Options
   Caching of model data improves the inferencing throughput / latency for the model.
   The parameter ``cache_model_data`` is used to enable data caching.
 
+  This option is supported on PCI HW accelerator and vdev.
+
   For example::
 
      -a 0000:00:10.0,cache_model_data=0
 
-  With the above configuration, model data caching is disabled.
+  With the above configuration, model data caching is disabled on HW accelerator.
+
+  For example::
+
+     --vdev ml_mvtvm,cache_model_data=0
+
+  With the above configuration, model data caching is disabled on vdev.
 
 
 **OCM allocation mode** (default ``lowest``)
@@ -209,6 +237,8 @@ Runtime Config Options
   ``largest``
     Allocate OCM for the model from the slot with largest amount of free space.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_alloc_mode=lowest
@@ -226,6 +256,8 @@ Runtime Config Options
   Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
   Default page size is 16 KB.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_page_size=8192
@@ -250,6 +282,8 @@ Runtime Config Options
     Enabling spinlock version would disable restrictions on the number of queue-pairs
     that can be supported by the driver.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,hw_queue_lock=1
@@ -258,6 +292,19 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
+**Maximum queue pairs** (default ``1``)
+
+  VDEV supports additional EAL arguments to configure the maximum number of
+  queue-pairs on the ML device through the option ``max_qps``.
+
+  This option is supported only on vdev.
+
+  For example::
+
+     --vdev ml_mvtvm,max_qps=4
+
+  With the above configuration, 4 queue-pairs are created on the vdev.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 91813e9d0a..caa13ba08c 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -309,6 +309,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -355,6 +361,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef..cee405f3f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417..dc4512223c 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64be..491c4c4aea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 608e9fc4ca..517aa71931 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,7 +117,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -480,7 +481,12 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+
+	return 0;
 }
 
 static int
@@ -518,9 +524,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -618,10 +626,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
@@ -629,12 +639,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -695,8 +710,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close MVTVM ML Device");
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -748,10 +765,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -770,10 +789,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -800,7 +821,12 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+
+	return 0;
 }
 
 static int
@@ -813,6 +839,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1145,6 +1174,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1384,6 +1418,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index a20615186c..204c850901 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -57,11 +57,13 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 driver_sdk_headers += files(
+        'mvtvm_ml_dev.h',
         'mvtvm_ml_ops.h',
         'mvtvm_ml_model.h',
 )
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 0000000000..c93b5155b9
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize MVTVM vdev");
+		rte_exit(-EINVAL, "Invalid EAL arguments ");
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 0000000000..6922c19337
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 1e74b82a0a..bbefa8a356 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -97,6 +97,22 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return value;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -127,6 +143,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -237,6 +262,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index cb4b219743..0232c5ead5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -55,8 +55,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 19af1d2703..126a954c91 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -67,6 +67,15 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return 0;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(dev_info);
+
+	return -ENOTSUP;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -84,6 +93,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3fd1f04c35..4220a963f2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -14,8 +14,10 @@ struct cnxk_ml_model;
 struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.42.0



* Re: [PATCH v6 00/34] Implementation of revised ml/cnxk driver
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2023-10-18 13:54   ` [PATCH v6 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
@ 2023-10-18 14:20   ` Jerin Jacob
  2023-10-19  6:41     ` [EXT] " Srikanth Yalavarthi
  34 siblings, 1 reply; 340+ messages in thread
From: Jerin Jacob @ 2023-10-18 14:20 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

On Wed, Oct 18, 2023 at 7:24 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> This patch series is an implementation of revised ml/cnxk driver
> to support models compiled with TVM compiler framework. TVM models
> use a hybrid mode for execution, with regions of the model executing
> on the ML accelerator and the rest executing on CPU cores.
>
> This series of commits reorganizes the ml/cnxk driver and adds support
> to execute multiple regions with-in a TVM model.
>

Fix this warning

### [PATCH] ml/cnxk: enable creation of mvtvm virtual device

Warning in drivers/ml/cnxk/cn10k_ml_dev.c:
Using rte_panic/rte_exit

Fix the following as needed, wherever relevant
### [PATCH] ml/cnxk: add generic cnxk device structure

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#1778: FILE: drivers/ml/cnxk/cn10k_ml_ops.c:1316:
+               strncpy(xstats_map[idx].name,
cn10k_mldev->xstats.entries[i].map.name,

total: 0 errors, 1 warnings, 2276 lines checked

### [PATCH] ml/cnxk: add generic model and layer structures

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#117: FILE: drivers/ml/cnxk/cn10k_ml_model.c:379:
+                       strncpy(layer->info.input[i].name, (char
*)metadata->input1[i].input_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#166: FILE: drivers/ml/cnxk/cn10k_ml_model.c:411:
+                       strncpy(layer->info.input[i].name, (char
*)metadata->input2[j].input_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#221: FILE: drivers/ml/cnxk/cn10k_ml_model.c:449:
+                       strncpy(layer->info.output[i].name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#255: FILE: drivers/ml/cnxk/cn10k_ml_model.c:472:
+                       strncpy(layer->info.output[i].name,

total: 0 errors, 4 warnings, 1905 lines checked

### [PATCH] ml/cnxk: update model load and unload functions

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#83: FILE: drivers/ml/cnxk/cn10k_ml_model.c:367:
+                       strncpy(io_info->input[i].name, (char
*)metadata->input1[i].input_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#135: FILE: drivers/ml/cnxk/cn10k_ml_model.c:399:
+                       strncpy(io_info->input[i].name, (char
*)metadata->input2[j].input_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#204: FILE: drivers/ml/cnxk/cn10k_ml_model.c:437:
+                       strncpy(io_info->output[i].name, (char
*)metadata->output1[i].output_name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#244: FILE: drivers/ml/cnxk/cn10k_ml_model.c:461:
+                       strncpy(io_info->output[i].name, (char
*)metadata->output2[j].output_name,

total: 0 errors, 4 warnings, 1094 lines checked

### [PATCH] ml/cnxk: update device and model xstats functions

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#1100: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:856:
+               strncpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);

total: 0 errors, 1 warnings, 1248 lines checked

### [PATCH] ml/cnxk: fetch layer info and load TVM model

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#172: FILE: drivers/ml/cnxk/mvtvm_ml_ops.c:125:
+               strncpy(model->layer[layer_id].name,

total: 0 errors, 1 warnings, 207 lines checked

### [PATCH] ml/cnxk: update internal info for TVM model

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#85: FILE: drivers/ml/cnxk/mvtvm_ml_model.c:175:
+               strncpy(model->mvtvm.info.input[i].name,
metadata->input[i].name,

WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
strncpy - see: https://github.com/KSPP/linux/issues/90
#118: FILE: drivers/ml/cnxk/mvtvm_ml_model.c:208:
+               strncpy(model->mvtvm.info.output[i].name,
metadata->output[i].name,

total: 0 errors, 2 warnings, 173 lines checked

### [PATCH] ml/cnxk: enable reporting model runtime as xstats

WARNING:STRCPY: Prefer strscpy over strcpy - see:
https://github.com/KSPP/linux/issues/88
#113: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:243:
+               strcpy(suffix, "cycles");

WARNING:STRCPY: Prefer strscpy over strcpy - see:
https://github.com/KSPP/linux/issues/88
#115: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:245:
+               strcpy(suffix, "ns");

total: 0 errors, 2 warnings, 477 lines checked
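
For reference only, not part of the series: a minimal sketch of the bounded-copy
replacement these warnings point to, using DPDK's rte_strscpy(), which always
NUL-terminates and reports truncation. The helper name is illustrative; the
destination field mirrors the xstats map used in the warning context.

#include <errno.h>

#include <rte_mldev.h>
#include <rte_string_fns.h>

/* Copy an xstat name into the fixed-size map field; 0 on success, -E2BIG on truncation. */
static int
copy_xstat_name(struct rte_ml_dev_xstats_map *map, const char *src)
{
	if (rte_strscpy(map->name, src, RTE_ML_STR_MAX) < 0)
		return -E2BIG;

	return 0;
}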


* Re: [PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library
  2023-10-18 13:54   ` [PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-10-18 18:34     ` Jerin Jacob
  2023-10-19  6:44       ` [EXT] " Srikanth Yalavarthi
  0 siblings, 1 reply; 340+ messages in thread
From: Jerin Jacob @ 2023-10-18 18:34 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

On Wed, Oct 18, 2023 at 7:52 PM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> Added support to configure and close TVMDP library based
> on ML device configuration options.
>
> Updated meson build to enable Jansson, TVM runtime, TVMDP
> library as build dependencies.
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> ---

>
> +Compilation Prerequisites
> +-------------------------
> +
> +This driver requires external libraries to optionally enable support for
> +models compiled using Apache TVM framework. The following dependencies are
> +not part of DPDK and must be installed separately:
> +
> +- **Jansson**
> +
> +  This library enables support to parse and read JSON files.
> +
> +- **DLPack**
> +
> +  This library provides headers for open in-memory tensor structures.
> +
> +.. note::
> +
> +    DPDK CNXK ML driver requires DLPack version 0.7
> +
> +.. code-block:: console


Please add sections for cross and native.

> +    git clone https://github.com/dmlc/dlpack.git
> +    cd dlpack
> +    git checkout v0.7 -b v0.7
> +    cmake -S ./ -B build \
> +      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
> +      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
> +      -DBUILD_MOCK=OFF
> +    make -C build
> +    make -C build install
> +
> +- **TVM**
> +
> +  Apache TVM provides a runtime library (libtvm_runtime) used to execute
> +  models on CPU cores or hardware accelerators.
> +
> +.. note::
> +
> +    DPDK CNXK ML driver requires TVM version 0.10.0
> +
> +.. code-block:: console
> +
> +    git clone https://github.com/apache/tvm.git

I need to use --recursive to avoid
CMake Error at /usr/share/cmake/Modules/ExternalProject.cmake:3176 (message):
  No download info given for 'project_libbacktrace' and its source directory:


> +    cd tvm
> +    git checkout v0.10.0 -b v0.10.0
> +    cmake -S ./ -B build \
> +      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
> +      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
> +      -DMACHINE_NAME=aarch64-linux-gnu \
> +      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
> +      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY
> +    make -C build
> +    make -C build install
> +
> +- **TVMDP**
> +
> +  Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
> +  works as an interface between TVM runtime and DPDK drivers. TVMDP library
> +  provides a simplified C interface for TVM's runtime based on C++.
> +
> +.. code-block:: console
> +
> +    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
> +    cd tvmdp
> +    git checkout main
> +    cmake -S ./ -B build \
> +      -DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake \
> +      -DBUILD_SHARED_LIBS=ON \
> +      -DBUILD_TESTING=OFF

[main]dell[tvmdp] $ cmake -S ./ -B build
-DCMAKE_INSTALL_PREFIX=/export/cross_prefix/prefix
-DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake
-DBUILD_SHARED_LIBS=ON  -DBUILD_TESTING=OFF
-- The CXX compiler identification is GNU 13.2.0
-- The C compiler identification is GNU 13.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
CMake Error at CMakeLists.txt:53 (find_package):
  By not providing "Finddmlc.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "dmlc", but
  CMake did not find one.

  Could not find a package configuration file provided by "dmlc" with any of
  the following names:

    dmlcConfig.cmake
    dmlc-config.cmake

  Add the installation prefix of "dmlc" to CMAKE_PREFIX_PATH or set
  "dmlc_DIR" to a directory containing one of the above files.  If "dmlc"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!


> +enable_mvtvm = true
> +
> +if not jansson_dep.found()
> +        message('drivers/ml/cnxk: jansson not found')
> +        enable_mvtvm = false
> +endif
> +
> +if not cc.check_header('dlpack/dlpack.h')
> +        message('drivers/ml/cnxk: dlpack.h not found')
> +        enable_mvtvm = false
> +endif
> +
> +tvmrt_lib = cc.find_library('tvm_runtime', required: false)
> +if tvmrt_lib.found()
> +        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
> +else
> +        message('drivers/ml/cnxk: tvm_runtime not found')
> +        enable_mvtvm = false
> +endif
> +
> +tvmdp_dep = dependency('tvmdp', required: false)
> +if not tvmdp_dep.found()
> +        message('drivers/ml/cnxk: tvmdp not found')
> +        enable_mvtvm = false
> +endif
> +
>  sources = files(
>          'cn10k_ml_dev.c',
>          'cn10k_ml_ops.c',
> @@ -21,6 +47,39 @@ sources = files(
>
>  deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
>
> +if enable_mvtvm
> +
> +dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
> +
> +driver_sdk_headers += files(
> +        'mvtvm_ml_ops.h',
> +)

Remove this


* [PATCH v7 00/34] Implementation of revised ml/cnxk driver
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (38 preceding siblings ...)
  2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-10-19  4:16 ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                     ` (33 more replies)
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  41 siblings, 34 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series is an implementation of revised ml/cnxk driver
to support models compiled with TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions with-in a TVM model.

v7:
  - Updated steps to build dependencies in cnxk mldev documentation
  - Replace str functions with rte_str functions
  - Drop use of rte_exit in ml/cnxk driver

v6:
  - Added depends info for series. This series depends on patch-132887
  - Fix merge conflicts with dpdk-23.11-rc1
  - Fix issues with ml/cnxk driver release notes
  - Added build dependency information for dlpack headers

v5:
  - Fix build failures for individual patches in the series
  - Finished build testing with devtools/test-meson-builds.sh script

v4:
  - Squashed release notes
  - Updated external build dependency info in documentation

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (2):
  ml/cnxk: enable OCM check for multilayer TVM model
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (30):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 doc/guides/mldevs/cnxk.rst             |  207 +-
 doc/guides/rel_notes/release_23_11.rst |    3 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h         |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c       |  403 ++--
 drivers/ml/cnxk/cn10k_ml_model.h       |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2828 ++++++++----------------
 drivers/ml/cnxk/cn10k_ml_ops.h         |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c          |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h          |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c           |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h           |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c        |   94 +
 drivers/ml/cnxk/cnxk_ml_model.h        |  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          | 1690 ++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h          |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c        |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h        |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h       |  152 ++
 drivers/ml/cnxk/meson.build            |   63 +
 drivers/ml/cnxk/mvtvm_ml_dev.c         |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h         |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  392 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c         |  652 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c       |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h       |   36 +
 30 files changed, 6254 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 01/34] ml/cnxk: drop support for register polling
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for the "poll_mem" device argument in the cnxk
ML driver. Support for using ML registers for polling is removed
and DDR addresses are now used for polling.
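A minimal sketch of the DDR-polling pattern this change settles on
(illustrative only, not the driver's code; the structure and field
names below are assumptions based on the description above): the
completion word lives in ordinary DDR memory inside the request, is
set to a "start" value at enqueue time, and is spun on at dequeue
time until firmware writes the "finish" value.

#include <stdint.h>
#include <stdbool.h>

#define POLL_JOB_START  0
#define POLL_JOB_FINISH 1

struct ml_request_sketch {
	volatile uint64_t status;  /* completion word in DDR */
	uint64_t compl_addr;       /* address handed to firmware */
};

/* Arm the request: point firmware at the DDR status word. */
static inline void
request_arm(struct ml_request_sketch *req)
{
	req->compl_addr = (uint64_t)(uintptr_t)&req->status;
	req->status = POLL_JOB_START;
}

/* Dequeue-side check: firmware flips the word when the job is done. */
static inline bool
request_done(const struct ml_request_sketch *req)
{
	return req->status == POLL_JOB_FINISH;
}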

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001..4aaeecff03 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d3..11531afd8c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6..005b093e45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 02/34] ml/cnxk: add generic cnxk device structure
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This is the top-level
device structure for the driver and encapsulates the target /
platform specific device structure.
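A simplified sketch of the layering this introduces (illustrative
only; the real definitions live in cnxk_ml_dev.h and cn10k_ml_dev.h,
and the fields shown here are trimmed/assumed): dev_private is sized
for the generic structure, and the platform-specific state is reached
through an embedded member.

struct rte_ml_dev;                          /* opaque in this sketch */

struct cn10k_ml_dev_sketch {
	/* platform-specific state: ROC handle, firmware, OCM, ... */
	void *roc;
};

struct cnxk_ml_dev_sketch {
	struct rte_ml_dev *mldev;               /* owning rte_ml_dev */
	int state;                              /* probed/configured/... */
	struct cn10k_ml_dev_sketch cn10k_mldev; /* embedded CN10K device */
};

/* Probe then mirrors the pattern used throughout the patch:
 *   cnxk_mldev  = dev->data->dev_private;
 *   cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 */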

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 316 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  15 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  60 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 495 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   1 +
 9 files changed, 562 insertions(+), 449 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..3bc61443d8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -10,13 +10,14 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -58,9 +59,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -90,7 +88,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -127,7 +125,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -139,7 +137,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -151,7 +149,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -174,7 +172,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -186,7 +184,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -197,49 +195,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -248,47 +250,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -300,7 +302,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -308,7 +311,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -324,18 +327,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -351,7 +356,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -368,7 +373,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -383,8 +388,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -430,45 +435,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -480,11 +485,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -498,14 +503,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -515,7 +520,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -524,24 +529,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -549,9 +554,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -559,9 +564,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -570,39 +575,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -613,53 +619,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -671,11 +681,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -691,49 +701,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	void *fw_buffer = NULL;
@@ -741,8 +753,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -773,8 +786,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -787,22 +800,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03..f9da1548c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8e..cc46ca2efd 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +462,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +471,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +495,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +507,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891..3128b28db7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6fb0bb620e..8094a0fab1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,12 @@
 
 #include <rte_mldev_pmd.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +218,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +238,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +257,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +274,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +336,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +349,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +396,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +410,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +460,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,9 +501,8 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
-			wb_pages +=
-				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			wb_pages += rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
 		fprintf(fp,
 			"tile = %2u, scratch_pages = %4u,"
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c..dc747cf534 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +86,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +200,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +251,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +327,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +342,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +352,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +374,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +385,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +394,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +434,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
@@ -503,28 +504,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +541,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +552,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +656,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +676,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +747,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +774,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +790,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +864,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +893,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +908,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +922,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1027,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1058,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 
@@ -1077,8 +1091,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1101,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1141,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1164,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1184,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1279,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1305,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		rte_strscpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			    RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1327,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1369,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1396,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1445,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1460,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1480,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1506,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1528,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1550,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1552,8 +1587,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1609,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1626,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1659,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1716,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1731,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1747,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1756,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1772,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1784,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1853,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
@@ -1843,7 +1881,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1905,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1915,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1926,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1938,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1981,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2251,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2299,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2325,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2336,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2352,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2384,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2394,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2408,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2467,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2506,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 0000000000..2a5c17c973
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 0000000000..51315de622
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5bf17d8ae3..e006fdfe0e 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0



* [PATCH v7 03/34] ml/cnxk: add generic model and layer structures
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures enable support for models with multiple layers.
A model is a collection of multiple independent layers with
flow dependencies between the layers.
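
As an aside for review, a minimal C sketch of this layering idea
follows. The struct and field names below (sketch_ml_layer,
sketch_ml_model, SKETCH_MAX_LAYERS) are hypothetical and only
illustrate the model/layer relationship; they are not the actual
cnxk_ml_model/cnxk_ml_layer definitions added by this patch.

    /* Illustration only: a model built from an ordered set of layers.
     * All names are invented for this sketch, not the driver's.
     */
    #include <stdint.h>

    #define SKETCH_MAX_LAYERS 8 /* hypothetical limit */

    /* One independently executable region of a model */
    struct sketch_ml_layer {
            uint16_t index;      /* position of the layer in the model */
            uint16_t nb_inputs;  /* number of input tensors */
            uint16_t nb_outputs; /* number of output tensors */
    };

    /* A model groups its layers; a layer may consume the outputs of
     * earlier layers, which is the flow dependency between layers.
     */
    struct sketch_ml_model {
            uint16_t model_id;
            uint16_t nb_layers;
            struct sketch_ml_layer layer[SKETCH_MAX_LAYERS];
    };

    /* Layers are walked in index order, honoring flow dependencies */
    static inline void
    sketch_ml_model_run(struct sketch_ml_model *model)
    {
            uint16_t i;

            for (i = 0; i < model->nb_layers; i++)
                    (void)model->layer[i]; /* submit layer i for execution */
    }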

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 247 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  50 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 488 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   1 +
 10 files changed, 653 insertions(+), 470 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index cc46ca2efd..d033d6deff 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -311,19 +311,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -361,102 +359,138 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			rte_strscpy(layer->info.input[i].name,
+				    (char *)metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			rte_strscpy(layer->info.input[i].name,
+				    (char *)metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			rte_strscpy(layer->info.output[i].name,
+				    (char *)metadata->output1[i].output_name,
+				    MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			rte_strscpy(layer->info.output[i].name,
+				    (char *)metadata->output2[j].output_name,
+				    MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -514,23 +548,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -542,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -550,56 +585,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db7..206a369ca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 8094a0fab1..d71c36eae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -6,10 +6,10 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -333,12 +333,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -353,6 +355,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -382,8 +385,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -393,12 +396,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -409,16 +414,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -432,11 +440,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd65..720f8caf76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index dc747cf534..b226a9b5a2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -202,7 +202,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -215,77 +215,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -295,29 +298,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -327,14 +332,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -345,7 +350,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -385,7 +390,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -445,7 +450,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -472,7 +477,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -521,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -543,7 +548,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -576,9 +581,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -588,9 +593,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -600,9 +606,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -611,7 +618,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -692,28 +699,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -749,7 +756,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -758,7 +765,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -803,7 +810,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -854,7 +861,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -875,7 +882,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -895,7 +902,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1001,11 +1008,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1093,7 +1100,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1111,11 +1118,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1294,7 +1301,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1386,7 +1393,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1447,7 +1454,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1588,7 +1595,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1643,9 +1650,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1659,62 +1666,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* The driver always handles glow models as a single layer. Hence, treat
+	 * the entire model as a model with one layer and ignore the num_layers
+	 * field from the metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1730,7 +1760,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1741,7 +1771,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1758,7 +1788,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1783,7 +1813,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1791,63 +1821,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1880,10 +1913,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1891,12 +1924,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1917,7 +1950,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1937,7 +1970,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1948,31 +1981,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2008,7 +2041,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2021,7 +2054,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2040,7 +2073,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2050,19 +2083,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2071,7 +2108,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2091,57 +2128,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2151,7 +2189,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2171,58 +2209,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2250,10 +2290,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2263,9 +2303,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2469,7 +2509,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2477,7 +2517,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
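The start and stop paths above share one locking discipline: take the model
spinlock, refuse to proceed while another slow-path job is active, and mark
the model ML_CNXK_MODEL_STATE_JOB_ACTIVE before touching the OCM or issuing
the descriptor. Expressed on its own it looks roughly like the sketch below;
the helper name is illustrative and the already-started/not-started checks of
the real code are omitted.

static int
cnxk_ml_model_job_begin(struct cnxk_ml_model *model)
{
	bool locked = false;

	while (!locked) {
		if (plt_spinlock_trylock(&model->lock) != 0) {
			/* Only one slow-path (start/stop) job at a time */
			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
				plt_spinlock_unlock(&model->lock);
				return -EBUSY;
			}

			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
			plt_spinlock_unlock(&model->lock);
			locked = true;
		}
	}

	return 0;
}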
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 0000000000..29ec7ec511
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 32
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* name */
+	char name[RTE_ML_STR_MAX];
+
+	/* dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape of input */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized input size */
+	uint32_t sz_d;
+
+	/* Quantized input size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
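The sz_d and sz_q fields above are the element count multiplied by the
per-element size of the dequantized and quantized types; the quantize and
dequantize loops in cn10k_ml_ops.c advance through the user buffers by
exactly these amounts. A minimal sketch of how a single entry could be
populated is shown below; the helper name and its size parameters are
illustrative only and not part of this patch.

static void
cnxk_ml_io_size_update(struct cnxk_ml_io *io, uint32_t dsize, uint32_t qsize)
{
	uint32_t i;

	/* Element count is the product of the shape dimensions */
	io->nb_elements = 1;
	for (i = 0; i < io->nb_dims; i++)
		io->nb_elements *= io->shape[i];

	/* Buffer sizes follow from the per-element type sizes */
	io->sz_d = io->nb_elements * dsize;
	io->sz_q = io->nb_elements * qsize;
}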
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 0000000000..3d735ced3e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 0000000000..a2994dbb71
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
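A model now owns an array of layers, each carrying its own cnxk_ml_io_info;
Glow models populate a single entry, which is why the reworked cn10k code
indexes model->layer[0] throughout, while the TVM support added later in the
series uses several. Purely as an illustration of the layout (this helper is
not part of the patch):

static void
cnxk_ml_model_io_footprint(const struct cnxk_ml_model *model,
			   uint64_t *total_in_q, uint64_t *total_out_q)
{
	uint16_t i;

	*total_in_q = 0;
	*total_out_q = 0;

	/* Accumulate quantized I/O sizes over all layers of the model */
	for (i = 0; i < model->nb_layers; i++) {
		*total_in_q += model->layer[i].info.total_input_sz_q;
		*total_out_q += model->layer[i].info.total_output_sz_q;
	}
}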
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index e006fdfe0e..a70956cceb 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 04/34] ml/cnxk: add generic cnxk request structure
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-10-19  4:16   ` [PATCH v7 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved common fields from the
cn10k structures to the cnxk structure. Moved job-related structures
and enumerations to the ops headers.
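
From the usage visible in the diff below (req->cn10k_req.jd, req->status
pointing at cn10k_req.status, req->timeout), the generic request can be
pictured roughly as the sketch that follows; the authoritative definition is
the one added to cnxk_ml_ops.h and may differ in members and alignment.

struct cnxk_ml_req {
	/* Device-specific request, currently only the cn10k layout */
	struct cn10k_ml_req cn10k_req;

	/* Address polled for job completion, points into cn10k_req */
	volatile uint64_t *status;

	/* Job timeout cycle */
	uint64_t timeout;
};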

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  72 +++----
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 331 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   1 +
 9 files changed, 557 insertions(+), 492 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 3bc61443d8..fc6f78d414 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -14,9 +14,8 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -400,20 +399,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -458,29 +460,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -654,29 +657,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -766,11 +770,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -782,8 +786,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -791,7 +795,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		free(fw_buffer);
 	} else if (roc_env_is_asim()) {
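The firmware-load flow above and the slow-path model jobs later in the patch
use the same completion scheme: the job descriptor is enqueued through the ML
scratch registers and the caller spins on the status word until the firmware
writes ML_CNXK_POLL_JOB_FINISH or the timeout expires. As a stand-alone
sketch (the helper name is illustrative only):

static bool
cnxk_ml_scratch_wait(struct roc_ml *roc_ml, volatile uint64_t *status)
{
	uint64_t end = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();

	do {
		/* Done bit set and status updated by firmware => job finished */
		if (roc_ml_scratch_is_done_bit_set(roc_ml) &&
		    plt_read64(status) == ML_CNXK_POLL_JOB_FINISH)
			return true;
	} while (plt_tsc_cycles() < end);

	return false;	/* timed out */
}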
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a..1852d4f6c9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d033d6deff..d2f1c761be 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -551,7 +552,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -560,7 +560,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -577,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
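For example, assuming the firmware capability word reports max_num_batches
of 256 and a model with batch_size of 8, the expression above would report
max_batches = 32 to the application (numbers are illustrative only).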
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca7..74ada1531a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b226a9b5a2..25ebb28993 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,9 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -78,31 +77,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -122,14 +121,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -140,18 +139,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -159,7 +158,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -173,7 +172,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -185,8 +184,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -333,7 +333,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -341,79 +341,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -861,7 +870,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -904,7 +913,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1101,7 +1110,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1136,7 +1145,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1213,7 +1222,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1239,7 +1248,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1252,7 +1261,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1269,7 +1278,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1485,20 +1494,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1511,17 +1522,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1538,14 +1551,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1554,23 +1567,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1581,7 +1595,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1654,7 +1668,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1726,7 +1740,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1790,7 +1804,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1815,10 +1829,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1878,8 +1892,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1887,19 +1901,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1952,7 +1968,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1972,10 +1988,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2015,19 +2031,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2287,18 +2305,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2329,7 +2352,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2338,7 +2361,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2346,15 +2370,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2365,11 +2389,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2395,12 +2420,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2424,11 +2450,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2450,13 +2477,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2507,10 +2536,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2522,17 +2552,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2555,7 +2586,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45..fd5992e192 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear output mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model  */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 0000000000..f1872dcf7c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 0000000000..b953fb0f5f
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index a70956cceb..d652543912 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -14,6 +14,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0



* [PATCH v7 05/34] ml/cnxk: add generic cnxk xstats structures
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-10-19  4:16   ` [PATCH v7 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic xstats structures and renamed the cn10k
xstats enumerations with a cnxk prefix.
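
As a rough illustration of how the generic entries are meant to be
consumed (a sketch only, not code from this patch): each
cnxk_ml_xstats_entry records which getter family it belongs to via
fn_id, and reset_value is the offset subtracted to emulate resets.
The helper name cnxk_ml_xstat_read below is hypothetical; the getters
keep the signatures shown in the diff.

/* Sketch: resolve and read one generic xstats entry. Assumes the
 * entries array was populated by cn10k_ml_xstats_init() and that this
 * helper lives in cn10k_ml_ops.c, where the static getters are visible.
 */
static uint64_t
cnxk_ml_xstat_read(struct rte_ml_dev *dev, struct cnxk_ml_xstats_entry *xs)
{
	cnxk_ml_xstats_fn fn;

	switch (xs->fn_id) {
	case CNXK_ML_XSTATS_FN_DEVICE:
		fn = cn10k_ml_dev_xstat_get;
		break;
	case CNXK_ML_XSTATS_FN_MODEL:
		fn = cn10k_ml_model_xstat_get;
		break;
	default:
		return 0;
	}

	/* reset_value holds the snapshot taken at the last reset. */
	return fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
}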

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 4 files changed, 209 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 25ebb28993..b470955ffd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -425,26 +426,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -459,10 +440,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -470,17 +451,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -489,24 +470,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -545,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -554,17 +535,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -590,9 +571,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -603,9 +584,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -616,16 +598,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -671,8 +654,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -708,26 +691,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -762,8 +745,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1342,10 +1325,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1357,10 +1340,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1384,11 +1367,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1423,10 +1406,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1664,7 +1647,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1738,24 +1721,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2308,7 +2291,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2326,31 +2309,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 0000000000..0d405679ca
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
-- 
2.42.0



* [PATCH v7 06/34] ml/cnxk: rename cnxk ops function pointers struct
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-10-19  4:16   ` [PATCH v7 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure with a cnxk prefix.
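
A minimal sketch of what the rename amounts to, assuming the table in
cnxk_ml_ops.c simply re-wires the members of the cn10k_ml_ops table
removed below to the now-exported cn10k callbacks; this is
illustrative, not the exact hunk from this patch.

#include "cnxk_ml_ops.h"

struct rte_ml_dev_ops cnxk_ml_ops = {
	/* Device control ops */
	.dev_info_get = cn10k_ml_dev_info_get,
	.dev_configure = cn10k_ml_dev_configure,
	.dev_close = cn10k_ml_dev_close,
	.dev_start = cn10k_ml_dev_start,
	.dev_stop = cn10k_ml_dev_stop,
	.dev_dump = cn10k_ml_dev_dump,
	.dev_selftest = cn10k_ml_dev_selftest,

	/* Queue-pair handling ops */
	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,

	/* Stats ops */
	.dev_stats_get = cn10k_ml_dev_stats_get,
	.dev_stats_reset = cn10k_ml_dev_stats_reset,
	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
	.dev_xstats_get = cn10k_ml_dev_xstats_get,
	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,

	/* Model ops */
	.model_load = cn10k_ml_model_load,
	.model_unload = cn10k_ml_model_unload,
	.model_start = cn10k_ml_model_start,
	.model_stop = cn10k_ml_model_stop,
	.model_info_get = cn10k_ml_model_info_get,
	.model_params_update = cn10k_ml_model_params_update,

	/* I/O ops */
	.io_quantize = cn10k_ml_io_quantize,
	.io_dequantize = cn10k_ml_io_dequantize,
};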

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 91 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fc6f78d414..91813e9d0a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -345,7 +345,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b470955ffd..a44fb26215 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -119,7 +119,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -860,7 +860,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -888,7 +888,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1087,7 +1087,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1160,7 +1160,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1180,7 +1180,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1200,7 +1200,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1241,7 +1241,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1258,7 +1258,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1273,7 +1273,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1321,7 +1321,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1363,7 +1363,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1427,7 +1427,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1441,7 +1441,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1528,7 +1528,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2051,7 +2051,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2071,7 +2071,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2105,7 +2105,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2186,7 +2186,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2574,38 +2574,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e192..16480b9ad8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c..03402681c5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,41 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5f..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0



* [PATCH v7 07/34] ml/cnxk: update device handling functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-10-19  4:16   ` [PATCH v7 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement CNXK wrapper functions for dev_info_get,
dev_configure, dev_close, dev_start and dev_stop. The
wrapper functions allocate and release the common
resources for the ML driver and invoke the
device-specific functions.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
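As an orientation before the hunks, a condensed sketch of the wrapper
split (simplified from the diff below, not an extra hunk; it assumes
the driver's internal headers such as cnxk_ml_dev.h, cnxk_ml_ops.h and
cn10k_ml_ops.h): the cnxk wrapper owns argument checks and common
driver state, while the cn10k callee touches the hardware. The
dev_start path is shown; dev_info_get, dev_configure, dev_close and
dev_stop follow the same shape in the hunks below.

static int
cnxk_ml_dev_start(struct rte_ml_dev *dev)
{
	struct cnxk_ml_dev *cnxk_mldev;
	int ret;

	if (dev == NULL)
		return -EINVAL;

	cnxk_mldev = dev->data->dev_private;

	/* Hardware-specific start is delegated to the cn10k layer */
	ret = cn10k_ml_dev_start(cnxk_mldev);
	if (ret != 0)
		return ret;

	/* Common device state stays with the cnxk layer */
	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;

	return 0;
}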
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a44fb26215..f8c51ab394 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -101,7 +101,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -861,20 +861,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -889,143 +881,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1038,8 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1050,10 +915,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1067,77 +932,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1154,20 +967,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1175,19 +983,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1195,8 +999,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1217,7 +1019,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad8..d50b5bede7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de622..02605fa28f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03402681c5..07a4daabc5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,15 +5,291 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..2996928d7d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0



* [PATCH v7 08/34] ml/cnxk: update queue-pair handling functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-10-19  4:16   ` [PATCH v7 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Add cnxk wrapper functions to handle ML device queue-pairs.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
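A condensed sketch of the new split (simplified from the diff below,
not an extra hunk): generic queue-pair bookkeeping moves to the cnxk
layer, and only the per-request job-command setup remains in the
cn10k layer via cn10k_ml_qp_initialize().

static struct cnxk_ml_qp *
cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
{
	struct cnxk_ml_dev *cnxk_mldev = dev->data->dev_private;
	struct cnxk_ml_qp *qp;

	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
				socket_id);
	if (qp == NULL)
		return NULL;

	/* Reserve the request-queue memzone and initialize the generic
	 * head/tail, nb_desc and stats fields here (elided, see the
	 * full hunk below).
	 */

	/* Hardware-specific part: pre-fill the per-request job commands */
	cn10k_ml_qp_initialize(cnxk_mldev, qp);

	return qp;
}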
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f8c51ab394..9691cf03e3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -95,93 +95,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -189,13 +108,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1002,47 +914,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede7..2d0a49d5cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 07a4daabc5..aa56dd2276 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,7 +10,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -93,7 +193,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -283,6 +383,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -294,8 +439,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0



* [PATCH v7 09/34] ml/cnxk: update model load and unload functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-10-19  4:16   ` [PATCH v7 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:16   ` [PATCH v7 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement cnxk wrapper functions to load and unload
ML models. The wrapper functions invoke the cn10k
model load and unload functions.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
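A short usage sketch for the per-layer I/O info accessor added in
this patch; the function name example_layer_qsize() is illustrative
only and not part of the series, while cn10k_ml_model_io_info_get()
and the cnxk_ml_io_info fields are taken from the hunks below.

static uint32_t
example_layer_qsize(struct cnxk_ml_model *model, uint16_t layer_id)
{
	struct cnxk_ml_io_info *io_info;
	uint32_t qsize = 0;
	uint32_t i;

	/* Fetch the per-layer I/O description through the new accessor
	 * instead of reaching into layer->info directly.
	 */
	io_info = cn10k_ml_model_io_info_get(model, layer_id);

	/* Re-derive the total quantized input size; the same value is
	 * also pre-computed in io_info->total_input_sz_q.
	 */
	for (i = 0; i < io_info->nb_inputs; i++)
		qsize += io_info->input[i].sz_q;

	return qsize;
}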
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  26 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 462 insertions(+), 277 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d2f1c761be..48d70027ca 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -316,42 +316,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -363,142 +352,148 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+			   struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_strscpy(layer->info.input[i].name,
-				    (char *)metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			rte_strscpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
+				    MRVL_ML_INPUT_NAME_LEN);
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			rte_strscpy(layer->info.input[i].name,
-				    (char *)metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			rte_strscpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
+				    MRVL_ML_INPUT_NAME_LEN);
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_strscpy(layer->info.output[i].name,
+			rte_strscpy(io_info->output[i].name,
 				    (char *)metadata->output1[i].output_name,
 				    MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			rte_strscpy(layer->info.output[i].name,
+			rte_strscpy(io_info->output[i].name,
 				    (char *)metadata->output2[j].output_name,
 				    MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
+struct cnxk_ml_io_info *
+cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	return &model->layer[layer_id].info;
+}
+
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -506,7 +501,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -518,7 +513,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -526,15 +521,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -542,28 +537,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -572,39 +564,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68..b891c9d627 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,13 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+				struct cn10k_ml_model_metadata *metadata);
+struct cnxk_ml_io_info *cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9691cf03e3..ab05896b5e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -15,6 +15,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -273,7 +276,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1261,85 +1264,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_set(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1358,99 +1447,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1748,7 +1800,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1762,19 +1813,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cd..677219dfdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28f..1590249abd 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index aa56dd2276..1d8b84269d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -137,6 +140,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -240,7 +244,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -271,6 +275,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -303,6 +324,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -312,7 +336,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -428,6 +452,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -451,8 +587,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..bc14f6e5b9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 10/34] ml/cnxk: update model start and stop functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-10-19  4:16   ` [PATCH v7 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-10-19  4:16   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:16 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrapper functions invoke the cn10k
model start and stop functions.

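To make the layering concrete, below is a minimal, self-contained sketch of
that call chain. The mldev/model stand-in types and the layer_start(),
backend_model_start() and cnxk_model_start() names are hypothetical
simplifications of struct cnxk_ml_dev, struct cnxk_ml_model,
cn10k_ml_layer_start(), cn10k_ml_model_start() and cnxk_ml_model_start()
from the diff below; locking, OCM handling and firmware jobs are omitted.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the driver structures; the real
 * driver uses struct cnxk_ml_dev, struct cnxk_ml_model and per-layer state. */
struct model { int layer0_started; int started; };
struct mldev { struct model *models[8]; uint16_t nb_models_started; };

/* Per-layer start: in the driver this is cn10k_ml_layer_start(), which
 * reserves OCM pages and enqueues the firmware MODEL_START job. */
static int layer_start(struct model *m)
{
	m->layer0_started = 1;
	return 0;
}

/* Backend model start: starts layer 0, then updates model-level state and
 * counters, mirroring cn10k_ml_model_start() in the diff below. */
static int backend_model_start(struct mldev *dev, struct model *m)
{
	int ret = layer_start(m);

	if (ret != 0)
		return ret;

	dev->nb_models_started++;
	m->started = 1;
	return 0;
}

/* Generic cnxk wrapper: validates the model and delegates to the backend,
 * mirroring cnxk_ml_model_start() registered in cnxk_ml_ops. */
static int cnxk_model_start(struct mldev *dev, uint16_t model_id)
{
	if (dev == NULL || model_id >= 8 || dev->models[model_id] == NULL)
		return -EINVAL;

	return backend_model_start(dev, dev->models[model_id]);
}

int main(void)
{
	struct model m = { 0, 0 };
	struct mldev dev = { .models = { [3] = &m } };
	int ret = cnxk_model_start(&dev, 3);

	printf("ret = %d, started = %d\n", ret, m.started);
	return 0;
}

Stop follows the same shape: cnxk_ml_model_stop() validates and delegates to
cn10k_ml_model_stop(), which stops layer 0, releases its OCM pages via the
per-layer path and moves the model back to the loaded state.
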
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d71c36eae6..2197e5e0ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -215,11 +215,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -238,7 +237,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -333,12 +331,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -351,10 +347,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -396,12 +390,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -416,10 +408,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -438,8 +428,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ab05896b5e..40f484158a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -248,26 +248,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -291,7 +293,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -323,9 +325,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -714,10 +720,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -730,22 +734,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -761,15 +763,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1506,14 +1508,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1524,85 +1528,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1636,66 +1644,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1705,31 +1741,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1766,8 +1802,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1776,6 +1815,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2003,30 +2061,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2054,14 +2117,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2116,7 +2178,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2183,7 +2245,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2232,23 +2294,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2284,7 +2350,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf..a222a43d55 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1d8b84269d..b61ed45876 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -240,7 +240,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -332,7 +332,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -564,6 +564,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -589,8 +629,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9..d27ca0d0cb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 11/34] ml/cnxk: update model utility functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-10-19  4:16   ` [PATCH v7 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and
fetch model info.
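
For reference, a minimal caller-side sketch of how these wrappers are
reached through the public rte_mldev API (not part of this patch;
dev_id, model_id and params_buf are placeholders supplied by the
application):

#include <rte_mldev.h>

static int
update_model_params(int16_t dev_id, uint16_t model_id, void *params_buf)
{
	struct rte_ml_model_info info;
	int ret;

	/* Fetch current model info through the new cnxk wrapper */
	ret = rte_ml_model_info_get(dev_id, model_id, &info);
	if (ret != 0)
		return ret;

	/* Params can be updated only while the model is loaded, not started */
	return rte_ml_model_params_update(dev_id, model_id, params_buf);
}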

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 40f484158a..3ff82829f0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1835,45 +1835,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b61ed45876..9ce37fcfd1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -604,6 +604,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -631,8 +675,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 12/34] ml/cnxk: update data quantization functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.
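
For reference, a minimal caller-side sketch using the public rte_mldev
I/O API (not part of this patch; the buff_seg arrays are placeholders
prepared by the application around an enqueue/dequeue cycle):

#include <rte_mldev.h>

static int
convert_model_io(int16_t dev_id, uint16_t model_id,
		 struct rte_ml_buff_seg **d_in, struct rte_ml_buff_seg **q_in,
		 struct rte_ml_buff_seg **q_out, struct rte_ml_buff_seg **d_out)
{
	int ret;

	/* float32 input -> model input type, using layer 0 I/O info */
	ret = rte_ml_io_quantize(dev_id, model_id, d_in, q_in);
	if (ret != 0)
		return ret;

	/* ... enqueue the op and wait for completion here ... */

	/* model output type -> float32, using the last layer I/O info */
	return rte_ml_io_dequantize(dev_id, model_id, q_out, d_out);
}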

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3ff82829f0..c68e6c620c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1856,170 +1856,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0d..780e2a9f9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 0000000000..c78009ab0c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 29ec7ec511..5de166c252 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 9ce37fcfd1..63842025fc 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -648,6 +650,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -679,6 +753,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index d652543912..79154c8698 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 13/34] ml/cnxk: update device debug functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrappers for the device dump and selftest debug
functions.
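
For reference, a minimal caller-side sketch of how the new wrappers are
exercised through the public rte_mldev debug API (not part of this
patch; dev_id is a placeholder):

#include <stdio.h>

#include <rte_mldev.h>

static int
debug_ml_device(int16_t dev_id)
{
	int ret;

	/* Dump model, OCM and firmware debug state */
	ret = rte_ml_dev_dump(dev_id, stderr);
	if (ret != 0)
		return ret;

	/* Run the firmware self-test job */
	return rte_ml_dev_selftest(dev_id);
}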

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   1 +
 12 files changed, 235 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 48d70027ca..af9d5a666f 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -598,3 +599,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index b891c9d627..45f2ed5fcf 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -460,5 +460,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2197e5e0ed..dc315cce10 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -481,19 +481,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a..bf8944f8ee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c68e6c620c..a56d002d4c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -18,11 +18,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -70,16 +65,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -113,140 +98,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1120,38 +971,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1207,17 +1045,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c..5fda98ae88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e..b069d4e3a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71..66d979dd3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 63842025fc..66b88ddae1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -409,6 +409,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -729,8 +764,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 0000000000..ca3670a9e8
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 0000000000..ed2ab21346
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 79154c8698..5d27a87d91 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 14/34] ml/cnxk: update device stats functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device stats.
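
For reference, a minimal caller-side sketch using the public rte_mldev
stats API (not part of this patch; dev_id is a placeholder):

#include <inttypes.h>
#include <stdio.h>

#include <rte_mldev.h>

static void
print_and_reset_ml_stats(int16_t dev_id)
{
	struct rte_ml_dev_stats stats = {0};

	if (rte_ml_dev_stats_get(dev_id, &stats) != 0)
		return;

	printf("enqueued %" PRIu64 ", dequeued %" PRIu64 ", enq_err %" PRIu64 ", deq_err %" PRIu64 "\n",
	       stats.enqueued_count, stats.dequeued_count,
	       stats.enqueue_err_count, stats.dequeue_err_count);

	rte_ml_dev_stats_reset(dev_id);
}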

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a56d002d4c..8cbf700f6e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -770,38 +770,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66b88ddae1..c75317d6da 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -489,6 +489,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -772,8 +804,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 15/34] ml/cnxk: update device and model xstats functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resource handling for the xstats is moved to
the cnxk layer. Introduced an internal xstats group.
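
For reference, a minimal caller-side sketch using the mode-based xstats
API (not part of this patch; the fixed array size of 32 is an arbitrary
placeholder and model-mode stats work the same way with
RTE_ML_DEV_XSTATS_MODEL and a valid model_id):

#include <rte_mldev.h>

#define NB_XSTATS 32

static int
read_device_xstats(int16_t dev_id)
{
	struct rte_ml_dev_xstats_map map[NB_XSTATS];
	uint16_t ids[NB_XSTATS];
	uint64_t values[NB_XSTATS];
	int n, i;

	/* Device-mode xstats; model_id is ignored for this mode */
	n = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1, map, NB_XSTATS);
	if (n <= 0)
		return n;
	if (n > NB_XSTATS)
		n = NB_XSTATS;

	for (i = 0; i < n; i++)
		ids[i] = map[i].id;

	return rte_ml_dev_xstats_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1, ids, values, n);
}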

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 531 +++----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 481 +++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 551 insertions(+), 507 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8cbf700f6e..776ad60401 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -198,107 +198,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -306,270 +220,94 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
 
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
+uint64_t
+cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			 enum cnxk_ml_xstats_type type)
 {
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
-	uint64_t value;
+	uint64_t value = 0;
 	uint32_t qp_id;
 
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
 	switch (type) {
 	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	default:
 		value = 0;
 	}
 
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
 	return value;
 }
 
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -654,7 +392,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -682,13 +419,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -717,9 +447,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -770,174 +497,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		rte_strscpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			    RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1211,7 +770,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af..4d76164dba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -298,17 +299,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
@@ -337,4 +327,8 @@ int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_nam
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
+/* xstats ops */
+uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd..3ce9338f1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c75317d6da..4f4a41219e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -115,6 +115,285 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t value = 0;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		return 0;
+
+	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -294,6 +573,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -323,6 +609,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -521,6 +810,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		rte_strscpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -806,10 +1279,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679ca..5e02bb876c 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.42.0



* [PATCH v7 16/34] ml/cnxk: update fast path functions
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support for
model-specific fast-path functions. The cnxk layer functions invoke
the model-specific fast-path functions.

Added support for model-specific poll handling functions and updated
the internal inference sync function. Dropped the use of rte_ml_op as
an argument and updated the function arguments so that the function
can be used as a callback by the TVM HW runtime.
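
Below is a minimal, self-contained sketch of the dispatch pattern
described above, assuming a toy model and op layout: a generic burst
enqueue walks the queue and calls a per-model enqueue_single()
callback, so different model types can plug in their own fast-path
handler. Only the enqueue_single() callback name mirrors this patch;
every other identifier is hypothetical and not part of the driver.

/* Toy illustration of callback-based fast-path dispatch. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct toy_op { uint16_t model_id; };

struct toy_model {
	/* Model-specific fast-path hook, set at model load time */
	bool (*enqueue_single)(struct toy_op *op, uint64_t head);
};

static bool
toy_glow_enqueue_single(struct toy_op *op, uint64_t head)
{
	/* A real handler would prepare a job descriptor and push it to
	 * the hardware command queue; here we only report success. */
	printf("enqueue model %u at slot %lu\n",
	       (unsigned)op->model_id, (unsigned long)head);
	return true;
}

static uint16_t
toy_enqueue_burst(struct toy_model *models, struct toy_op **ops,
		  uint16_t nb_ops, uint64_t *head, uint64_t nb_desc)
{
	uint16_t count = 0;

	while (count < nb_ops) {
		struct toy_op *op = ops[count];
		struct toy_model *model = &models[op->model_id];

		/* Stop when the model handler reports a full queue */
		if (!model->enqueue_single(op, *head))
			break;

		*head = (*head + 1) % nb_desc; /* advance ring index */
		count++;
	}

	return count;
}

int
main(void)
{
	struct toy_model models[1] = {
		{ .enqueue_single = toy_glow_enqueue_single } };
	struct toy_op op = { .model_id = 0 };
	struct toy_op *ops[1] = { &op };
	uint64_t head = 0;

	printf("enqueued %u op(s)\n",
	       toy_enqueue_burst(models, ops, 1, &head, 8));
	return 0;
}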

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 776ad60401..8116c8dedb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -65,24 +65,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -177,7 +165,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -185,17 +173,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -311,30 +299,15 @@ cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *l
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -342,25 +315,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -425,13 +382,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -824,6 +776,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1219,26 +1177,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1246,6 +1186,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1253,9 +1194,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1322,119 +1263,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1471,41 +1341,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1518,7 +1395,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4d76164dba..3d18303ed3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -14,6 +14,7 @@ struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -309,13 +310,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3c..f618e5aa5f 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 4f4a41219e..909e9143bf 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -15,6 +15,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1262,6 +1274,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb..d0c126f34b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 17/34] ml/cnxk: move error handling to cnxk layer
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Moved error type structures to the cnxk layer. The cn10k layer
now handles only the firmware and hardware error sub-types.
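
For illustration, a minimal self-contained sketch (not part of this patch)
of how the split databases compose an error message, mirroring the snprintf()
flow in cn10k_ml_op_error_get(); table contents and sizes are simplified:

#include <stdio.h>

struct error_db { unsigned long code; char str[32]; };

/* Generic error types, owned by the cnxk layer after this patch */
static const struct error_db etype_db[] = {
	{0, "NO_ERROR"}, {1, "FW_NON_FATAL"}, {2, "HW_NON_FATAL"},
	{3, "HW_FATAL"}, {4, "HW_WARNING"}, {5, "DRIVER_ERROR"},
	{6, "UNKNOWN_ERROR"},
};

/* Driver error sub-types, still owned by the cn10k layer */
static const struct error_db driver_stype_db[] = {
	{0, "NO ERROR"}, {1, "UNKNOWN ERROR"}, {2, "FW EXCEPTION"},
	{3, "UNKNOWN FIRMWARE ERROR"},
};

/* Compose "ETYPE : STYPE" the way the op_error_get path does */
static void
format_error(unsigned int etype, unsigned int stype, char *msg, size_t len)
{
	if (etype == 5) /* DRIVER_ERROR: append the cn10k sub-type */
		snprintf(msg, len, "%s : %s", etype_db[etype].str,
			 driver_stype_db[stype].str);
	else
		snprintf(msg, len, "%s", etype_db[etype].str);
}

int
main(void)
{
	char msg[64];

	format_error(5, 2, msg, sizeof(msg));
	printf("%s\n", msg); /* prints: DRIVER_ERROR : FW EXCEPTION */
	return 0;
}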

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8116c8dedb..65eaaf030d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,47 +22,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1241,19 +1221,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1294,7 +1274,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1311,30 +1291,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1372,7 +1351,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973..63d1c9e417 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f..382fca64be 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 909e9143bf..3d21a31374 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1372,7 +1372,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 18/34] ml/cnxk: support config and close of tvmdp library
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based
on the ML device configuration options.

Updated the meson build to add Jansson, the TVM runtime and the
TVMDP library as build dependencies.
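
For illustration, a simplified standalone sketch (not part of this patch) of
the configure flow added here; the structure and function signatures are
reduced stand-ins for the driver and TVMDP calls:

#include <stdio.h>

/* Stand-in for the driver device structure referenced in this patch */
struct cnxk_ml_dev { int nb_models; };

/* Placeholder for the real tvmdp_configure(nb_models, clock_cb) call */
static int
tvmdp_configure_sketch(int nb_models)
{
	printf("tvmdp configured for %d models\n", nb_models);
	return 0;
}

/* Called from cnxk_ml_dev_configure() after the cn10k configure step.
 * When RTE_MLDEV_CNXK_ENABLE_MVTVM is not set, mvtvm_ml_stubs.c provides
 * a no-op version of this function instead. */
static int
mvtvm_ml_dev_configure(struct cnxk_ml_dev *mldev)
{
	int ret = tvmdp_configure_sketch(mldev->nb_models);

	if (ret != 0)
		printf("TVMDP configuration failed, error = %d\n", ret);

	return ret;
}

int
main(void)
{
	struct cnxk_ml_dev mldev = { .nb_models = 16 };

	return mvtvm_ml_dev_configure(&mldev);
}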

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       | 153 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 ++
 drivers/ml/cnxk/cnxk_ml_ops.h    |   6 ++
 drivers/ml/cnxk/meson.build      |  51 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  41 +++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  19 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  26 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  15 +++
 8 files changed, 318 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 1834b1f905..5fe572d225 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -46,6 +46,159 @@ or cross-compiled on an x86 platform.
 
 Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
 
+Compilation Prerequisites
+-------------------------
+
+This driver requires external libraries to optionally enable support for
+models compiled using the Apache TVM framework. The following dependencies
+are not part of DPDK and must be installed separately:
+
+- **Jansson**
+
+  This library adds support for parsing and reading JSON files.
+
+- **DLPack**
+
+  This library provides headers for the DLPack open in-memory tensor structure.
+
+.. note::
+
+    DPDK CNXK ML driver requires DLPack version 0.7
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dlpack.git
+    cd dlpack
+    git checkout v0.7 -b v0.7
+    cmake -S ./ -B build \
+      -DBUILD_MOCK=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dlpack.git
+    cd dlpack
+    git checkout v0.7 -b v0.7
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix>
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DBUILD_MOCK=OFF
+    make -C build
+    make -C build install
+
+- **TVM**
+
+  Apache TVM provides a runtime library (libtvm_runtime) used to execute
+  models on CPU cores or hardware accelerators.
+
+.. note::
+
+    DPDK CNXK ML driver requires TVM version 0.10.0
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.11.0 -b v0.11.0
+    git submodule update --init
+    cmake -S ./ -B build \
+      -DBUILD_STATIC_RUNTIME=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.11.0 -b v0.11.0
+    git submodule update --init
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DMACHINE_NAME=aarch64-linux-gnu \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DBUILD_STATIC_RUNTIME=OFF
+    make -C build
+    make -C build install
+
+- **TVMDP**
+
+  Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
+  works as an interface between the TVM runtime and DPDK drivers. The TVMDP
+  library provides a simplified C interface to TVM's C++ based runtime.
+
+.. note::
+
+    The TVMDP library depends on dlpack, dmlc-core and jansson.
+
+.. code-block:: console
+
+    # build dmlc-core
+    git clone https://github.com/dmlc/dmlc-core.git
+    cd dmlc-core
+    git checkout main
+    cmake -S ./ -B build
+    make -C build
+    make -C build install
+
+    # build tvmdp
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DBUILD_SHARED_LIBS=ON \
+      -DBUILD_TESTING=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    # build dmlc-core
+    git clone https://github.com/dmlc/dmlc-core.git
+    cd dmlc-core
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++
+    make -C build
+    make -C build install
+
+    # build tvmdp
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake \
+      -DCMAKE_FIND_ROOT_PATH=<install_prefix> \
+      -DBUILD_SHARED_LIBS=ON \
+      -DBUILD_TESTING=OFF
+    make -C build
+    make -C build install
+
+- **libarchive**
+
+  The Apache TVM framework generates compiled models as tar archives. This
+  library adds support for decompressing and reading archive files in tar,
+  xz and other formats.
+
+.. note::
+
+    When cross-compiling for AArch64, <install_prefix>/lib/pkgconfig should
+    be added to the PKG_CONFIG_PATH environment variable and cmake_prefix_path
+    should be set to <install_prefix>/lib/cmake/tvm in the meson cross file.
+    This enables meson to find the dependencies.
 
 Initialization
 --------------
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 3d21a31374..33d13d5514 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -564,6 +564,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -624,6 +628,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..b22a2b0d95 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,6 +12,12 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#else
+#include "mvtvm_ml_stubs.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5d27a87d91..8ce9b96d5a 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,32 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+if not cc.check_header('dlpack/dlpack.h')
+        message('drivers/ml/cnxk: dlpack.h not found')
+        enable_mvtvm = false
+endif
+
+tvmrt_lib = cc.find_library('tvm_runtime', required: false)
+if tvmrt_lib.found()
+        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib)
+else
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
@@ -21,6 +47,31 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += tvmrt_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+ext_deps += jansson_dep
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+
+sources += files(
+        'mvtvm_ml_stubs.c',
+)
+
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 0000000000..88c6d5a864
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 0000000000..305b4681ed
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <dlpack/dlpack.h>
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
new file mode 100644
index 0000000000..a31cd39cfa
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_stubs.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(conf);
+
+	return 0;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	RTE_SET_USED(cnxk_mldev);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
new file mode 100644
index 0000000000..11c56e5144
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_STUBS_H_
+#define _MVTVM_ML_STUBS_H_
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 19/34] ml/cnxk: add structures to support TVM model type
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.
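
For illustration, a minimal standalone sketch (not part of this patch) of the
tagged-union pattern introduced here; names are simplified stand-ins for
cn10k_ml_model_data and mvtvm_ml_model_data:

#include <stdio.h>

enum model_type { MODEL_TYPE_GLOW, MODEL_TYPE_TVM };

struct glow_data  { int ocm_pages; };
struct mvtvm_data { int nb_llvm_layers; };

struct model {
	enum model_type type;   /* tag selects the active union member */
	union {
		struct glow_data  glow;
		struct mvtvm_data mvtvm;
	};
};

int
main(void)
{
	struct model m = { .type = MODEL_TYPE_TVM,
			   .mvtvm = { .nb_llvm_layers = 2 } };

	if (m.type == MODEL_TYPE_TVM)
		printf("TVM model, LLVM layers = %d\n", m.mvtvm.nb_llvm_layers);
	return 0;
}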

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 66 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 52 ++++++++++++++++++++-----
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 ++++++++++++++++++++++
 5 files changed, 160 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index dc315cce10..749ddeb344 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -435,6 +435,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65eaaf030d..a471e98fbf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,6 +725,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -746,6 +749,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -969,7 +973,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..f100eca203 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,48 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Unknown model type */
+	ML_CNXK_MODEL_TYPE_UNKNOWN,
+
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions */
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* Unknown layer type */
+	ML_CNXK_LAYER_TYPE_UNKNOWN = 0,
+
+	/* MRVL layer, for MLIP target */
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target */
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +99,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +132,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 33d13d5514..96f87128f9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1217,6 +1217,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1229,17 +1231,31 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, 0);
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1253,6 +1269,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1265,17 +1283,31 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 0000000000..1f6b435be0
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 20/34] ml/cnxk: add support for identify model type
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to parse the model buffer to identify the
model type and model sub-type. Added basic validity checks
for Glow model buffers.
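
For illustration, a simplified standalone sketch (not part of this patch) of
the detection order used by cnxk_ml_model_get_type(): probe for a TVM archive
first, then fall back to the Glow magic-string check. The helpers and the
4-byte magic literal here are placeholders; the real driver also verifies the
header and payload CRC32C:

#include <stdio.h>
#include <string.h>

enum model_type { TYPE_UNKNOWN, TYPE_INVALID, TYPE_GLOW, TYPE_TVM };

/* Stand-in: the real driver probes the buffer with libarchive */
static enum model_type
probe_tvm_archive(const void *addr, size_t size)
{
	(void)addr; (void)size;
	return TYPE_UNKNOWN; /* not a TVM archive in this example */
}

/* Stand-in: the real driver checks the Glow metadata magic and CRCs */
static enum model_type
check_glow_magic(const void *addr)
{
	return (strncmp(addr, "MRVL", 4) == 0) ? TYPE_GLOW : TYPE_INVALID;
}

static enum model_type
get_model_type(const void *addr, size_t size)
{
	enum model_type type = probe_tvm_archive(addr, size);

	if (type == TYPE_TVM || type == TYPE_INVALID)
		return type;

	return check_glow_magic(addr);
}

int
main(void)
{
	const char buf[] = "MRVL....glow model metadata....";

	printf("type = %d\n", get_model_type(buf, sizeof(buf))); /* 2 = TYPE_GLOW */
	return 0;
}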

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 49 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  3 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  8 +++++
 drivers/ml/cnxk/meson.build      |  6 ++++
 drivers/ml/cnxk/mvtvm_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 8 files changed, 133 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..02f80410ec 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,60 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	enum cnxk_ml_model_type type;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+	type = mvtvm_ml_model_type_get(params);
+	if (type == ML_CNXK_MODEL_TYPE_TVM)
+		return ML_CNXK_MODEL_TYPE_TVM;
+	else if (type == ML_CNXK_MODEL_TYPE_INVALID)
+		return ML_CNXK_MODEL_TYPE_INVALID;
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f100eca203..a2fced46a2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -13,6 +13,8 @@
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
 #include "mvtvm_ml_model.h"
+#else
+#include "mvtvm_ml_stubs.h"
 #endif
 
 #include "cnxk_ml_io.h"
@@ -184,6 +186,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 96f87128f9..ebc78e36e9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1018,6 +1018,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1033,6 +1034,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1066,6 +1073,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 8ce9b96d5a..20dbaab734 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -53,6 +58,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += tvmrt_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 0000000000..ab5f8baa67
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <archive.h>
+#include <archive_entry.h>
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return ML_CNXK_MODEL_TYPE_UNKNOWN;
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be0..b6162fceec 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,6 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a31cd39cfa..a7352840a6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -7,6 +7,15 @@
 #include "mvtvm_ml_stubs.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	RTE_SET_USED(params);
+
+	return ML_CNXK_MODEL_TYPE_UNKNOWN;
+}
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 11c56e5144..467a9d39e5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 21/34] ml/cnxk: add support to parse TVM model objects
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model
archive buffer, check that all expected objects are present
and copy them to internal buffers.
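
For illustration, a minimal libarchive sketch (not part of this patch, build
with -larchive) of the probing step used here: walk the in-memory archive and
list its entries, the same loop the driver uses to locate mod.so, mod.json
and mod.params before copying them out:

#include <stdio.h>
#include <archive.h>
#include <archive_entry.h>

/* List the entries of an in-memory archive buffer */
static int
list_entries(const void *buf, size_t size)
{
	struct archive_entry *entry;
	struct archive *a = archive_read_new();

	archive_read_support_filter_all(a);
	archive_read_support_format_all(a);

	if (archive_read_open_memory(a, buf, size) != ARCHIVE_OK) {
		archive_read_free(a);
		return -1; /* not an archive, hence not a TVM model buffer */
	}

	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
		printf("entry: %s (%lld bytes)\n", archive_entry_pathname(entry),
		       (long long)archive_entry_size(entry));
		archive_read_data_skip(a);
	}

	archive_read_free(a);
	return 0;
}

int
main(void)
{
	const char not_an_archive[] = "plain bytes";

	if (list_entries(not_an_archive, sizeof(not_an_archive)) != 0)
		printf("buffer is not a model archive\n");
	return 0;
}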

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  5 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 57 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 62 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 11 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 7 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ebc78e36e9..85b37161d2 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1079,7 +1079,10 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	else
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
 	if (ret != 0)
 		goto error;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ab5f8baa67..4c9a080c05 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -53,3 +53,60 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 
 	return ML_CNXK_MODEL_TYPE_TVM;
 }
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b6162fceec..b11b66f495 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -44,5 +44,7 @@ struct mvtvm_ml_model_data {
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 88c6d5a864..e2413b6b15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -8,8 +8,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -39,3 +43,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 305b4681ed..6607537599 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -12,8 +12,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a7352840a6..7f3b3abb2e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -33,3 +33,14 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return 0;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(params);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 467a9d39e5..4bb1772ef4 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -8,9 +8,12 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 22/34] ml/cnxk: fetch layer info and load TVM model
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and
update internal structures based on the layer information.
Set callback functions for layer load and unload and
enabled model loading using the TVMDP library. Added
support to fetch full metadata after model load.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 11 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  2 +
 drivers/ml/cnxk/cn10k_ml_ops.c   |  7 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 25 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  4 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 81 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 10 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 8 files changed, 141 insertions(+), 2 deletions(-)
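
For reference, the core of this change is a lookup that resolves a layer name to its index and accepts only Marvell (Glow) layers for the cn10k layer load path. The standalone sketch below mirrors that logic; struct model, struct layer and the LAYER_TYPE_* names are simplified stand-ins, not the driver's definitions.

/*
 * Standalone sketch, not driver code: resolve a layer name to its index
 * and accept only Marvell (Glow) layers, mirroring the lookup added here.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum layer_type { LAYER_TYPE_MRVL, LAYER_TYPE_LLVM };

struct layer {
	char name[64];
	enum layer_type type;
};

struct model {
	uint16_t nb_layers;
	struct layer layer[8];
};

static int
get_layer_id(const struct model *m, const char *layer_name, uint16_t *layer_id)
{
	uint16_t i;

	/* Match by name first ... */
	for (i = 0; i < m->nb_layers; i++) {
		if (strcmp(m->layer[i].name, layer_name) == 0)
			break;
	}

	/* ... then reject unknown names and non-Marvell layers. */
	if (i == m->nb_layers || m->layer[i].type != LAYER_TYPE_MRVL)
		return -1;

	*layer_id = i;
	return 0;
}

int
main(void)
{
	struct model m = {
		.nb_layers = 2,
		.layer = {{"tvmgen_default_mrvl_main_0", LAYER_TYPE_MRVL},
			  {"tvmgen_default_main_1", LAYER_TYPE_LLVM}},
	};
	uint16_t id;

	if (get_layer_id(&m, "tvmgen_default_mrvl_main_0", &id) == 0)
		printf("layer_id = %u\n", (unsigned int)id);
	return 0;
}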

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index af9d5a666f..0325cd54f1 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -716,3 +716,14 @@ cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "\n");
 }
+
+int
+cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM)
+		return mvtvm_ml_model_get_layer_id(model, layer_name, layer_id);
+
+	*layer_id = 0;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45f2ed5fcf..6744175cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -461,5 +461,7 @@ void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+int cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a471e98fbf..4191ccc840 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -576,7 +576,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
@@ -584,7 +584,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int ret;
 
 	PLT_SET_USED(size);
-	PLT_SET_USED(layer_name);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -598,6 +597,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c9a080c05..8536fd8927 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -110,3 +110,28 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	uint16_t i;
+
+	for (i = 0; i < model->mvtvm.metadata.model.nb_layers; i++) {
+		if (strcmp(model->layer[i].name, layer_name) == 0)
+			break;
+	}
+
+	if (i == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[i].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer type, name: %s type: %d", layer_name, model->layer[i].type);
+		return -EINVAL;
+	}
+
+	*layer_id = i;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b11b66f495..6cb2639876 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
@@ -46,5 +48,7 @@ struct mvtvm_ml_model_data {
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e2413b6b15..9a3ada1b0d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -49,9 +49,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -99,5 +103,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		rte_strscpy(model->layer[layer_id].name,
+			    model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 7f3b3abb2e..d621dbc897 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -17,6 +17,16 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 	return ML_CNXK_MODEL_TYPE_UNKNOWN;
 }
 
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_name);
+	RTE_SET_USED(layer_id);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 4bb1772ef4..23fdfdc4cd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,4 +16,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
+
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 23/34] ml/cnxk: update internal info for TVM model
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating of internal I/O info structures for TVM models.
Computed static fields related to the model I/O.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 111 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |   9 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   1 +
 6 files changed, 130 insertions(+)
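
For reference, the I/O info update computes each tensor's element count as the product of its shape dimensions and derives the dequantized (sz_d) and quantized (sz_q) byte sizes from the corresponding type sizes. A minimal standalone sketch of that arithmetic, with illustrative shape and type sizes:

/*
 * Standalone sketch, not driver code: per-tensor size computation used
 * when filling the TVM I/O info. Shape and type sizes are illustrative.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	int64_t shape[4] = {1, 3, 224, 224}; /* example NCHW input */
	uint32_t ndim = 4;
	uint32_t dtype_size = 4; /* e.g. fp32 (dequantized type) */
	uint32_t qtype_size = 1; /* e.g. int8 (quantized type) */
	uint64_t nb_elements = 1;
	uint32_t i;

	/* nb_elements is the product of the shape dimensions. */
	for (i = 0; i < ndim; i++)
		nb_elements *= shape[i];

	printf("nb_elements = %" PRIu64 ", sz_d = %" PRIu64 ", sz_q = %" PRIu64 "\n",
	       nb_elements, nb_elements * dtype_size, nb_elements * qtype_size);
	return 0;
}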

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 85b37161d2..1565e521fd 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1244,6 +1244,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, 0);
+	else
+		info = mvtvm_ml_model_io_info_get(model, 0);
 
 	if (info == NULL)
 		return -EINVAL;
@@ -1296,6 +1298,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
+	else
+		info = mvtvm_ml_model_io_info_get(model, model->nb_layers - 1);
 
 	if (info == NULL)
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8536fd8927..b40b0a13af 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "cnxk_ml_model.h"
@@ -135,3 +137,112 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 
 	return 0;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case kDLInt:
+		return RTE_ML_IO_TYPE_INT32;
+	case kDLUInt:
+		return RTE_ML_IO_TYPE_UINT32;
+	case kDLFloat:
+		return RTE_ML_IO_TYPE_FP32;
+	case kDLBfloat:
+		return RTE_ML_IO_TYPE_BFLOAT16;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		rte_strscpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			    TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		rte_strscpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			    TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_set(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
+
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(layer_id);
+
+	return &model->mvtvm.info;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6cb2639876..e86581bc6a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -50,5 +50,7 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9a3ada1b0d..e21bf2dc07 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -175,6 +175,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_set(model);
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index d621dbc897..80a9a90b4e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -27,6 +27,15 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 	return -EINVAL;
 }
 
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_id);
+
+	return NULL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 23fdfdc4cd..29f721072a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -18,5 +18,6 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 24/34] ml/cnxk: enable model unload in tvmdp library
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled model unload using the external TVMDP library. Updated
the layer unload callback to support multiple layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  8 +++++---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 +++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  1 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 +++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)
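
For reference, the common cnxk unload path now dispatches on the model type: Glow models keep the existing cn10k unload, while TVM models go through the new mvtvm unload (TVMDP unload followed by freeing the model memzone). A minimal sketch of the dispatch pattern with simplified stand-in types and placeholder callbacks:

/*
 * Standalone sketch, not driver code: unload dispatch on the model type,
 * with placeholder callbacks for the Glow and TVM unload paths.
 */
#include <stdio.h>

enum model_type { MODEL_TYPE_GLOW, MODEL_TYPE_TVM };

struct model { enum model_type type; };

static int
glow_unload(struct model *m)
{
	(void)m;
	puts("glow unload: release layer memzone");
	return 0;
}

static int
tvm_unload(struct model *m)
{
	(void)m;
	puts("tvm unload: tvmdp unload, then free model memzone");
	return 0;
}

static int
model_unload(struct model *m)
{
	if (m->type == MODEL_TYPE_GLOW)
		return glow_unload(m);
	return tvm_unload(m);
}

int
main(void)
{
	struct model m = {MODEL_TYPE_TVM};

	return model_unload(&m);
}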

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4191ccc840..e7208391fd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -780,11 +780,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	int ret;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -797,6 +795,10 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1565e521fd..ce668e1eb6 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1107,7 +1107,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1125,7 +1125,10 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e21bf2dc07..3847f9b6b9 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -185,3 +185,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 80a9a90b4e..a17a76e41f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -63,3 +63,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 29f721072a..3776fb5369 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -15,6 +15,7 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 25/34] ml/cnxk: enable OCM check for multilayer TVM model
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enabled a check of the OCM size requirement for multi-layer
TVM models. Compute the OCM scratch and WB page requirements
for all layers during the load stage.

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c | 60 +++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)
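
For reference, the check added here admits a model only if the sum of write-back pages over its Marvell layers plus the largest per-layer scratch-page requirement fits within the OCM page count. A standalone sketch of that arithmetic with illustrative page counts:

/*
 * Standalone sketch, not driver code: OCM capacity check. The page counts
 * below are illustrative.
 */
#include <stdio.h>

int
main(void)
{
	unsigned int wb_pages[] = {12, 20, 8};     /* per-MRVL-layer WB pages */
	unsigned int scratch_pages[] = {4, 16, 6}; /* per-MRVL-layer scratch pages */
	unsigned int nb_layers = 3;
	unsigned int ocm_num_pages = 56;           /* total OCM pages available */
	unsigned int total_wb = 0, max_scratch = 0, i;

	/* WB pages accumulate across layers; scratch pages are reused,
	 * so only the largest requirement matters. */
	for (i = 0; i < nb_layers; i++) {
		total_wb += wb_pages[i];
		if (scratch_pages[i] > max_scratch)
			max_scratch = scratch_pages[i];
	}

	if (total_wb + max_scratch > ocm_num_pages)
		printf("model rejected: %u + %u > %u\n", total_wb, max_scratch, ocm_num_pages);
	else
		printf("model fits: %u + %u <= %u\n", total_wb, max_scratch, ocm_num_pages);
	return 0;
}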

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ce668e1eb6..d1471971e4 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1023,8 +1023,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	uint16_t max_scratch_pages;
+	struct cn10k_ml_ocm *ocm;
 	uint64_t model_info_size;
+	uint16_t total_wb_pages;
 	uint16_t lcl_model_id;
+	uint16_t layer_id;
 	uint64_t mz_size;
 	bool found;
 	int ret;
@@ -1086,6 +1090,62 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 	if (ret != 0)
 		goto error;
 
+	max_scratch_pages = 0;
+	total_wb_pages = 0;
+	layer_id = 0;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+		total_wb_pages = total_wb_pages + model->layer[layer_id].glow.ocm_map.wb_pages;
+		max_scratch_pages = PLT_MAX(max_scratch_pages,
+					    model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	} else {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+				total_wb_pages = total_wb_pages +
+						 model->layer[layer_id].glow.ocm_map.wb_pages;
+				max_scratch_pages =
+					PLT_MAX(max_scratch_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+			}
+		}
+#endif
+	}
+
+	if ((total_wb_pages + max_scratch_pages) > ocm->num_pages) {
+		plt_err("model_id = %u: total_wb_pages (%u) + scratch_pages (%u) >  %u\n",
+			lcl_model_id, total_wb_pages, max_scratch_pages, ocm->num_pages);
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			plt_ml_dbg("layer_id = %u: wb_pages = %u, scratch_pages = %u\n", layer_id,
+				   model->layer[layer_id].glow.ocm_map.wb_pages,
+				   model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		} else {
+			for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers;
+			     layer_id++) {
+				if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+					plt_ml_dbg(
+						"layer_id = %u: wb_pages = %u, scratch_pages = %u\n",
+						layer_id,
+						model->layer[layer_id].glow.ocm_map.wb_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+				}
+			}
+#endif
+		}
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else {
+			mvtvm_ml_model_unload(cnxk_mldev, model);
+			return -ENOMEM;
+		}
+#endif
+	}
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	cnxk_mldev->nb_models_loaded++;
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 26/34] ml/cnxk: support start and stop for TVM models
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. TVM model
start invokes layer start for all Glow layers that are
part of the model; TVM model stop invokes layer stop
for the same layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 16 ++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 52 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 18 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 6 files changed, 96 insertions(+), 8 deletions(-)
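
For reference, TVM model start/stop walks the layers in order and invokes the Glow layer start (or stop) only for Marvell layers, aborting on the first failure; LLVM layers execute on CPU cores and need no device-side start. A minimal sketch with simplified stand-in types (glow_layer_start is a placeholder for the cn10k layer start callback):

/*
 * Standalone sketch, not driver code: start only the Marvell layers of a
 * TVM model, in order, stopping at the first failure.
 */
#include <stdint.h>
#include <stdio.h>

enum layer_type { LAYER_TYPE_MRVL, LAYER_TYPE_LLVM };

struct layer {
	const char *name;
	enum layer_type type;
};

static int
glow_layer_start(const char *name)
{
	/* Placeholder for the cn10k layer start callback. */
	printf("starting layer %s\n", name);
	return 0;
}

static int
model_start(struct layer *layers, uint16_t nb_layers)
{
	uint16_t i;
	int ret;

	for (i = 0; i < nb_layers; i++) {
		if (layers[i].type != LAYER_TYPE_MRVL)
			continue; /* LLVM layers run on CPU cores, nothing to start */
		ret = glow_layer_start(layers[i].name);
		if (ret != 0)
			return ret;
	}
	return 0;
}

int
main(void)
{
	struct layer layers[] = {{"mrvl_0", LAYER_TYPE_MRVL}, {"llvm_1", LAYER_TYPE_LLVM}};

	return model_start(layers, 2);
}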

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7208391fd..2d308802cf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -827,7 +827,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -838,8 +838,6 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -852,6 +850,10 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -1015,14 +1017,12 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -1035,6 +1035,10 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index d1471971e4..c38c60bf76 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1216,7 +1216,12 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+
+	return 0;
 }
 
 int
@@ -1236,7 +1241,12 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 3847f9b6b9..323c7c6fb6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -213,3 +213,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 770794fe7d..55459f9f7f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -19,5 +19,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a17a76e41f..b8c2e6a1fc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -72,3 +72,21 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3776fb5369..1eb663b1d1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,6 +16,8 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 27/34] ml/cnxk: update internal TVM model info structure
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update internal model info structure
for TVM models.

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)
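
For reference, the info update fills a single buffer that holds the model info header followed by the input and output descriptor arrays. The sketch below shows that layout pattern with simplified stand-in structs rather than the rte_mldev definitions:

/*
 * Standalone sketch, not driver code: one allocation holding an info
 * header followed by the input and output descriptor arrays.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IO 8

struct io_info {
	char name[64];
	uint32_t nb_elements;
};

struct model_info {
	char name[64];
	uint32_t nb_inputs;
	struct io_info *input;
	uint32_t nb_outputs;
	struct io_info *output;
};

int
main(void)
{
	size_t sz = sizeof(struct model_info) + 2 * MAX_IO * sizeof(struct io_info);
	struct model_info *info = calloc(1, sz);
	struct io_info *input, *output;

	if (info == NULL)
		return 1;

	/* Input array starts right after the header, output array after it. */
	input = (struct io_info *)(info + 1);
	output = input + MAX_IO;

	strcpy(info->name, "tvm_model");
	info->nb_inputs = 1;
	info->input = input;
	info->nb_outputs = 1;
	info->output = output;
	strcpy(input[0].name, "data");
	input[0].nb_elements = 3 * 224 * 224;
	strcpy(output[0].name, "prob");
	output[0].nb_elements = 1000;

	printf("%s: %u input(s), %u output(s)\n", info->name, info->nb_inputs, info->nb_outputs);
	free(info);
	return 0;
}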

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index b40b0a13af..650dd970bd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -11,6 +11,7 @@
 
 #include <roc_api.h>
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -246,3 +247,67 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 
 	return &model->mvtvm.info;
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index e86581bc6a..a1247ffbde 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -52,5 +53,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 323c7c6fb6..c6872cd89a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -178,6 +178,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_set(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 28/34] ml/cnxk: support device dump for TVM models
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to print TVM model layer info.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  7 +++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  8 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 5 files changed, 77 insertions(+), 1 deletion(-)
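
For reference, the layer dump uses an aligned "field : value" format, right-justifying the field name to a fixed width with the %*s specifier. A tiny standalone illustration (FIELD_LEN is an assumed value here, not the driver's constant):

/*
 * Standalone sketch, not driver code: aligned "field : value" output using
 * a runtime width with %*s. FIELD_LEN is an assumed value.
 */
#include <stdio.h>

#define FIELD_LEN 16

int
main(void)
{
	printf("%*s : %s\n", FIELD_LEN, "name", "tvmgen_default_mrvl_main_0");
	printf("%*s : %d\n", FIELD_LEN, "type", 1);
	printf("%*s : %u\n", FIELD_LEN, "batch_size", 1U);
	return 0;
}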

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 02f80410ec..ed6a1ed866 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -68,6 +68,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -84,6 +86,9 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
 	}
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 650dd970bd..ffbcec8b80 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -311,3 +312,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
 				&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index a1247ffbde..900ba44fa0 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -54,5 +55,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index b8c2e6a1fc..260a051b08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -36,6 +36,14 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 	return NULL;
 }
 
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(layer);
+	RTE_SET_USED(fp);
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 1eb663b1d1..d6d0edbcf1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
@@ -22,5 +23,6 @@ int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 29/34] ml/cnxk: enable reporting model runtime as xstats
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   9 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 131 +++++++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  96 +++++++++++++++++++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   8 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  23 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   6 ++
 10 files changed, 289 insertions(+), 18 deletions(-)
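
For reference, the runtime latency counters are kept in clock cycles and converted to nanoseconds only when the system clock frequency is known; otherwise the raw cycle count is reported and the xstat name carries a "-cycles" suffix. A minimal sketch of that conversion, assuming the frequency is reported in MHz as the driver's formula suggests:

/*
 * Standalone sketch, not driver code: convert a cycle count to nanoseconds
 * when the clock frequency (assumed MHz) is known, else report raw cycles.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t cycles = 123456; /* accumulated runtime latency in cycles */
	uint16_t sclk_mhz = 1000; /* assumed system clock in MHz; 0 if unknown */
	uint64_t value = cycles;

	if (sclk_mhz != 0)
		value = (value * 1000ULL) / sclk_mhz; /* cycles / MHz = us; x1000 = ns */

	printf("latency = %" PRIu64 " %s\n", value, sclk_mhz != 0 ? "ns" : "cycles");
	return 0;
}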

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d308802cf..0c67ce7b40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -197,6 +197,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 	}
 }
 
+void
+cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->glow.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
 #define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3d18303ed3..045e2e6cd2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -331,6 +331,8 @@ int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
+void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				  enum cnxk_ml_xstats_type type);
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c38c60bf76..2632d70d8c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -138,7 +138,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -169,6 +170,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -195,7 +215,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -204,6 +225,36 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		rte_strscpy(suffix, "cycles", 7);
+	else
+		rte_strscpy(suffix, "ns", 3);
+
+	/* Update xstat name based on layer name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_xstat_model_name_set(cnxk_mldev, model, stat_id, i, suffix);
+		else
+			mvtvm_ml_model_xstat_name_set(cnxk_mldev, model, stat_id, i, suffix);
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -247,13 +298,22 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	if (model == NULL)
 		return 0;
 
-	if (layer_id >= 0)
+	if (layer_id >= 0) {
 		layer = &model->layer[layer_id];
-	else
-		return 0;
+		goto layer_xstats;
+	} else {
+		layer = NULL;
+		goto model_xstats;
+	}
 
+layer_xstats:
 	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+	goto exit_xstats;
 
+model_xstats:
+	value = mvtvm_ml_model_xstat_get(cnxk_mldev, model, type);
+
+exit_xstats:
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq != 0) /* return in ns */
 		value = (value * 1000ULL) / sclk_freq;
@@ -836,8 +896,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -854,7 +915,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -868,9 +939,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		rte_strscpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -931,9 +1013,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -951,7 +1034,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -963,11 +1053,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b22a2b0d95..ab32676b3e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -70,6 +70,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876c..a2c9adfe4a 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 900ba44fa0..66c3af18e1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index c6872cd89a..abfbae2b3a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -10,10 +10,83 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->mvtvm.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	uint64_t count = 0;
+	uint64_t value = 0;
+	uint32_t qp_id;
+
+	switch (type) {
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -53,6 +126,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -68,7 +142,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -181,6 +259,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 55459f9f7f..22e0340146 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -11,8 +11,11 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -22,4 +25,9 @@ int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 260a051b08..19af1d2703 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -8,6 +8,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_xstats.h"
 
 enum cnxk_ml_model_type
 mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
@@ -44,6 +45,28 @@ mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	RTE_SET_USED(fp);
 }
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(stat_id);
+	RTE_SET_USED(entry);
+	RTE_SET_USED(suffix);
+}
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(type);
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index d6d0edbcf1..3fd1f04c35 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
@@ -24,5 +26,9 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 30/34] ml/cnxk: implement I/O alloc and free callbacks
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for I/O allocation and free
for Glow layers.
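
As an informal illustration (not part of this patch), the sketch below shows
how the new callbacks could be exercised by a caller that already holds a
valid device handle and model id; the layer name "layer_0", the helper name
and the trimmed error handling are assumptions made for the example.

/*
 * Hypothetical usage sketch for the Glow I/O alloc/free callbacks.
 */
#include <stdint.h>

#include "cn10k_ml_ops.h"

static int
example_io_roundtrip(void *device, uint16_t model_id)
{
	uint64_t *input_qbuffer = NULL;
	uint64_t *output_qbuffer = NULL;
	int ret;

	/* Reserve one aligned memzone covering quantized input and output */
	ret = cn10k_ml_io_alloc(device, model_id, "layer_0",
				&input_qbuffer, &output_qbuffer);
	if (ret != 0)
		return ret;

	/* ... run inference using the quantized buffers ... */

	/* Look up and free the memzone backing both buffers */
	return cn10k_ml_io_free(device, model_id, "layer_0");
}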

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 87 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 +
 3 files changed, 92 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0c67ce7b40..7802425c87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1410,3 +1410,90 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t output_size;
+	uint64_t input_size;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 045e2e6cd2..9c41c1c0b0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -329,6 +329,9 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index abfbae2b3a..a50b31ec6e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -232,6 +232,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 31/34] ml/cnxk: add generic ML malloc and free callback
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks
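
As an informal illustration (not part of this patch), the sketch below shows
a named scratch allocation through the new callbacks; the buffer name, size
and alignment are illustrative values only.

/*
 * Hypothetical usage sketch for the generic ML malloc/free callbacks.
 */
#include <stddef.h>

#include "cn10k_ml_ops.h"

static int
example_scratch_buffer(void)
{
	void *addr = NULL;
	int ret;

	/* Reserve a named memzone of 4 KB with 64-byte alignment */
	ret = cn10k_ml_malloc("ml_scratch_example", 4096, 64, &addr);
	if (ret != 0)
		return ret;

	/* ... use addr as runtime scratch memory ... */

	/* Free by looking up the same memzone name */
	return cn10k_ml_free("ml_scratch_example");
}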

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7802425c87..01b0a44caa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1497,3 +1497,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 9c41c1c0b0..eb3e1c139c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -333,6 +333,9 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index a50b31ec6e..9d59e28661 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -234,6 +234,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 32/34] ml/cnxk: support quantize and dequantize callback
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
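
As an informal illustration (not part of this patch), the sketch below pairs
the two callbacks around a hardware inference; the layer name "layer_0", the
buffer pointers and the helper name are assumptions made for the example.

/*
 * Hypothetical usage sketch for the TVM quantize/dequantize callbacks.
 */
#include <stdint.h>

#include <dlpack/dlpack.h>

#include "mvtvm_ml_ops.h"

static int
example_quantize_roundtrip(void *device, uint16_t model_id,
			   const DLTensor **in_tensors, void *in_qbuffer,
			   void *out_qbuffer, const DLTensor **out_tensors)
{
	int ret;

	/* Quantize the dequantized input tensors into the quantized buffer */
	ret = mvtvm_ml_io_quantize(device, model_id, "layer_0", in_tensors, in_qbuffer);
	if (ret != 0)
		return ret;

	/* ... hardware inference on the quantized buffers ... */

	/* Dequantize the outputs back into the runtime's DLTensor buffers */
	return mvtvm_ml_io_dequantize(device, model_id, "layer_0", out_qbuffer, out_tensors);
}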

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_ops.c | 129 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |   4 +
 2 files changed, 133 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9d59e28661..39c8bf0f04 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -2,11 +2,15 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <dlpack/dlpack.h>
+
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
@@ -236,6 +240,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -366,3 +372,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 22e0340146..4cabe30a82 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -24,6 +24,10 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 33/34] ml/cnxk: enable fast-path ops for TVM models
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  2023-10-19  4:17   ` [PATCH v7 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. Models use
TVMDP library function calls to execute inference
operations for Hybrid and LLVM model sub-types.

For TVM MRVL model subtypes that have a single MRVL layer,
the inference requests are directly enqueued to hardware
by the driver.
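
As an informal illustration (not part of this patch), the snippet below
restates the selection described above; it mirrors the hunk added to
mvtvm_ml_model_load() further down, and the header names are assumptions.

/*
 * Hypothetical sketch of the per-model fast-path selection.
 */
#include "cn10k_ml_ops.h"
#include "cnxk_ml_model.h"
#include "mvtvm_ml_ops.h"

static void
example_select_fast_path(struct cnxk_ml_model *model)
{
	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
		/* Single MRVL layer: enqueue requests directly to hardware */
		model->enqueue_single = cn10k_ml_enqueue_single;
		model->result_update = cn10k_ml_result_update;
	} else {
		/* Hybrid and LLVM sub-types: execute through the TVMDP runtime */
		model->enqueue_single = mvtvm_ml_enqueue_single;
		model->result_update = mvtvm_ml_result_update;
	}
}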

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/rel_notes/release_23_11.rst |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c         |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h           |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h          |   5 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  20 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c         | 124 +++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |  43 +++++++++
 9 files changed, 211 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index d7f4484558..79ee9d51cf 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -247,6 +247,9 @@ New Features
   Added dispatcher library whose purpose is to help decouple different
   parts (modules) of an eventdev-based application.
 
+* **Updated Marvell cnxk mldev driver.**
+
+  * Added support for models compiled using the TVM framework.
 
 Removed Items
 -------------
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 2632d70d8c..bf266d4d6e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ffbcec8b80..95bde6a9cb 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -198,6 +198,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -231,6 +241,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 66c3af18e1..7ffce38094 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -69,6 +69,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 39c8bf0f04..6b88491371 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -19,6 +19,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 void
 mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			      uint16_t stat_id, uint16_t entry, char *suffix)
@@ -242,6 +248,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -285,6 +292,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -495,3 +515,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 4cabe30a82..cb4b219743 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -16,6 +16,44 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* End ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -29,6 +67,11 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v7 34/34] ml/cnxk: enable creation of mvtvm virtual device
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-10-19  4:17   ` [PATCH v7 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-10-19  4:17   ` Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  4:17 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create an mvtvm virtual device on
systems without a PCI-based ML HW accelerator.
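
As an informal illustration (not part of this patch), the vdev can be
requested through EAL arguments as in the doc update below; the application
name here is a placeholder:

   <dpdk-application> --vdev ml_mvtvm,max_qps=4,cache_model_data=1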

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       |  50 +++++++-
 drivers/ml/cnxk/cn10k_ml_dev.c   |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c    |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  82 +++++++++----
 drivers/ml/cnxk/meson.build      |   1 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   | 196 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  31 +++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   2 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   2 +
 13 files changed, 433 insertions(+), 24 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 5fe572d225..498fea284b 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -223,6 +223,23 @@ Bind the ML PF device to the vfio_pci driver:
    usertools/dpdk-devbind.py -u 0000:00:10.0
    usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
 
+VDEV support
+------------
+
+On platforms which don't support ML hardware acceleration through a PCI device, the
+Marvell ML CNXK PMD can execute inference operations on a vdev, with the ML models
+compiled using the Apache TVM framework.
+
+VDEV can be enabled by passing the EAL arguments
+
+.. code-block:: console
+
+   --vdev ml_mvtvm
+
+VDEV can also be used on platforms with an ML HW accelerator. However, to use VDEV in
+this case, the PCI device has to be unbound. When the PCI device is bound, creation
+of the vdev is skipped.
+
 
 Runtime Config Options
 ----------------------
@@ -233,6 +250,8 @@ Runtime Config Options
   The parameter ``fw_path`` can be used by the user
   to load ML firmware from a custom path.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
@@ -248,6 +267,8 @@ Runtime Config Options
   When enabled, firmware would mask the DPE non-fatal hardware errors as warnings.
   The parameter ``enable_dpe_warnings`` is used for this configuration.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,enable_dpe_warnings=0
@@ -264,11 +285,19 @@ Runtime Config Options
   Caching of model data improves the inferencing throughput / latency for the model.
   The parameter ``cache_model_data`` is used to enable data caching.
 
+  This option is supported on PCI HW accelerator and vdev.
+
   For example::
 
      -a 0000:00:10.0,cache_model_data=0
 
-  With the above configuration, model data caching is disabled.
+  With the above configuration, model data caching is disabled on HW accelerator.
+
+  For example::
+
+     --vdev ml_mvtvm,cache_model_data=0
+
+  With the above configuration, model data caching is disabled on vdev.
 
 
 **OCM allocation mode** (default ``lowest``)
@@ -284,6 +313,8 @@ Runtime Config Options
   ``largest``
     Allocate OCM for the model from the slot with largest amount of free space.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_alloc_mode=lowest
@@ -301,6 +332,8 @@ Runtime Config Options
   Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
   Default page size is 16 KB.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_page_size=8192
@@ -325,6 +358,8 @@ Runtime Config Options
     Enabling spinlock version would disable restrictions on the number of queue-pairs
     that can be supported by the driver.
 
+   This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,hw_queue_lock=1
@@ -333,6 +368,19 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
+**Maximum queue pairs** (default ``1``)
+
+  VDEV supports additional EAL arguments to configure the maximum number of
+  queue-pairs on the ML device through the option ``max_qps``.
+
+  This option is supported only on vdev.
+
+  For example::
+
+     --vdev ml_mvtvm,max_qps=4
+
+  With the above configuration, 4 queue-pairs are created on the vdev.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 91813e9d0a..41f3b7a95d 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -309,6 +309,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		return -EINVAL;
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -355,6 +361,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef..cee405f3f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417..dc4512223c 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64be..491c4c4aea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index bf266d4d6e..36a5dcf9b0 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,7 +117,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -480,7 +481,12 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+
+	return 0;
 }
 
 static int
@@ -518,9 +524,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -618,10 +626,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
@@ -629,12 +639,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -695,8 +710,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close MVTVM ML Device");
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -748,10 +765,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -770,10 +789,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -800,7 +821,12 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+
+	return 0;
 }
 
 static int
@@ -813,6 +839,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1145,6 +1174,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1384,6 +1418,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 20dbaab734..de63a9c502 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -57,6 +57,7 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 0000000000..dcac7b7273
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Not creating ml_mvtvm vdev!");
+		return 0;
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 0000000000..6922c19337
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
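
For reference, a virtual device carrying this private data is created from the
EAL command line along these lines (values are illustrative only, not taken
from this series):

    <dpdk-app> --vdev="ml_mvtvm,max_qps=32,cache_model_data=1"

"ml_mvtvm" is MLDEV_NAME_MVTVM_PMD, and the two kvargs registered above feed
the max_nb_qpairs and cache_model_data fields of struct mvtvm_ml_dev. Probe is
skipped (returns 0 without creating the vdev) when the PCI ML device has
already been initialized.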
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 6b88491371..e825c3fb23 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -97,6 +97,22 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return value;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -127,6 +143,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -237,6 +262,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index cb4b219743..0232c5ead5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -55,8 +55,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 19af1d2703..126a954c91 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -67,6 +67,15 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return 0;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(dev_info);
+
+	return -ENOTSUP;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -84,6 +93,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3fd1f04c35..4220a963f2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -14,8 +14,10 @@ struct cnxk_ml_model;
 struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v6 00/34] Implementation of revised ml/cnxk driver
  2023-10-18 14:20   ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
@ 2023-10-19  6:41     ` Srikanth Yalavarthi
  0 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  6:41 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Prince Takkar, Srikanth Yalavarthi

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 18 October 2023 19:50
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>; Srikanth Yalavarthi
> <syalavarthi@marvell.com>
> Subject: [EXT] Re: [PATCH v6 00/34] Implementation of revised ml/cnxk
> driver
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Wed, Oct 18, 2023 at 7:24 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > This patch series is an implementation of revised ml/cnxk driver to
> > support models compiled with TVM compiler framework. TVM models use
> a
> > hybrid mode for execution, with regions of the model executing on the
> > ML accelerator and the rest executing on CPU cores.
> >
> > This series of commits reorganizes the ml/cnxk driver and adds support
> > to execute multiple regions with-in a TVM model.
> >
> 
> Fix this warning
> 
> ### [PATCH] ml/cnxk: enable creation of mvtvm virtual device
> 
> Warning in drivers/ml/cnxk/cn10k_ml_dev.c:
> Using rte_panic/rte_exit

Updated the driver patches to avoid using rte_panic/rte_exit. The changes are part of v7.

> 
> Fix as needed which is relevent
> ### [PATCH] ml/cnxk: add generic cnxk device structure
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #1778: FILE: drivers/ml/cnxk/cn10k_ml_ops.c:1316:
> +               strncpy(xstats_map[idx].name,
> cn10k_mldev->xstats.entries[i].map.name,
> 
> total: 0 errors, 1 warnings, 2276 lines checked
> 
> ### [PATCH] ml/cnxk: add generic model and layer structures
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #117: FILE: drivers/ml/cnxk/cn10k_ml_model.c:379:
> +                       strncpy(layer->info.input[i].name, (char
> *)metadata->input1[i].input_name,
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #166: FILE: drivers/ml/cnxk/cn10k_ml_model.c:411:
> +                       strncpy(layer->info.input[i].name, (char
> *)metadata->input2[j].input_name,
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #221: FILE: drivers/ml/cnxk/cn10k_ml_model.c:449:
> +                       strncpy(layer->info.output[i].name,
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #255: FILE: drivers/ml/cnxk/cn10k_ml_model.c:472:
> +                       strncpy(layer->info.output[i].name,
> 
> total: 0 errors, 4 warnings, 1905 lines checked
> 
> ### [PATCH] ml/cnxk: update model load and unload functions
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #83: FILE: drivers/ml/cnxk/cn10k_ml_model.c:367:
> +                       strncpy(io_info->input[i].name, (char
> *)metadata->input1[i].input_name,
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #135: FILE: drivers/ml/cnxk/cn10k_ml_model.c:399:
> +                       strncpy(io_info->input[i].name, (char
> *)metadata->input2[j].input_name,
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #204: FILE: drivers/ml/cnxk/cn10k_ml_model.c:437:
> +                       strncpy(io_info->output[i].name, (char
> *)metadata->output1[i].output_name,
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #244: FILE: drivers/ml/cnxk/cn10k_ml_model.c:461:
> +                       strncpy(io_info->output[i].name, (char
> *)metadata->output2[j].output_name,
> 
> total: 0 errors, 4 warnings, 1094 lines checked
> 
> ### [PATCH] ml/cnxk: update device and model xstats functions
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #1100: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:856:
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #1100: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:856:
> +               strncpy(xstats_map[idx].name, xs->map.name,
> RTE_ML_STR_MAX);
> 
> total: 0 errors, 1 warnings, 1248 lines checked
> 
> ### [PATCH] ml/cnxk: fetch layer info and load TVM model
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #172: FILE: drivers/ml/cnxk/mvtvm_ml_ops.c:125:
> +               strncpy(model->layer[layer_id].name,
> 
> total: 0 errors, 1 warnings, 207 lines checked
> 
> ### [PATCH] ml/cnxk: update internal info for TVM model
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #85: FILE: drivers/ml/cnxk/mvtvm_ml_model.c:175:
> +               strncpy(model->mvtvm.info.input[i].name,
> metadata->input[i].name,
> 
> WARNING:STRNCPY: Prefer strscpy, strscpy_pad, or __nonstring over
> strncpy - see: https://github.com/KSPP/linux/issues/90
> #118: FILE: drivers/ml/cnxk/mvtvm_ml_model.c:208:
> +               strncpy(model->mvtvm.info.output[i].name,
> metadata->output[i].name,
> 
> total: 0 errors, 2 warnings, 173 lines checked
> 
> ### [PATCH] ml/cnxk: enable reporting model runtime as xstats
> 
> WARNING:STRCPY: Prefer strscpy over strcpy - see:
> https://github.com/KSPP/linux/issues/88
> #113: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:243:
> +               strcpy(suffix, "cycles");
> 
> WARNING:STRCPY: Prefer strscpy over strcpy - see:
> https://github.com/KSPP/linux/issues/88
> #115: FILE: drivers/ml/cnxk/cnxk_ml_ops.c:245:
> +               strcpy(suffix, "ns");
> 

Replaced all str* function calls with their rte_str* counterparts. The changes are part of v7.
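
As an illustration only (not code from this series; copy_name() is a made-up
helper), the replacement that checkpatch asks for looks roughly like:

    #include <errno.h>
    #include <rte_mldev.h>       /* RTE_ML_STR_MAX */
    #include <rte_string_fns.h>  /* rte_strscpy() */

    /* Copy a name into a fixed-size field. Unlike strncpy(), rte_strscpy()
     * always NUL-terminates the destination and reports truncation.
     */
    static int
    copy_name(char dst[RTE_ML_STR_MAX], const char *src)
    {
            /* returns the copied length, or -E2BIG if src did not fit */
            if (rte_strscpy(dst, src, RTE_ML_STR_MAX) < 0)
                    return -E2BIG;

            return 0;
    }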

> total: 0 errors, 2 warnings, 477 lines checked

^ permalink raw reply	[flat|nested] 340+ messages in thread

* RE: [EXT] Re: [PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library
  2023-10-18 18:34     ` Jerin Jacob
@ 2023-10-19  6:44       ` Srikanth Yalavarthi
  0 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-19  6:44 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Shivah Shankar Shankar Narayan Rao, Anup Prabhu,
	Prince Takkar, Srikanth Yalavarthi


> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: 19 October 2023 00:04
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>; Srikanth Yalavarthi
> <syalavarthi@marvell.com>
> Subject: [EXT] Re: [PATCH v6 18/34] ml/cnxk: support config and close of
> tvmdp library
> 
> External Email
> 
> ----------------------------------------------------------------------
> On Wed, Oct 18, 2023 at 7:52 PM Srikanth Yalavarthi
> <syalavarthi@marvell.com> wrote:
> >
> > Added support to configure and close TVMDP library based on ML device
> > configuration options.
> >
> > Updated meson build to enable Jansson, TVM runtime, TVMDP library as
> > build dependencies.
> >
> > Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
> > ---
> 
> >
> > +Compilation Prerequisites
> > +-------------------------
> > +
> > +This driver requires external libraries to optionally enable support
> > +for models compiled using Apache TVM framework. The following
> > +dependencies are not part of DPDK and must be installed separately:
> > +
> > +- **Jansson**
> > +
> > +  This library enables support to parse and read JSON files.
> > +
> > +- **DLPack**
> > +
> > +  This library provides headers for open in-memory tensor structures.
> > +
> > +.. note::
> > +
> > +    DPDK CNXK ML driver requires DLPack version 0.7
> > +
> > +.. code-block:: console
> 
> 
> Please add sections for cross and native.
> 
> > +    git clone https://github.com/dmlc/dlpack.git
> > +    cd dlpack
> > +    git checkout v0.7 -b v0.7
> > +    cmake -S ./ -B build \
> > +      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
> > +      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
> > +      -DBUILD_MOCK=OFF
> > +    make -C build
> > +    make -C build install
> > +
> > +- **TVM**
> > +
> > +  Apache TVM provides a runtime library (libtvm_runtime) used to
> > + execute  models on CPU cores or hardware accelerators.
> > +
> > +.. note::
> > +
> > +    DPDK CNXK ML driver requires TVM version 0.10.0
> > +
> > +.. code-block:: console
> > +
> > +    git clone https://github.com/apache/tvm.git
> 
> I need to use --recursive to avoid
> CMake Error at /usr/share/cmake/Modules/ExternalProject.cmake:3176
> (message):
>   No download info given for 'project_libbacktrace' and its source directory:

Updated build steps in version 7. Added steps to initialize submodules.
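
(For reference, the added step boils down to standard git usage: either clone
with "git clone --recursive https://github.com/apache/tvm.git", or run
"git submodule update --init --recursive" in an existing checkout, so that
third-party components such as libbacktrace are present before cmake is run.)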
> 
> 
> > +    cd tvm
> > +    git checkout v0.10.0 -b v0.10.0
> > +    cmake -S ./ -B build \
> > +      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
> > +      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
> > +      -DMACHINE_NAME=aarch64-linux-gnu \
> > +      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
> > +      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY
> > +    make -C build
> > +    make -C build install
> > +
> > +- **TVMDP**
> > +
> > +  Marvell's `TVM Dataplane Library
> > +  <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_  works as an interface
> > +  between TVM runtime and DPDK drivers. TVMDP library  provides a
> > +  simplified C interface for TVM's runtime based on C++.
> > +
> > +.. code-block:: console
> > +
> > +    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
> > +    cd tvmdp
> > +    git checkout main
> > +    cmake -S ./ -B build \
> > +      -
> DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake \
> > +      -DBUILD_SHARED_LIBS=ON \
> > +      -DBUILD_TESTING=OFF
> 
> [main]dell[tvmdp] $ cmake -S ./ -B build -
> DCMAKE_INSTALL_PREFIX=/export/cross_prefix/prefix
> -DCMAKE_TOOLCHAIN_FILE=config/toolchains/arm64_linux_gcc.cmake
> -DBUILD_SHARED_LIBS=ON  -DBUILD_TESTING=OFF
> -- The CXX compiler identification is GNU 13.2.0
> -- The C compiler identification is GNU 13.2.0
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Check for working CXX compiler: /usr/bin/aarch64-linux-gnu-g++ - skipped
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Check for working C compiler: /usr/bin/aarch64-linux-gnu-gcc - skipped
> -- Detecting C compile features
> -- Detecting C compile features - done
> CMake Error at CMakeLists.txt:53 (find_package):
>   By not providing "Finddmlc.cmake" in CMAKE_MODULE_PATH this project
> has
>   asked CMake to find a package configuration file provided by "dmlc", but
>   CMake did not find one.
> 
>   Could not find a package configuration file provided by "dmlc" with any of
>   the following names:
> 
>     dmlcConfig.cmake
>     dmlc-config.cmake
> 
>   Add the installation prefix of "dmlc" to CMAKE_PREFIX_PATH or set
>   "dmlc_DIR" to a directory containing one of the above files.  If "dmlc"
>   provides a separate development package or SDK, be sure it has been
>   installed.
> 
> 
> -- Configuring incomplete, errors occurred!

This is a bug in the TVMDP CMakeLists.txt. The issue is fixed now and the change has been pushed to the TVMDP GitHub repo.
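
(As the quoted CMake error itself suggests, a local workaround until the fixed
TVMDP is picked up is to point CMake at the dmlc-core installation explicitly,
e.g. by adding -DCMAKE_PREFIX_PATH=<install prefix> or
-Ddmlc_DIR=<directory containing dmlc-config.cmake> to the cmake invocation;
both paths here are placeholders.)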
> 
> 
> > +enable_mvtvm = true
> > +
> > +if not jansson_dep.found()
> > +        message('drivers/ml/cnxk: jansson not found')
> > +        enable_mvtvm = false
> > +endif
> > +
> > +if not cc.check_header('dlpack/dlpack.h')
> > +        message('drivers/ml/cnxk: dlpack.h not found')
> > +        enable_mvtvm = false
> > +endif
> > +
> > +tvmrt_lib = cc.find_library('tvm_runtime', required: false) if
> > +tvmrt_lib.found()
> > +        tvmrt_dep = declare_dependency(dependencies: tvmrt_lib) else
> > +        message('drivers/ml/cnxk: tvm_runtime not found')
> > +        enable_mvtvm = false
> > +endif
> > +
> > +tvmdp_dep = dependency('tvmdp', required: false) if not
> > +tvmdp_dep.found()
> > +        message('drivers/ml/cnxk: tvmdp not found')
> > +        enable_mvtvm = false
> > +endif
> > +
> >  sources = files(
> >          'cn10k_ml_dev.c',
> >          'cn10k_ml_ops.c',
> > @@ -21,6 +47,39 @@ sources = files(
> >
> >  deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
> >
> > +if enable_mvtvm
> > +
> > +dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
> > +
> > +driver_sdk_headers += files(
> > +        'mvtvm_ml_ops.h',
> > +)
> 
> Remove this
Done. The change is part of version 7.

^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 00/34] Implementation of revised ml/cnxk driver
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (39 preceding siblings ...)
  2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
@ 2023-10-23  4:41 ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                     ` (33 more replies)
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  41 siblings, 34 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series is an implementation of revised ml/cnxk driver
to support models compiled with TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions with-in a TVM model.

v8:
  - Updated CMake dependency resolution of external dependencies
  - Updated mldevs/cnxk documentation
  - Updated meson config files for cn9k and cn10k to include cmake

v7:
  - Updated steps to build dependencies in cnxk mldev documentation
  - Replace str functions with rte_str functions
  - Drop use of rte_exit in ml/cnxk driver

v6:
  - Added depends info for series. This series depends on patch-132887
  - Fix merge conflicts with dpdk-23.11-rc1
  - Fix issues with ml/cnxk driver release notes
  - Added build dependency information for dlpack headers

v5:
  - Fix build failures for individual patches in the series
  - Finished build testing with devtools/test-meson-builds.sh script

v4:
  - Squashed release notes
  - Updated external build dependency info in documentation

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (2):
  ml/cnxk: enable OCM check for multilayer TVM model
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (30):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 config/arm/arm64_cn10k_linux_gcc       |    1 +
 config/arm/arm64_cn9k_linux_gcc        |    1 +
 doc/guides/mldevs/cnxk.rst             |  223 +-
 doc/guides/rel_notes/release_23_11.rst |    3 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h         |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c       |  403 ++--
 drivers/ml/cnxk/cn10k_ml_model.h       |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2828 ++++++++----------------
 drivers/ml/cnxk/cn10k_ml_ops.h         |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c          |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h          |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c           |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h           |   88 +
 drivers/ml/cnxk/cnxk_ml_model.c        |   94 +
 drivers/ml/cnxk/cnxk_ml_model.h        |  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          | 1690 ++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h          |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c        |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h        |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h       |  152 ++
 drivers/ml/cnxk/meson.build            |   70 +
 drivers/ml/cnxk/mvtvm_ml_dev.c         |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h         |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  392 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c         |  652 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c       |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h       |   36 +
 32 files changed, 6279 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 01/34] ml/cnxk: drop support for register polling
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for the device argument "poll_mem" in the cnxk
ML driver. Support for polling through registers is removed,
and DDR addresses are now always used for polling.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001..4aaeecff03 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d3..11531afd8c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6..005b093e45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 02/34] ml/cnxk: add generic cnxk device structure
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This structure is
the top-level device structure for the driver and encapsulates
the target / platform-specific device structure.
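
A minimal sketch of the resulting layout (only fields used in this patch are
shown; the full definition is in the new cnxk_ml_dev.h):

	struct cnxk_ml_dev {
		/* Platform / target specific device, CN10K for now */
		struct cn10k_ml_dev cn10k_mldev;

		/* Backpointer to the rte_ml_dev created for this device */
		struct rte_ml_dev *mldev;
	};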

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 316 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  15 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  60 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 495 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   1 +
 9 files changed, 562 insertions(+), 449 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..3bc61443d8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -10,13 +10,14 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -58,9 +59,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -90,7 +88,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -127,7 +125,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -139,7 +137,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -151,7 +149,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -174,7 +172,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -186,7 +184,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -197,49 +195,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -248,47 +250,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -300,7 +302,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -308,7 +311,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -324,18 +327,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -351,7 +356,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -368,7 +373,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -383,8 +388,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -430,45 +435,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -480,11 +485,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -498,14 +503,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -515,7 +520,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -524,24 +529,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -549,9 +554,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -559,9 +564,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -570,39 +575,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -613,53 +619,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -671,11 +681,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -691,49 +701,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	void *fw_buffer = NULL;
@@ -741,8 +753,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -773,8 +786,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -787,22 +800,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03..f9da1548c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8e..cc46ca2efd 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +462,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +471,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +495,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +507,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891..3128b28db7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6fb0bb620e..8094a0fab1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,12 @@
 
 #include <rte_mldev_pmd.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +218,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +238,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +257,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +274,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +336,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +349,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +396,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +410,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +460,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,9 +501,8 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
-			wb_pages +=
-				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			wb_pages += rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
 		fprintf(fp,
 			"tile = %2u, scratch_pages = %4u,"
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c..dc747cf534 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +86,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +200,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +251,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +327,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +342,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +352,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +374,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +385,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +394,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +434,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
@@ -503,28 +504,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +541,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +552,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +656,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +676,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +747,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +774,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +790,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +864,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +893,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +908,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +922,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1027,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1058,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 
@@ -1077,8 +1091,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1101,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1141,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1164,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1184,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1279,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1305,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		rte_strscpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			    RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1327,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1369,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1396,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1445,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1460,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1480,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1506,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1528,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1550,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1552,8 +1587,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1609,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1626,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1659,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1716,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1731,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1747,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1756,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1772,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1784,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1853,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
@@ -1843,7 +1881,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1905,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1915,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1926,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1938,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1981,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2251,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2299,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2325,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2336,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2352,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2384,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2394,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2408,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2467,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2506,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 0000000000..2a5c17c973
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 0000000000..51315de622
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5bf17d8ae3..e006fdfe0e 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0



* [PATCH v8 03/34] ml/cnxk: add generic model and layer structures
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures enable support for models with multiple layers.
A model is a collection of independent layers with flow
dependencies between the layers.
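
To make the model/layer relationship concrete, below is a minimal,
self-contained C sketch. It is not part of the driver; the names
sketch_layer, sketch_model and MAX_LAYERS are hypothetical and only
illustrate the idea of a model held as an array of layers, each
recording the layers it depends on:

#include <stdint.h>
#include <stdio.h>

#define MAX_LAYERS 8

struct sketch_layer {
	uint16_t index;            /* position of the layer in the model */
	uint16_t nb_deps;          /* number of layers this layer waits on */
	uint16_t deps[MAX_LAYERS]; /* indices of those predecessor layers */
};

struct sketch_model {
	uint16_t nb_layers;
	struct sketch_layer layer[MAX_LAYERS];
};

/* Walk the layers in index order and report their dependencies. */
static void
sketch_model_run(const struct sketch_model *m)
{
	uint16_t i;

	for (i = 0; i < m->nb_layers; i++) {
		const struct sketch_layer *l = &m->layer[i];

		printf("layer %u waits on %u predecessor(s)\n",
		       (unsigned int)l->index, (unsigned int)l->nb_deps);
	}
}

int
main(void)
{
	/* Two-layer model: layer 1 consumes the output of layer 0. */
	struct sketch_model m = {
		.nb_layers = 2,
		.layer = {
			{ .index = 0, .nb_deps = 0 },
			{ .index = 1, .nb_deps = 1, .deps = { 0 } },
		},
	};

	sketch_model_run(&m);
	return 0;
}

The actual structures introduced by this patch (cnxk_ml_model,
cnxk_ml_layer) carry considerably more state (I/O info, OCM mapping,
stats); the sketch only shows the model-as-collection-of-layers shape.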

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 247 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  50 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 488 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   1 +
 10 files changed, 653 insertions(+), 470 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index cc46ca2efd..d033d6deff 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -311,19 +311,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -361,102 +359,138 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			rte_strscpy(layer->info.input[i].name,
+				    (char *)metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			rte_strscpy(layer->info.input[i].name,
+				    (char *)metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			rte_strscpy(layer->info.output[i].name,
+				    (char *)metadata->output1[i].output_name,
+				    MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			rte_strscpy(layer->info.output[i].name,
+				    (char *)metadata->output2[j].output_name,
+				    MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -514,23 +548,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -542,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -550,56 +585,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db7..206a369ca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 8094a0fab1..d71c36eae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -6,10 +6,10 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -333,12 +333,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -353,6 +355,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -382,8 +385,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -393,12 +396,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -409,16 +414,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -432,11 +440,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd65..720f8caf76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index dc747cf534..b226a9b5a2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -202,7 +202,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -215,77 +215,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -295,29 +298,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -327,14 +332,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -345,7 +350,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -385,7 +390,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -445,7 +450,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -472,7 +477,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -521,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -543,7 +548,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -576,9 +581,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -588,9 +593,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -600,9 +606,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -611,7 +618,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -692,28 +699,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -749,7 +756,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -758,7 +765,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -803,7 +810,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -854,7 +861,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -875,7 +882,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -895,7 +902,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1001,11 +1008,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1093,7 +1100,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1111,11 +1118,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1294,7 +1301,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1386,7 +1393,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1447,7 +1454,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1588,7 +1595,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1643,9 +1650,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1659,62 +1666,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* The driver always handles glow models as a single layer. Treat the
+	 * entire model as a model with one layer and ignore num_layers from
+	 * the metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1730,7 +1760,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1741,7 +1771,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1758,7 +1788,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1783,7 +1813,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1791,63 +1821,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1880,10 +1913,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1891,12 +1924,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1917,7 +1950,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1937,7 +1970,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1948,31 +1981,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2008,7 +2041,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2021,7 +2054,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2040,7 +2073,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2050,19 +2083,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2071,7 +2108,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2091,57 +2128,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2151,7 +2189,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2171,58 +2209,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2250,10 +2290,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2263,9 +2303,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2469,7 +2509,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2477,7 +2517,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 0000000000..29ec7ec511
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 32
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* Quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized data size */
+	uint32_t sz_d;
+
+	/* Quantized data size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 0000000000..3d735ced3e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 0000000000..a2994dbb71
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index e006fdfe0e..a70956cceb 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 04/34] ml/cnxk: add generic cnxk request structure
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved common fields
from the cn10k structures to the cnxk structure. Moved
job-related structures and enumerations to the ops headers.
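
As a rough sketch of the layering this introduces (minimal and
illustrative only: the cn10k_req member and its jd, result and status
fields match the usage in the hunks below, while the stand-in struct
bodies and the exact nesting are assumptions):

#include <stdint.h>

/* Stand-in bodies so the sketch is self-contained; the real layouts come
 * from the cn10k headers and are not changed by this patch.
 */
struct cn10k_ml_jd { uint64_t w[8]; };
struct cn10k_ml_result { uint64_t error_code; };

/* cn10k-specific request state, grouped so the generic request can embed it */
struct cn10k_ml_req {
	struct cn10k_ml_jd jd;         /* job descriptor handed to firmware */
	struct cn10k_ml_result result; /* result written back by firmware */
	uint64_t status;               /* polled for ML_CNXK_POLL_JOB_FINISH */
};

/* Generic request wrapping the per-generation data */
struct cnxk_ml_req {
	struct cn10k_ml_req cn10k_req; /* cn10k-specific request data */
	/* generation-agnostic fields would live here */
};

With this layout, firmware load and fast-path code reach the hardware
job descriptor as fw->req->cn10k_req.jd instead of fw->req->jd, as the
hunks below show.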

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  72 +++----
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 331 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   1 +
 9 files changed, 557 insertions(+), 492 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 3bc61443d8..fc6f78d414 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -14,9 +14,8 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -400,20 +399,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -458,29 +460,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -654,29 +657,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -766,11 +770,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -782,8 +786,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -791,7 +795,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		free(fw_buffer);
 	} else if (roc_env_is_asim()) {
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a..1852d4f6c9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d033d6deff..d2f1c761be 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -551,7 +552,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -560,7 +560,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -577,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca7..74ada1531a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b226a9b5a2..25ebb28993 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,9 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -78,31 +77,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -122,14 +121,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -140,18 +139,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -159,7 +158,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -173,7 +172,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -185,8 +184,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -333,7 +333,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -341,79 +341,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -861,7 +870,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -904,7 +913,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1101,7 +1110,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1136,7 +1145,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1213,7 +1222,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1239,7 +1248,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1252,7 +1261,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1269,7 +1278,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1485,20 +1494,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1511,17 +1522,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1538,14 +1551,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1554,23 +1567,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1581,7 +1595,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1654,7 +1668,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1726,7 +1740,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1790,7 +1804,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1815,10 +1829,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1878,8 +1892,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1887,19 +1901,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1952,7 +1968,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1972,10 +1988,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2015,19 +2031,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2287,18 +2305,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2329,7 +2352,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2338,7 +2361,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2346,15 +2370,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2365,11 +2389,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2395,12 +2420,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2424,11 +2450,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2450,13 +2477,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2507,10 +2536,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2522,17 +2552,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2555,7 +2586,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45..fd5992e192 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries 0 means linear input mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model  */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 0000000000..f1872dcf7c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 0000000000..b953fb0f5f
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index a70956cceb..d652543912 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -14,6 +14,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 05/34] ml/cnxk: add generic cnxk xstats structures
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic xstats structures and renamed the cn10k
xstats enumerations with a cnxk prefix.
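
Once cn10k_ml_xstats_init() has populated the generic entries table introduced below, an xstat can be resolved purely through the cnxk structures. A minimal sketch of such a name lookup, assuming the caller passes the device's struct cnxk_ml_xstats; the helper name cnxk_ml_xstats_find is illustrative and not part of this patch:

#include <string.h>

#include "cnxk_ml_xstats.h"

static struct cnxk_ml_xstats_entry *
cnxk_ml_xstats_find(struct cnxk_ml_xstats *xstats, const char *name)
{
	uint16_t i;

	/* Entries cover both device-mode and model-mode stats; match by map name */
	for (i = 0; i < xstats->count; i++) {
		if (strcmp(xstats->entries[i].map.name, name) == 0)
			return &xstats->entries[i];
	}

	return NULL;
}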

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 4 files changed, 209 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 25ebb28993..b470955ffd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -425,26 +426,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -459,10 +440,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -470,17 +451,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -489,24 +470,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -545,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -554,17 +535,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -590,9 +571,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -603,9 +584,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -616,16 +598,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -671,8 +654,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -708,26 +691,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -762,8 +745,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1342,10 +1325,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1357,10 +1340,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1384,11 +1367,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1423,10 +1406,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1664,7 +1647,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1738,24 +1721,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2308,7 +2291,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2326,31 +2309,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 0000000000..0d405679ca
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
-- 
2.42.0
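
The reset_allowed and reset_value fields above let the driver emulate resets for counters that only ever increase: a reset captures the current raw value as a baseline, and later reads subtract it. A minimal sketch of that pattern, assuming fn is one of the cnxk_ml_xstats_fn getters declared in cnxk_ml_xstats.h; the helper names are illustrative:

static uint64_t
cnxk_ml_xstat_read(struct rte_ml_dev *dev, struct cnxk_ml_xstats_entry *xs,
		   cnxk_ml_xstats_fn fn)
{
	/* Report the raw counter minus the value captured at the last reset */
	return fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
}

static void
cnxk_ml_xstat_reset(struct rte_ml_dev *dev, struct cnxk_ml_xstats_entry *xs,
		    cnxk_ml_xstats_fn fn)
{
	/* Resettable stats capture the current raw value as the new baseline */
	if (xs->reset_allowed)
		xs->reset_value = fn(dev, xs->obj_idx, xs->type);
}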


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 06/34] ml/cnxk: rename cnxk ops function pointers struct
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure with a cnxk prefix.
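
With this change dev->dev_ops points at the generic cnxk_ml_ops table while the individual handlers remain cn10k functions. Roughly how a control-path call is dispatched through that table; a simplified sketch only, the real dispatch lives in lib/mldev:

static int
example_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *info)
{
	/* dev->dev_ops was set to &cnxk_ml_ops at probe time */
	if (dev->dev_ops->dev_info_get == NULL)
		return -ENOTSUP;

	return dev->dev_ops->dev_info_get(dev, info);
}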

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 91 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fc6f78d414..91813e9d0a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -345,7 +345,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b470955ffd..a44fb26215 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -119,7 +119,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -860,7 +860,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -888,7 +888,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1087,7 +1087,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1160,7 +1160,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1180,7 +1180,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1200,7 +1200,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1241,7 +1241,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1258,7 +1258,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1273,7 +1273,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1321,7 +1321,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1363,7 +1363,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1427,7 +1427,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1441,7 +1441,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1528,7 +1528,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2051,7 +2051,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2071,7 +2071,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2105,7 +2105,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2186,7 +2186,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2574,38 +2574,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e192..16480b9ad8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c..03402681c5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,41 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5f..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 07/34] ml/cnxk: update device handling functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement CNXK wrapper functions for dev_info_get,
dev_configure, dev_close, dev_start and dev_stop. The
wrapper functions allocate and release common resources
for the ML driver and invoke device-specific functions.
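
The pattern shared by these wrappers is that the cnxk layer validates arguments and updates common driver state, while the hardware-specific work is delegated to the cn10k backend. A minimal sketch of that shape for dev_start, assuming the cn10k_ml_dev_start() prototype updated in this patch; simplified relative to the actual wrapper:

static int
example_cnxk_wrapper_start(struct rte_ml_dev *dev)
{
	struct cnxk_ml_dev *cnxk_mldev;
	int ret;

	if (dev == NULL)
		return -EINVAL;

	cnxk_mldev = dev->data->dev_private;

	/* Hardware-specific start handled by the cn10k backend */
	ret = cn10k_ml_dev_start(cnxk_mldev);
	if (ret != 0)
		return ret;

	/* Common state tracked at the cnxk layer */
	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;

	return 0;
}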

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a44fb26215..f8c51ab394 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -101,7 +101,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -861,20 +861,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -889,143 +881,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1038,8 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1050,10 +915,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1067,77 +932,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1154,20 +967,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1175,19 +983,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1195,8 +999,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1217,7 +1019,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad8..d50b5bede7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de622..02605fa28f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03402681c5..07a4daabc5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,15 +5,291 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..2996928d7d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 08/34] ml/cnxk: update queue-pair handling functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device queue-pairs.
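
The split can be illustrated with a minimal, self-contained sketch: the
generic layer owns queue-pair allocation and the usable-descriptor
adjustment, then hands the ring to a device-specific hook for per-request
initialization, mirroring how cnxk_ml_qp_create() now calls
cn10k_ml_qp_initialize(). The toy_* names below are illustrative
stand-ins for this note only, not driver symbols.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_req {
        uint64_t jobptr;        /* stands in for the per-request job pointer */
};

struct toy_qp {
        uint16_t id;
        uint32_t nb_desc;
        struct toy_req *reqs;
};

/* Device-specific step, analogous to cn10k_ml_qp_initialize() */
static void
toy_hw_qp_initialize(struct toy_qp *qp)
{
        uint32_t i;

        for (i = 0; i < qp->nb_desc; i++)
                qp->reqs[i].jobptr = (uint64_t)(uintptr_t)&qp->reqs[i];
}

/* Generic step, analogous to cnxk_ml_qp_create()/queue_pair_setup() */
static struct toy_qp *
toy_qp_create(uint16_t qp_id, uint32_t nb_desc, uint32_t max_desc)
{
        struct toy_qp *qp;

        if (nb_desc == 0 || nb_desc > max_desc)
                return NULL;

        /* Usable descriptors are one less than the ring size, so grow the
         * ring by one except at the maximum supported size.
         */
        if (nb_desc != max_desc)
                nb_desc += 1;

        qp = calloc(1, sizeof(*qp));
        if (qp == NULL)
                return NULL;

        qp->reqs = calloc(nb_desc, sizeof(*qp->reqs));
        if (qp->reqs == NULL) {
                free(qp);
                return NULL;
        }

        qp->id = qp_id;
        qp->nb_desc = nb_desc;
        toy_hw_qp_initialize(qp);       /* hand off to the device layer */

        return qp;
}

int
main(void)
{
        struct toy_qp *qp = toy_qp_create(0, 128, 1024);

        if (qp == NULL)
                return 1;
        printf("qp %u created with %u descriptors\n", (unsigned)qp->id,
               (unsigned)qp->nb_desc);
        free(qp->reqs);
        free(qp);
        return 0;
}

Keeping the ring logic in the generic layer lets other SoC families reuse
it and implement only the initialize hook.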

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f8c51ab394..9691cf03e3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -95,93 +95,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -189,13 +108,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1002,47 +914,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede7..2d0a49d5cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 07a4daabc5..aa56dd2276 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,7 +10,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -93,7 +193,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -283,6 +383,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -294,8 +439,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 09/34] ml/cnxk: update model load and unload functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to load and unload ML models.
The wrapper functions invoke the cn10k model load and unload
functions.
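
The bookkeeping behind the new layer-oriented load path can be sketched
in isolation: every loaded layer claims the first free slot in a
device-wide index map and releases it on unload, following the
cnxk_ml_index_map handling added to cn10k_ml_layer_load() and
cn10k_ml_layer_unload() in this patch. The toy_* names below are
illustrative stand-ins for this note only, not driver symbols.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_MAX_LAYERS 8

struct toy_index_map {
        uint16_t model_id;
        uint16_t layer_id;
        bool active;
};

static struct toy_index_map index_map[TOY_MAX_LAYERS];

/* Claim the first free slot, as cn10k_ml_layer_load() does; -1 if full */
static int
toy_layer_load(uint16_t model_id, uint16_t layer_id)
{
        uint16_t idx;

        for (idx = 0; idx < TOY_MAX_LAYERS; idx++) {
                if (!index_map[idx].active)
                        break;
        }

        if (idx >= TOY_MAX_LAYERS)
                return -1;

        index_map[idx].model_id = model_id;
        index_map[idx].layer_id = layer_id;
        index_map[idx].active = true;

        return idx;
}

/* Release the slot, as cn10k_ml_layer_unload() does */
static void
toy_layer_unload(int idx)
{
        if (idx >= 0 && idx < TOY_MAX_LAYERS)
                index_map[idx].active = false;
}

int
main(void)
{
        int idx = toy_layer_load(0, 0); /* model 0, layer 0 */

        printf("layer loaded at index %d\n", idx);
        toy_layer_unload(idx);
        return 0;
}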

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  26 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 462 insertions(+), 277 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d2f1c761be..48d70027ca 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -316,42 +316,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -363,142 +352,148 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+			   struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_strscpy(layer->info.input[i].name,
-				    (char *)metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			rte_strscpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
+				    MRVL_ML_INPUT_NAME_LEN);
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			rte_strscpy(layer->info.input[i].name,
-				    (char *)metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			rte_strscpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
+				    MRVL_ML_INPUT_NAME_LEN);
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_strscpy(layer->info.output[i].name,
+			rte_strscpy(io_info->output[i].name,
 				    (char *)metadata->output1[i].output_name,
 				    MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			rte_strscpy(layer->info.output[i].name,
+			rte_strscpy(io_info->output[i].name,
 				    (char *)metadata->output2[j].output_name,
 				    MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
+struct cnxk_ml_io_info *
+cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	return &model->layer[layer_id].info;
+}
+
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -506,7 +501,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -518,7 +513,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -526,15 +521,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -542,28 +537,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -572,39 +564,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68..b891c9d627 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,13 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+				struct cn10k_ml_model_metadata *metadata);
+struct cnxk_ml_io_info *cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9691cf03e3..ab05896b5e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -15,6 +15,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -273,7 +276,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1261,85 +1264,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_set(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1358,99 +1447,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1748,7 +1800,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1762,19 +1813,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cd..677219dfdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28f..1590249abd 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index aa56dd2276..1d8b84269d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -137,6 +140,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -240,7 +244,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -271,6 +275,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -303,6 +324,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -312,7 +336,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -428,6 +452,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -451,8 +587,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..bc14f6e5b9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0



* [PATCH v8 10/34] ml/cnxk: update model start and stop functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrapper functions validate the arguments
and invoke the cn10k model start and stop functions.
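
As context for reviewers, the dispatch pattern used by these wrappers
(a generic cnxk entry point that validates its arguments and state,
then calls into the cn10k layer and updates the device counters) can be
illustrated with a minimal, self-contained sketch. The toy_* names and
hw_layer_start() below are hypothetical stand-ins and not part of the
driver:

#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for cnxk_ml_dev / cnxk_ml_model. */
struct toy_model { int started; };
struct toy_dev { struct toy_model *models[8]; int nb_started; };

/* Stand-in for the hardware-specific cn10k start routine. */
static int
hw_layer_start(struct toy_model *model)
{
	model->started = 1;
	return 0;
}

/* Generic wrapper: validate inputs, dispatch to the HW layer, track state. */
static int
toy_model_start(struct toy_dev *dev, unsigned int model_id)
{
	struct toy_model *model;
	int ret;

	if (dev == NULL || model_id >= 8)
		return -EINVAL;

	model = dev->models[model_id];
	if (model == NULL)
		return -EINVAL;

	ret = hw_layer_start(model);
	if (ret == 0)
		dev->nb_started++;

	return ret;
}

int
main(void)
{
	struct toy_model model = {0};
	struct toy_dev dev = {0};
	int ret;

	dev.models[0] = &model;
	ret = toy_model_start(&dev, 0);
	printf("start ret = %d, nb_started = %d\n", ret, dev.nb_started);
	return 0;
}

This mirrors the cnxk_ml_model_start() / cn10k_ml_model_start() split in
the diff below, where the per-layer OCM and job-descriptor work stays in
the cn10k layer.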

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d71c36eae6..2197e5e0ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -215,11 +215,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -238,7 +237,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -333,12 +331,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -351,10 +347,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -396,12 +390,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -416,10 +408,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -438,8 +428,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ab05896b5e..40f484158a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -248,26 +248,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -291,7 +293,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -323,9 +325,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -714,10 +720,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -730,22 +734,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -761,15 +763,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1506,14 +1508,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1524,85 +1528,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1636,66 +1644,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1705,31 +1741,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1766,8 +1802,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1776,6 +1815,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2003,30 +2061,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2054,14 +2117,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2116,7 +2178,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2183,7 +2245,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2232,23 +2294,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2284,7 +2350,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf..a222a43d55 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1d8b84269d..b61ed45876 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -240,7 +240,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -332,7 +332,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -564,6 +564,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -589,8 +629,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9..d27ca0d0cb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0



* [PATCH v8 11/34] ml/cnxk: update model utility functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and
fetch model info.
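
A short, self-contained sketch of the info-get pattern used below (copy
the driver's cached info structure into the caller's buffer and keep the
per-I/O array pointers aimed at driver-owned storage rather than
duplicating the arrays) may help when reviewing. The toy_* names are
hypothetical stand-ins, not the rte_mldev structures:

#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for rte_ml_model_info / rte_ml_io_info. */
struct toy_io_info { unsigned int size; };
struct toy_model_info {
	unsigned int nb_inputs;
	struct toy_io_info *input_info; /* points into driver-owned storage */
};

/* Driver-owned cached copy, filled at model load time. */
static struct toy_io_info cached_inputs[2] = { { 64 }, { 128 } };
static struct toy_model_info cached_info = { 2, cached_inputs };

/* Wrapper: copy the cached structure; the pointer members keep referring
 * to the driver's arrays, and are re-assigned explicitly for clarity.
 */
static void
toy_model_info_get(struct toy_model_info *out)
{
	memcpy(out, &cached_info, sizeof(*out));
	out->input_info = cached_info.input_info;
}

int
main(void)
{
	struct toy_model_info info;

	toy_model_info_get(&info);
	printf("nb_inputs = %u, input[1].size = %u\n",
	       info.nb_inputs, info.input_info[1].size);
	return 0;
}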

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 40f484158a..3ff82829f0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1835,45 +1835,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b61ed45876..9ce37fcfd1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -604,6 +604,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -631,8 +675,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.42.0



* [PATCH v8 12/34] ml/cnxk: update data quantization functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.
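
For reviewers unfamiliar with the quantize path, the per-I/O walk used
by the new wrappers (convert one buffer per input, then advance the
dequantized and quantized cursors by that input's sz_d and sz_q) can be
illustrated with a minimal sketch. The scale-and-clamp conversion below
is a simplified stand-in for the mldev_utils helpers, not the exact DPDK
implementation, and the toy_io descriptor is hypothetical:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-input descriptor, loosely modelled on struct cnxk_ml_io. */
struct toy_io {
	unsigned int nb_elements;
	float scale;  /* quantization scale */
	size_t sz_d;  /* dequantized (float32) size in bytes */
	size_t sz_q;  /* quantized (int8) size in bytes */
};

/* Simplified float32 -> int8 conversion with saturation. */
static void
toy_float32_to_int8(float scale, unsigned int n, const float *src, int8_t *dst)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		float v = src[i] * scale;

		if (v > 127.0f)
			v = 127.0f;
		else if (v < -128.0f)
			v = -128.0f;
		dst[i] = (int8_t)v;
	}
}

int
main(void)
{
	struct toy_io inputs[2] = {
		{ 2, 64.0f, 2 * sizeof(float), 2 * sizeof(int8_t) },
		{ 2, 32.0f, 2 * sizeof(float), 2 * sizeof(int8_t) },
	};
	float dbuf[4] = { 0.5f, -1.0f, 1.0f, 2.0f };
	int8_t qbuf[4];
	uint8_t *d = (uint8_t *)dbuf;
	uint8_t *q = (uint8_t *)qbuf;
	unsigned int i;

	/* Walk the inputs, converting one buffer at a time and advancing cursors. */
	for (i = 0; i < 2; i++) {
		toy_float32_to_int8(inputs[i].scale, inputs[i].nb_elements,
				    (const float *)d, (int8_t *)q);
		d += inputs[i].sz_d;
		q += inputs[i].sz_q;
	}

	for (i = 0; i < 4; i++)
		printf("q[%u] = %d\n", i, qbuf[i]);

	return 0;
}

The same cursor-advance shape is used for dequantize, with the walk done
over the outputs of the last layer instead of the inputs of the first.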

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3ff82829f0..c68e6c620c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1856,170 +1856,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0d..780e2a9f9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 0000000000..c78009ab0c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 29ec7ec511..5de166c252 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 9ce37fcfd1..63842025fc 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -648,6 +650,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -679,6 +753,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index d652543912..79154c8698 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 13/34] ml/cnxk: update device debug functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrappers for the device dump and selftest debug
functions.
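
For illustration only (not part of this patch), a minimal sketch of how an
application reaches these wrappers through the public mldev API: the
rte_ml_dev_dump() and rte_ml_dev_selftest() calls dispatch to the new
cnxk_ml_dev_dump() and cnxk_ml_dev_selftest() ops installed below. The
device id and error handling are assumptions.

#include <stdio.h>
#include <rte_mldev.h>

/* Hypothetical helper: dump driver state and run the firmware self-test
 * on one ML device. Both calls land in the cnxk wrappers added by this
 * patch via the dev_dump / dev_selftest ops.
 */
static int
ml_debug_device(int16_t dev_id)
{
	int ret;

	/* Per-model info is printed by the cnxk layer, followed by OCM
	 * state and firmware debug buffers from the cn10k layer.
	 */
	ret = rte_ml_dev_dump(dev_id, stdout);
	if (ret != 0)
		return ret;

	/* Issue a firmware self-test job and wait for the result. */
	return rte_ml_dev_selftest(dev_id);
}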

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   1 +
 12 files changed, 235 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 48d70027ca..af9d5a666f 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -598,3 +599,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index b891c9d627..45f2ed5fcf 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -460,5 +460,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2197e5e0ed..dc315cce10 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -481,19 +481,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a..bf8944f8ee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c68e6c620c..a56d002d4c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -18,11 +18,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -70,16 +65,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -113,140 +98,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1120,38 +971,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1207,17 +1045,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c..5fda98ae88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e..b069d4e3a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71..66d979dd3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 63842025fc..66b88ddae1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -409,6 +409,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -729,8 +764,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 0000000000..ca3670a9e8
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 0000000000..ed2ab21346
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 79154c8698..5d27a87d91 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 14/34] ml/cnxk: update device stats functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device stats.
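
As a usage sketch (illustrative only, not part of the patch), an application
reads the counters that these wrappers aggregate across all queue pairs with
rte_ml_dev_stats_get() and clears them with rte_ml_dev_stats_reset(); the
device id and output formatting below are assumptions.

#include <stdio.h>
#include <inttypes.h>
#include <rte_mldev.h>

/* Hypothetical example: print and reset the basic device statistics
 * aggregated per queue pair by the cnxk wrappers. */
static void
ml_print_stats(int16_t dev_id)
{
	struct rte_ml_dev_stats stats;

	if (rte_ml_dev_stats_get(dev_id, &stats) != 0)
		return;

	printf("enqueued: %" PRIu64 ", dequeued: %" PRIu64 "\n",
	       stats.enqueued_count, stats.dequeued_count);
	printf("enqueue errors: %" PRIu64 ", dequeue errors: %" PRIu64 "\n",
	       stats.enqueue_err_count, stats.dequeue_err_count);

	/* Zero the per-queue-pair counters before the next measurement. */
	rte_ml_dev_stats_reset(dev_id);
}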

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a56d002d4c..8cbf700f6e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -770,38 +770,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66b88ddae1..c75317d6da 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -489,6 +489,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -772,8 +804,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 15/34] ml/cnxk: update device and model xstats functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resource handling for the xstats is now done
in the cnxk layer. Introduced an internal xstats group.
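
For illustration (not part of this patch), a minimal sketch of reading the
device-mode extended stats through the public mldev API; model-mode stats
work the same way with RTE_ML_DEV_XSTATS_MODEL and a valid model_id. The
device id and the two-pass size query below are assumptions.

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <rte_mldev.h>

/* Hypothetical example: list device-mode xstats and print their values. */
static void
ml_print_device_xstats(int16_t dev_id)
{
	struct rte_ml_dev_xstats_map *map;
	uint64_t value;
	uint16_t stat_id;
	int count;
	int i;

	/* A first call with a NULL map returns the number of stats. */
	count = rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1, NULL, 0);
	if (count <= 0)
		return;

	map = calloc(count, sizeof(*map));
	if (map == NULL)
		return;

	rte_ml_dev_xstats_names_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1, map, count);
	for (i = 0; i < count; i++) {
		stat_id = map[i].id;
		if (rte_ml_dev_xstats_get(dev_id, RTE_ML_DEV_XSTATS_DEVICE, -1, &stat_id,
					  &value, 1) == 1)
			printf("%s: %" PRIu64 "\n", map[i].name, value);
	}

	free(map);
}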

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 531 +++----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 481 +++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 551 insertions(+), 507 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8cbf700f6e..776ad60401 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -198,107 +198,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -306,270 +220,94 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
 
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
+uint64_t
+cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			 enum cnxk_ml_xstats_type type)
 {
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
-	uint64_t value;
+	uint64_t value = 0;
 	uint32_t qp_id;
 
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
 	switch (type) {
 	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	default:
 		value = 0;
 	}
 
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
 	return value;
 }
 
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -654,7 +392,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -682,13 +419,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -717,9 +447,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -770,174 +497,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		rte_strscpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			    RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1211,7 +770,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af..4d76164dba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -298,17 +299,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
@@ -337,4 +327,8 @@ int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_nam
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
+/* xstats ops */
+uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd..3ce9338f1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c75317d6da..4f4a41219e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -115,6 +115,285 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t value = 0;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		return 0;
+
+	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -294,6 +573,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -323,6 +609,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -521,6 +810,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		rte_strscpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -806,10 +1279,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679ca..5e02bb876c 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 16/34] ml/cnxk: update fast path functions
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support for
model-specific fast-path functions. The cnxk layer functions invoke
the model-specific fast-path functions.

Added support for model-specific poll handling functions and updated
the internal inference sync function. Dropped the use of rte_ml_op as
an argument and updated the function arguments so that the sync
function can be used as a callback by the TVM HW runtime.
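
A minimal usage sketch (illustrative only, not part of the patch): with
rte_ml_op dropped, cn10k_ml_inference_sync() matches a plain callback
prototype that an external runtime such as the TVM HW runtime can store
and invoke directly. The ml_run_cb_t typedef and run_layer() wrapper
below are hypothetical names used only for illustration.

    #include <stdint.h>

    #include "cn10k_ml_ops.h"	/* declares cn10k_ml_inference_sync() */

    typedef int (*ml_run_cb_t)(void *device, uint16_t index, void *input,
                               void *output, uint16_t nb_batches);

    static int
    run_layer(void *device, uint16_t layer_index, void *in, void *out)
    {
            /* resolves to the reworked sync-inference entry point */
            ml_run_cb_t run = cn10k_ml_inference_sync;

            /* run a single batch on the given layer index */
            return run(device, layer_index, in, out, 1);
    }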

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 776ad60401..8116c8dedb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -65,24 +65,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -177,7 +165,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -185,17 +173,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -311,30 +299,15 @@ cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *l
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -342,25 +315,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -425,13 +382,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -824,6 +776,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1219,26 +1177,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1246,6 +1186,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1253,9 +1194,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1322,119 +1263,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1471,41 +1341,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1518,7 +1395,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4d76164dba..3d18303ed3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -14,6 +14,7 @@ struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -309,13 +310,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3c..f618e5aa5f 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 4f4a41219e..909e9143bf 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -15,6 +15,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1262,6 +1274,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb..d0c126f34b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 17/34] ml/cnxk: move error handling to cnxk layer
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Moved the error type structures to the cnxk layer. The cn10k layer
now handles only the firmware and hardware error sub-types.
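
A minimal sketch of the resulting two-level lookup (format_error() is a
hypothetical helper, not part of the patch): the common error types are
kept in the cnxk layer table ml_etype_db, while each driver keeps its
own sub-type tables, such as ml_stype_db_driver in cn10k_ml_ops.c.

    /* as it could appear inside cn10k_ml_ops.c, where both tables are visible */
    static void
    format_error(char *msg, size_t len, uint64_t etype, uint64_t stype)
    {
            if (etype == ML_CNXK_ETYPE_DRIVER)
                    /* common etype string + driver-specific stype string */
                    snprintf(msg, len, "%s : %s", ml_etype_db[etype].str,
                             ml_stype_db_driver[stype].str);
            else
                    snprintf(msg, len, "%s", ml_etype_db[etype].str);
    }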

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8116c8dedb..65eaaf030d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,47 +22,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1241,19 +1221,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1294,7 +1274,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1311,30 +1291,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1372,7 +1351,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973..63d1c9e417 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f..382fca64be 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 909e9143bf..3d21a31374 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1372,7 +1372,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 18/34] ml/cnxk: support config and close of tvmdp library
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Ruifeng Wang, Bruce Richardson, Srikanth Yalavarthi
  Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based on the
ML device configuration options.

Updated the meson build to enable Jansson, the TVM runtime and the
TVMDP library as build dependencies.
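
As a minimal sketch (assuming the stub variants simply succeed when TVM
support is not compiled in; the actual mvtvm_ml_stubs.c added by this
patch may differ), the configure and close hooks invoked from
cnxk_ml_dev_configure() and cnxk_ml_dev_close() could look like:

    #include <rte_common.h>
    #include <rte_mldev.h>

    #include "cnxk_ml_dev.h"

    int
    mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev,
                           const struct rte_ml_dev_config *conf)
    {
            /* nothing to configure when TVM support is not compiled in */
            RTE_SET_USED(cnxk_mldev);
            RTE_SET_USED(conf);

            return 0;
    }

    int
    mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
    {
            RTE_SET_USED(cnxk_mldev);

            return 0;
    }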

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 config/arm/arm64_cn10k_linux_gcc |   1 +
 config/arm/arm64_cn9k_linux_gcc  |   1 +
 doc/guides/mldevs/cnxk.rst       | 169 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 ++
 drivers/ml/cnxk/cnxk_ml_ops.h    |   6 ++
 drivers/ml/cnxk/meson.build      |  58 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  41 ++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  19 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  26 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  15 +++
 10 files changed, 343 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

diff --git a/config/arm/arm64_cn10k_linux_gcc b/config/arm/arm64_cn10k_linux_gcc
index 05d2d64cf2..fa904af5d0 100644
--- a/config/arm/arm64_cn10k_linux_gcc
+++ b/config/arm/arm64_cn10k_linux_gcc
@@ -5,6 +5,7 @@ ar = 'aarch64-linux-gnu-gcc-ar'
 strip = 'aarch64-linux-gnu-strip'
 pkgconfig = 'aarch64-linux-gnu-pkg-config'
 pcap-config = ''
+cmake = 'cmake'
 
 [host_machine]
 system = 'linux'
diff --git a/config/arm/arm64_cn9k_linux_gcc b/config/arm/arm64_cn9k_linux_gcc
index 7416454de0..646ce4b5d3 100644
--- a/config/arm/arm64_cn9k_linux_gcc
+++ b/config/arm/arm64_cn9k_linux_gcc
@@ -5,6 +5,7 @@ ar = 'aarch64-linux-gnu-gcc-ar'
 strip = 'aarch64-linux-gnu-strip'
 pkgconfig = 'aarch64-linux-gnu-pkg-config'
 pcap-config = ''
+cmake = 'cmake'
 
 [host_machine]
 system = 'linux'
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 1834b1f905..a4d8903896 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -46,6 +46,175 @@ or cross-compiled on an x86 platform.
 
 Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
 
+Compilation Prerequisites
+-------------------------
+
+Support for models compiled with the Apache TVM framework is optional and
+requires external libraries. The following dependencies are not part of DPDK
+and must be installed separately:
+
+- **Jansson**
+
+  This library enables support to parse and read JSON files.
+
+- **DLPack**
+
+  This library provides headers for open in-memory tensor structures.
+
+.. note::
+
+    DPDK CNXK ML driver requires DLPack version 0.7
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dlpack.git
+    cd dlpack
+    git checkout v0.7 -b v0.7
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DBUILD_MOCK=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dlpack.git
+    cd dlpack
+    git checkout v0.7 -b v0.7
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DBUILD_MOCK=OFF
+    make -C build
+    make -C build install
+
+- **DMLC**
+
+  This is a common bricks library for building scalable and portable distributed
+  machine learning.
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dmlc-core.git
+    cd dmlc-core
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_FLAGS="-fpermissive" \
+      -DCMAKE_CXX_FLAGS="-fpermissive" \
+      -DUSE_OPENMP=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dmlc-core.git
+    cd dmlc-core
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DCMAKE_C_FLAGS="-fpermissive" \
+      -DCMAKE_CXX_FLAGS="-fpermissive" \
+      -DUSE_OPENMP=OFF
+    make -C build
+    make -C build install
+
+- **TVM**
+
+  Apache TVM provides runtime libraries used to execute models on CPU cores
+  or hardware accelerators.
+
+.. note::
+
+    DPDK CNXK ML driver requires TVM version 0.10.0
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.11.0 -b v0.11.0
+    git submodule update --init
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DBUILD_STATIC_RUNTIME=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.11.0 -b v0.11.0
+    git submodule update --init
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DMACHINE_NAME=aarch64-linux-gnu \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DBUILD_STATIC_RUNTIME=OFF
+    make -C build
+    make -C build install
+
+- **TVMDP**
+
+  Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
+  acts as an interface between the TVM runtime and DPDK drivers. The TVMDP
+  library provides a simplified C interface to TVM's C++-based runtime.
+
+.. note::
+
+    TVMDP library is dependent on TVM, dlpack, jansson and dmlc-core libraries.
+
+.. code-block:: console
+
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DBUILD_SHARED_LIBS=ON
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DCMAKE_FIND_ROOT_PATH=<install_prefix> \
+      -DBUILD_SHARED_LIBS=ON
+    make -C build
+    make -C build install
+
+- **libarchive**
+
+  The Apache TVM framework generates compiled models as tar archives. This
+  library is used to decompress and read archive files in tar, xz and
+  other formats.
+
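+  It is commonly available from distribution packages; for example, on
+  Debian/Ubuntu based systems (package names may differ on other distributions):
+
+.. code-block:: console
+
+    sudo apt-get install libarchive-dev
+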
+.. note::
+
+    In order for meson to find the dependencies during the configure stage,
+    the CMake paths <install_prefix>/lib/cmake/dlpack,
+    <install_prefix>/lib/cmake/dmlc and <install_prefix>/lib/cmake/tvm must be
+    added to CMAKE_PREFIX_PATH, and <install_prefix>/lib/pkgconfig must be
+    added to PKG_CONFIG_PATH, as in the example below.
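+
+For example, assuming the dependencies were installed to ``<install_prefix>``
+and the build directory is named ``build``:
+
+.. code-block:: console
+
+    export CMAKE_PREFIX_PATH=<install_prefix>/lib/cmake/dlpack:<install_prefix>/lib/cmake/dmlc:<install_prefix>/lib/cmake/tvm:$CMAKE_PREFIX_PATH
+    export PKG_CONFIG_PATH=<install_prefix>/lib/pkgconfig:$PKG_CONFIG_PATH
+    meson setup build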
 
 Initialization
 --------------
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 3d21a31374..33d13d5514 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -564,6 +564,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -624,6 +628,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..b22a2b0d95 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,6 +12,12 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#else
+#include "mvtvm_ml_stubs.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5d27a87d91..1ef2b3c335 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,37 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+dlpack_dep = dependency('dlpack', method: 'cmake', required: false, cmake_args: 'CONFIG')
+if not dlpack_dep.found()
+        message('drivers/ml/cnxk: dlpack not found')
+        enable_mvtvm = false
+endif
+
+dmlc_dep = dependency('dmlc', method: 'cmake', required: false, cmake_args: 'CONFIG')
+if not dmlc_dep.found()
+        message('drivers/ml/cnxk: dmlc not found')
+        enable_mvtvm = false
+endif
+
+tvm_dep = dependency('tvm', method: 'cmake', required: false, cmake_args: 'CONFIG', modules : ['tvm::tvm_runtime'])
+if not tvm_dep.found()
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', method: 'pkg-config', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
@@ -21,6 +52,33 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += jansson_dep
+ext_deps += dlpack_dep
+ext_deps += dmlc_dep
+ext_deps += tvm_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+
+sources += files(
+        'mvtvm_ml_stubs.c',
+)
+
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 0000000000..88c6d5a864
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 0000000000..305b4681ed
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <dlpack/dlpack.h>
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
new file mode 100644
index 0000000000..a31cd39cfa
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_stubs.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(conf);
+
+	return 0;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	RTE_SET_USED(cnxk_mldev);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
new file mode 100644
index 0000000000..11c56e5144
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_STUBS_H_
+#define _MVTVM_ML_STUBS_H_
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 19/34] ml/cnxk: add structures to support TVM model type
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 66 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 52 ++++++++++++++++++++-----
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 ++++++++++++++++++++++
 5 files changed, 160 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index dc315cce10..749ddeb344 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -435,6 +435,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65eaaf030d..a471e98fbf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,6 +725,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -746,6 +749,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -969,7 +973,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..f100eca203 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,48 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Unknown model type */
+	ML_CNXK_MODEL_TYPE_UNKNOWN,
+
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions */
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* Unknown layer type */
+	ML_CNXK_LAYER_TYPE_UNKNOWN = 0,
+
+	/* MRVL layer, for MLIP target */
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target */
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +99,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +132,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 33d13d5514..96f87128f9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1217,6 +1217,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1229,17 +1231,31 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, 0);
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1253,6 +1269,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1265,17 +1283,31 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 0000000000..1f6b435be0
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 20/34] ml/cnxk: add support for identify model type
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to parse the model buffer and identify the
model type and sub-type. Enable basic validity checks for
Glow model buffers.
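
From the application side, type detection is transparent: the same load call
handles both Glow binaries and TVM archives. A minimal sketch, where dev_id,
model_buffer, model_size and the error handling are placeholders:

	#include <rte_mldev.h>

	struct rte_ml_model_params params = {
		.addr = model_buffer,	/* Glow binary or TVM tar archive */
		.size = model_size,
	};
	uint16_t model_id;

	if (rte_ml_model_load(dev_id, &params, &model_id) != 0)
		printf("model rejected as invalid / unsupported type\n");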

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 49 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  3 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  8 +++++
 drivers/ml/cnxk/meson.build      |  6 ++++
 drivers/ml/cnxk/mvtvm_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 8 files changed, 133 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..02f80410ec 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,60 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	enum cnxk_ml_model_type type;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+	type = mvtvm_ml_model_type_get(params);
+	if (type == ML_CNXK_MODEL_TYPE_TVM)
+		return ML_CNXK_MODEL_TYPE_TVM;
+	else if (type == ML_CNXK_MODEL_TYPE_INVALID)
+		return ML_CNXK_MODEL_TYPE_INVALID;
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f100eca203..a2fced46a2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -13,6 +13,8 @@
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
 #include "mvtvm_ml_model.h"
+#else
+#include "mvtvm_ml_stubs.h"
 #endif
 
 #include "cnxk_ml_io.h"
@@ -184,6 +186,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 96f87128f9..ebc78e36e9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1018,6 +1018,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1033,6 +1034,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1066,6 +1073,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 1ef2b3c335..20534d0b00 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -58,6 +63,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += jansson_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 0000000000..ab5f8baa67
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <archive.h>
+#include <archive_entry.h>
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return ML_CNXK_MODEL_TYPE_UNKNOWN;
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be0..b6162fceec 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,6 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a31cd39cfa..a7352840a6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -7,6 +7,15 @@
 #include "mvtvm_ml_stubs.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	RTE_SET_USED(params);
+
+	return ML_CNXK_MODEL_TYPE_UNKNOWN;
+}
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 11c56e5144..467a9d39e5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 21/34] ml/cnxk: add support to parse TVM model objects
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model
archive buffer, check that all expected objects are present
and copy the objects to internal buffers.
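
For reference, the archive is expected to contain the three TVM compiler
outputs listed in mvtvm_object_list (mod.so, mod.json and mod.params); an
equivalent archive can be created as below (the output file name is
illustrative):

	tar -cf model.tar mod.so mod.json mod.params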

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  5 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 57 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 62 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 11 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 7 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ebc78e36e9..85b37161d2 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1079,7 +1079,10 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	else
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
 	if (ret != 0)
 		goto error;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ab5f8baa67..4c9a080c05 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -53,3 +53,60 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 
 	return ML_CNXK_MODEL_TYPE_TVM;
 }
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b6162fceec..b11b66f495 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -44,5 +44,7 @@ struct mvtvm_ml_model_data {
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 88c6d5a864..e2413b6b15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -8,8 +8,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -39,3 +43,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 305b4681ed..6607537599 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -12,8 +12,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a7352840a6..7f3b3abb2e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -33,3 +33,14 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return 0;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(params);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 467a9d39e5..4bb1772ef4 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -8,9 +8,12 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 22/34] ml/cnxk: fetch layer info and load TVM model
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and
update internal structures based on the layer information.
Set callback functions for layer load and unload and
enabled model loading using the TVMDP library. Added support
to fetch full metadata after model load.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 11 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  2 +
 drivers/ml/cnxk/cn10k_ml_ops.c   |  7 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 25 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  4 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 81 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 10 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 8 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index af9d5a666f..0325cd54f1 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -716,3 +716,14 @@ cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "\n");
 }
+
+int
+cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM)
+		return mvtvm_ml_model_get_layer_id(model, layer_name, layer_id);
+
+	*layer_id = 0;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45f2ed5fcf..6744175cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -461,5 +461,7 @@ void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+int cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a471e98fbf..4191ccc840 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -576,7 +576,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
@@ -584,7 +584,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int ret;
 
 	PLT_SET_USED(size);
-	PLT_SET_USED(layer_name);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -598,6 +597,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c9a080c05..8536fd8927 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -110,3 +110,28 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	uint16_t i;
+
+	for (i = 0; i < model->mvtvm.metadata.model.nb_layers; i++) {
+		if (strcmp(model->layer[i].name, layer_name) == 0)
+			break;
+	}
+
+	if (i == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[i].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer type, name: %s type: %d", layer_name, model->layer[i].type);
+		return -EINVAL;
+	}
+
+	*layer_id = i;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b11b66f495..6cb2639876 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
@@ -46,5 +48,7 @@ struct mvtvm_ml_model_data {
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e2413b6b15..9a3ada1b0d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -49,9 +49,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -99,5 +103,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		rte_strscpy(model->layer[layer_id].name,
+			    model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 7f3b3abb2e..d621dbc897 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -17,6 +17,16 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 	return ML_CNXK_MODEL_TYPE_UNKNOWN;
 }
 
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_name);
+	RTE_SET_USED(layer_id);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 4bb1772ef4..23fdfdc4cd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,4 +16,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
+
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 23/34] ml/cnxk: update internal info for TVM model
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating internal I/O info structures for TVM models.
Static fields related to the model I/O are computed at load time.
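
As an illustration of the computed fields, an input with shape
(1, 3, 224, 224), dequantized type fp32 and quantized type int8 gives
nb_elements = 1 * 3 * 224 * 224 = 150528, sz_d = 150528 * 4 = 602112 bytes
and sz_q = 150528 * 1 = 150528 bytes (values are illustrative).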

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 ++
 drivers/ml/cnxk/mvtvm_ml_model.c | 111 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |   9 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   1 +
 6 files changed, 130 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 85b37161d2..1565e521fd 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1244,6 +1244,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, 0);
+	else
+		info = mvtvm_ml_model_io_info_get(model, 0);
 
 	if (info == NULL)
 		return -EINVAL;
@@ -1296,6 +1298,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
+	else
+		info = mvtvm_ml_model_io_info_get(model, model->nb_layers - 1);
 
 	if (info == NULL)
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8536fd8927..b40b0a13af 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "cnxk_ml_model.h"
@@ -135,3 +137,112 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 
 	return 0;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(uint8_t type)
+{
+	switch (type) {
+	case kDLInt:
+		return RTE_ML_IO_TYPE_INT32;
+	case kDLUInt:
+		return RTE_ML_IO_TYPE_UINT32;
+	case kDLFloat:
+		return RTE_ML_IO_TYPE_FP32;
+	case kDLBfloat:
+		return RTE_ML_IO_TYPE_BFLOAT16;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		rte_strscpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			    TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype.code);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype.code);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		rte_strscpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			    TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype.code);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype.code);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_set(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
+
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(layer_id);
+
+	return &model->mvtvm.info;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6cb2639876..e86581bc6a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -50,5 +50,7 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9a3ada1b0d..e21bf2dc07 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -175,6 +175,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_set(model);
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index d621dbc897..80a9a90b4e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -27,6 +27,15 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 	return -EINVAL;
 }
 
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_id);
+
+	return NULL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 23fdfdc4cd..29f721072a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -18,5 +18,6 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 24/34] ml/cnxk: enable model unload in tvmdp library
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable unloading models using the external TVMDP library. Updated
the layer unload callback to support multiple layers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  8 +++++---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 +++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  1 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 +++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4191ccc840..e7208391fd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -780,11 +780,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	int ret;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -797,6 +795,10 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1565e521fd..ce668e1eb6 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1107,7 +1107,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1125,7 +1125,10 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e21bf2dc07..3847f9b6b9 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -185,3 +185,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 80a9a90b4e..a17a76e41f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -63,3 +63,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 29f721072a..3776fb5369 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -15,6 +15,7 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 25/34] ml/cnxk: enable OCM check for multilayer TVM model
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enabled the OCM size requirement check for multi-layer TVM models.
The OCM scratch and WB page requirement is computed for all layers
during the load stage.
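
The requirement is the sum of write-back pages over all Marvell layers
plus the largest per-layer scratch page count; if this exceeds the OCM
page count, the model is unloaded and the load fails. Condensed sketch
of the computation (simplified from the diff below):

    total_wb_pages = 0;
    max_scratch_pages = 0;
    for (layer_id = 0; layer_id < nb_layers; layer_id++) {
        if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
            total_wb_pages += model->layer[layer_id].glow.ocm_map.wb_pages;
            max_scratch_pages = PLT_MAX(max_scratch_pages,
                model->layer[layer_id].glow.ocm_map.scratch_pages);
        }
    }
    /* Load fails when total_wb_pages + max_scratch_pages > ocm->num_pages */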

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c | 60 +++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ce668e1eb6..d1471971e4 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1023,8 +1023,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	uint16_t max_scratch_pages;
+	struct cn10k_ml_ocm *ocm;
 	uint64_t model_info_size;
+	uint16_t total_wb_pages;
 	uint16_t lcl_model_id;
+	uint16_t layer_id;
 	uint64_t mz_size;
 	bool found;
 	int ret;
@@ -1086,6 +1090,62 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 	if (ret != 0)
 		goto error;
 
+	max_scratch_pages = 0;
+	total_wb_pages = 0;
+	layer_id = 0;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+		total_wb_pages = total_wb_pages + model->layer[layer_id].glow.ocm_map.wb_pages;
+		max_scratch_pages = PLT_MAX(max_scratch_pages,
+					    model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	} else {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+				total_wb_pages = total_wb_pages +
+						 model->layer[layer_id].glow.ocm_map.wb_pages;
+				max_scratch_pages =
+					PLT_MAX(max_scratch_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+			}
+		}
+#endif
+	}
+
+	if ((total_wb_pages + max_scratch_pages) > ocm->num_pages) {
+		plt_err("model_id = %u: total_wb_pages (%u) + scratch_pages (%u) >  %u\n",
+			lcl_model_id, total_wb_pages, max_scratch_pages, ocm->num_pages);
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			plt_ml_dbg("layer_id = %u: wb_pages = %u, scratch_pages = %u\n", layer_id,
+				   model->layer[layer_id].glow.ocm_map.wb_pages,
+				   model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		} else {
+			for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers;
+			     layer_id++) {
+				if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+					plt_ml_dbg(
+						"layer_id = %u: wb_pages = %u, scratch_pages = %u\n",
+						layer_id,
+						model->layer[layer_id].glow.ocm_map.wb_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+				}
+			}
+#endif
+		}
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else {
+			mvtvm_ml_model_unload(cnxk_mldev, model);
+			return -ENOMEM;
+		}
+#endif
+	}
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	cnxk_mldev->nb_models_loaded++;
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 26/34] ml/cnxk: support start and stop for TVM models
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. TVM model start invokes
layer start for all Glow layers that are part of the model, and TVM
model stop invokes layer stop for the same layers.
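
The start path walks every layer of the TVM model and starts only the
Marvell (Glow) layers; stop mirrors this with cn10k_ml_layer_stop().
An equivalent for-loop sketch of the goto-based loop in the diff below:

    for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
        layer = &model->layer[layer_id];
        if (layer->type != ML_CNXK_LAYER_TYPE_MRVL)
            continue;
        ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
        if (ret != 0)
            return ret;
    }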

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 16 ++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 52 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 18 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 6 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7208391fd..2d308802cf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -827,7 +827,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -838,8 +838,6 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -852,6 +850,10 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -1015,14 +1017,12 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -1035,6 +1035,10 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index d1471971e4..c38c60bf76 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1216,7 +1216,12 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+
+	return 0;
 }
 
 int
@@ -1236,7 +1241,12 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 3847f9b6b9..323c7c6fb6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -213,3 +213,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 770794fe7d..55459f9f7f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -19,5 +19,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a17a76e41f..b8c2e6a1fc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -72,3 +72,21 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3776fb5369..1eb663b1d1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,6 +16,8 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 27/34] ml/cnxk: update internal TVM model info structure
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update the internal model info structure for TVM
models.
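
The info structure is laid out as a header followed by the input and
output rte_ml_io_info arrays; TVM models of the MRVL subtype fall back
to the Glow info setter. Layout, condensed from the diff below:

    info   = PLT_PTR_CAST(model->info);
    input  = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
    output = PLT_PTR_ADD(input,
                         ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));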

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 65 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 70 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index b40b0a13af..650dd970bd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -11,6 +11,7 @@
 
 #include <roc_api.h>
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -246,3 +247,67 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 
 	return &model->mvtvm.info;
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index e86581bc6a..a1247ffbde 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -52,5 +53,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 323c7c6fb6..c6872cd89a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -178,6 +178,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_set(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 28/34] ml/cnxk: support device dump for TVM models
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled printing of TVM model layer information in the device dump.
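
cnxk_ml_model_dump() now prints each layer through a type-specific
handler, so LLVM layers go through the new mvtvm_ml_layer_print().
Condensed from the diff below:

    for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
        layer = &model->layer[layer_id];
        if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
            cn10k_ml_layer_print(cnxk_mldev, layer, fp);
        else
            mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
    }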

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  7 +++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  8 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 02f80410ec..ed6a1ed866 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -68,6 +68,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -84,6 +86,9 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
 	}
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 650dd970bd..ffbcec8b80 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -311,3 +312,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
 				&model->layer[0].glow.metadata);
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index a1247ffbde..900ba44fa0 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -54,5 +55,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index b8c2e6a1fc..260a051b08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -36,6 +36,14 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 	return NULL;
 }
 
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(layer);
+	RTE_SET_USED(fp);
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 1eb663b1d1..d6d0edbcf1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
@@ -22,5 +23,6 @@ int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 29/34] ml/cnxk: enable reporting model runtime as xstats
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.
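
Per-model runtime latency is derived from per-queue-pair burst
counters: the average is the total TVM runtime latency divided by the
number of dequeued (non-reset) jobs; min and max are folded the same
way. Equivalent sketch of the average, condensed from the
ML_AVG_FOREACH_QP_MVTVM macro in the diff below:

    value = 0;
    count = 0;
    for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
        value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;
        count += model->mvtvm.burst_xstats[qp_id].dequeued_count -
                 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;
    }
    if (count != 0)
        value = value / count;  /* converted to ns when the sclk frequency is known */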

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   9 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 131 +++++++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  96 +++++++++++++++++++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   8 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  23 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   6 ++
 10 files changed, 289 insertions(+), 18 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d308802cf..0c67ce7b40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -197,6 +197,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 	}
 }
 
+void
+cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->glow.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
 #define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3d18303ed3..045e2e6cd2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -331,6 +331,8 @@ int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
+void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				  enum cnxk_ml_xstats_type type);
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c38c60bf76..2632d70d8c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -138,7 +138,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -169,6 +170,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -195,7 +215,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -204,6 +225,36 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		rte_strscpy(suffix, "cycles", 7);
+	else
+		rte_strscpy(suffix, "ns", 3);
+
+	/* Update xstat name based on layer name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_xstat_model_name_set(cnxk_mldev, model, stat_id, i, suffix);
+		else
+			mvtvm_ml_model_xstat_name_set(cnxk_mldev, model, stat_id, i, suffix);
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -247,13 +298,22 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	if (model == NULL)
 		return 0;
 
-	if (layer_id >= 0)
+	if (layer_id >= 0) {
 		layer = &model->layer[layer_id];
-	else
-		return 0;
+		goto layer_xstats;
+	} else {
+		layer = NULL;
+		goto model_xstats;
+	}
 
+layer_xstats:
 	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+	goto exit_xstats;
 
+model_xstats:
+	value = mvtvm_ml_model_xstat_get(cnxk_mldev, model, type);
+
+exit_xstats:
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq != 0) /* return in ns */
 		value = (value * 1000ULL) / sclk_freq;
@@ -836,8 +896,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -854,7 +915,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -868,9 +939,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		rte_strscpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -931,9 +1013,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -951,7 +1034,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -963,11 +1053,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b22a2b0d95..ab32676b3e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -70,6 +70,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876c..a2c9adfe4a 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 900ba44fa0..66c3af18e1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index c6872cd89a..abfbae2b3a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -10,10 +10,83 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->mvtvm.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	uint64_t count = 0;
+	uint64_t value = 0;
+	uint32_t qp_id;
+
+	switch (type) {
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -53,6 +126,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -68,7 +142,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -181,6 +259,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 55459f9f7f..22e0340146 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -11,8 +11,11 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -22,4 +25,9 @@ int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 260a051b08..19af1d2703 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -8,6 +8,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_xstats.h"
 
 enum cnxk_ml_model_type
 mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
@@ -44,6 +45,28 @@ mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	RTE_SET_USED(fp);
 }
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(stat_id);
+	RTE_SET_USED(entry);
+	RTE_SET_USED(suffix);
+}
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(type);
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index d6d0edbcf1..3fd1f04c35 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
@@ -24,5 +26,9 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 30/34] ml/cnxk: implement I/O alloc and free callbacks
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for I/O allocation and free for Glow
layers.
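
cn10k_ml_io_alloc() reserves a single memzone per layer, sized for the
quantized input plus output buffers, and returns both pointers;
cn10k_ml_io_free() looks up the same memzone by name and frees it.
Core of the allocation, condensed from the diff below:

    input_size  = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
    output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
    mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
    *input_qbuffer  = mz->addr;
    *output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);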

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 87 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 +
 3 files changed, 92 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0c67ce7b40..7802425c87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1410,3 +1410,90 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t output_size;
+	uint64_t input_size;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 045e2e6cd2..9c41c1c0b0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -329,6 +329,9 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index abfbae2b3a..a50b31ec6e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -232,6 +232,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 31/34] ml/cnxk: add generic ML malloc and free callback
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks.
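
These are thin memzone wrappers (plt_memzone_reserve_aligned() for
malloc, plt_memzone_lookup() plus plt_memzone_free() for free) that
the TVM runtime can use for generic allocations, registered alongside
the other tvmdp callbacks. As wired up in the diff below:

    callback->tvmrt_malloc = cn10k_ml_malloc;
    callback->tvmrt_free = cn10k_ml_free;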

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7802425c87..01b0a44caa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1497,3 +1497,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 9c41c1c0b0..eb3e1c139c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -333,6 +333,9 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index a50b31ec6e..9d59e28661 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -234,6 +234,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 32/34] ml/cnxk: support quantize and dequantize callback
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
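
Not part of the patch -- a rough usage sketch of the two callbacks declared
below in mvtvm_ml_ops.h, for a single-input / single-output Marvell layer.
The layer name and the buffer handling around the hardware run are
illustrative assumptions:

/* Hypothetical flow around a hardware (MRVL) layer: quantize the TVM
 * runtime's dequantized DLTensor into the driver's quantized input buffer,
 * run the layer, then dequantize the quantized output back into a DLTensor.
 * "example_mrvl_layer", qin and qout are examples only. */
static int
layer_io_sketch(void *device, uint16_t model_id, DLTensor *in, DLTensor *out,
		void *qin, void *qout)
{
	const DLTensor *deq_in[1] = {in};
	const DLTensor *deq_out[1] = {out};
	int ret;

	ret = mvtvm_ml_io_quantize(device, model_id, "example_mrvl_layer", deq_in, qin);
	if (ret < 0)
		return ret;

	/* ... enqueue the MRVL layer with qin as input and qout as output ... */

	return mvtvm_ml_io_dequantize(device, model_id, "example_mrvl_layer", qout, deq_out);
}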

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_ops.c | 129 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |   4 +
 2 files changed, 133 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9d59e28661..39c8bf0f04 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -2,11 +2,15 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <dlpack/dlpack.h>
+
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
@@ -236,6 +240,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -366,3 +372,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 22e0340146..4cabe30a82 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -24,6 +24,10 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 33/34] ml/cnxk: enable fast-path ops for TVM models
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  2023-10-23  4:41   ` [PATCH v8 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. Hybrid and LLVM
model sub-types use TVMDP library function calls to execute
inference operations.

For TVM MRVL model subtypes that have a single MRVL layer,
the inference requests are directly enqueued to hardware
by the driver.
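
Not part of the patch -- a minimal sketch of the assumed shape of the common
burst enqueue path, showing how the per-model function pointers set by this
patch select between the two paths. Only the pointer names and the
enqueue_single() signature come from the patch; the loop structure is an
assumption:

/* Assumed dispatch in the generic burst enqueue: each op is routed through
 * model->enqueue_single, which this patch points at either
 * cn10k_ml_enqueue_single (TVM MRVL, single hardware layer) or
 * mvtvm_ml_enqueue_single (Hybrid / LLVM, via tvmdp_model_run). */
static inline uint16_t
enqueue_dispatch_sketch(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp,
			struct rte_ml_op **ops, uint16_t nb_ops, uint64_t head)
{
	struct cnxk_ml_model *model;
	uint16_t count = 0;

	while (count < nb_ops) {
		model = cnxk_mldev->mldev->data->models[ops[count]->model_id];

		/* layer_id 0; unused by the mvtvm path */
		if (!model->enqueue_single(cnxk_mldev, ops[count], 0, qp, head))
			break;

		head++;
		count++;
	}

	return count;
}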

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/rel_notes/release_23_11.rst |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c         |   4 -
 drivers/ml/cnxk/cnxk_ml_io.h           |   6 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h          |   5 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  20 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c         | 124 +++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |  43 +++++++++
 9 files changed, 211 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..5fcf2a1897 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -243,6 +243,9 @@ New Features
   Added dispatcher library which purpose is to help decouple different
   parts (modules) of an eventdev-based application.
 
+* **Updated Marvell cnxk mldev driver.**
+
+  * Added support for models compiled using TVM framework.
 
 Removed Items
 -------------
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 5de166c252..6d5d25a7c9 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -47,6 +47,12 @@ struct cnxk_ml_io {
 
 	/* Scale */
 	float scale;
+
+	/* Dequantized offset */
+	uint32_t offset_d;
+
+	/* Quantized offset */
+	uint32_t offset_q;
 };
 
 /* Model / Layer IO structure */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 2632d70d8c..bf266d4d6e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ffbcec8b80..95bde6a9cb 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -198,6 +198,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.info.input[i].offset_d = model->mvtvm.info.total_input_sz_d;
+		model->mvtvm.info.input[i].offset_q = model->mvtvm.info.total_input_sz_q;
+
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = model->mvtvm.info.input[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -231,6 +241,16 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.info.output[i].offset_d = model->mvtvm.info.total_output_sz_d;
+		model->mvtvm.info.output[i].offset_q = model->mvtvm.info.total_output_sz_q;
+
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = model->mvtvm.info.output[i].offset_q;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 66c3af18e1..7ffce38094 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -69,6 +69,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 39c8bf0f04..6b88491371 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -19,6 +19,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 void
 mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			      uint16_t stat_id, uint16_t entry, char *suffix)
@@ -242,6 +248,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -285,6 +292,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -495,3 +515,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 4cabe30a82..cb4b219743 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -16,6 +16,44 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* Start ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -29,6 +67,11 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v8 34/34] ml/cnxk: enable creation of mvtvm virtual device
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-10-23  4:41   ` [PATCH v8 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-10-23  4:41   ` Srikanth Yalavarthi
  33 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-23  4:41 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create an mvtvm virtual device on
systems without a PCI-based ML HW accelerator.
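
Not part of the patch -- a hypothetical application-side snippet showing how
the virtual device could be created at run time instead of via --vdev on the
EAL command line. The rte_vdev_init() usage and the chosen devargs values are
assumptions; only the ml_mvtvm name and the max_qps / cache_model_data options
come from this patch:

#include <rte_bus_vdev.h>
#include <rte_mldev.h>

/* Hypothetical runtime creation of the TVM-only ML vdev. */
static int
create_mvtvm_vdev(void)
{
	int ret;

	ret = rte_vdev_init("ml_mvtvm", "max_qps=4,cache_model_data=1");
	if (ret != 0)
		return ret;

	/* The device is then reachable through the standard mldev API,
	 * e.g. rte_ml_dev_count() and rte_ml_dev_configure(). */
	return 0;
}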

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       |  50 +++++++-
 drivers/ml/cnxk/cn10k_ml_dev.c   |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c    |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  82 +++++++++----
 drivers/ml/cnxk/meson.build      |   1 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   | 196 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  31 +++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   2 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   2 +
 13 files changed, 433 insertions(+), 24 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index a4d8903896..28e5b5b87f 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -239,6 +239,23 @@ Bind the ML PF device to the vfio_pci driver:
    usertools/dpdk-devbind.py -u 0000:00:10.0
    usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
 
+VDEV support
+------------
+
+On platforms which don't support ML hardware acceleration through a PCI device,
+the Marvell ML CNXK PMD can execute inference operations on a vdev using ML
+models compiled with the Apache TVM framework.
+
+VDEV can be enabled by passing the following EAL argument:
+
+.. code-block:: console
+
+   --vdev ml_mvtvm
+
+VDEV can also be used on platforms with an ML HW accelerator. However, to use
+VDEV in this case, the PCI device has to be unbound. When the PCI device is
+bound, creation of the vdev is skipped.
+
 
 Runtime Config Options
 ----------------------
@@ -249,6 +266,8 @@ Runtime Config Options
   The parameter ``fw_path`` can be used by the user
   to load ML firmware from a custom path.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
@@ -264,6 +283,8 @@ Runtime Config Options
   When enabled, firmware would mask the DPE non-fatal hardware errors as warnings.
   The parameter ``enable_dpe_warnings`` is used fo this configuration.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,enable_dpe_warnings=0
@@ -280,11 +301,19 @@ Runtime Config Options
   Caching of model data improves the inferencing throughput / latency for the model.
   The parameter ``cache_model_data`` is used to enable data caching.
 
+  This option is supported on PCI HW accelerator and vdev.
+
   For example::
 
      -a 0000:00:10.0,cache_model_data=0
 
-  With the above configuration, model data caching is disabled.
+  With the above configuration, model data caching is disabled on HW accelerator.
+
+  For example::
+
+     --vdev ml_mvtvm,cache_model_data=0
+
+  With the above configuration, model data caching is disabled on vdev.
 
 
 **OCM allocation mode** (default ``lowest``)
@@ -300,6 +329,8 @@ Runtime Config Options
   ``largest``
     Allocate OCM for the model from the slot with largest amount of free space.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_alloc_mode=lowest
@@ -317,6 +348,8 @@ Runtime Config Options
   Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
   Default page size is 16 KB.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_page_size=8192
@@ -341,6 +374,8 @@ Runtime Config Options
     Enabling spinlock version would disable restrictions on the number of queue-pairs
     that can be supported by the driver.
 
+   This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,hw_queue_lock=1
@@ -349,6 +384,19 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
+**Maximum queue pairs** (default ``1``)
+
+  VDEV supports additional EAL arguments to configure the maximum number of
+  queue-pairs on the ML device through the option ``max_qps``.
+
+  This option is supported only on vdev.
+
+  For example::
+
+     --vdev ml_mvtvm,max_qps=4
+
+  With the above configuration, 4 queue-pairs are created on the vdev.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 91813e9d0a..41f3b7a95d 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -309,6 +309,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		return -EINVAL;
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -355,6 +361,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef..cee405f3f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417..dc4512223c 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64be..491c4c4aea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index bf266d4d6e..36a5dcf9b0 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,7 +117,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -480,7 +481,12 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+
+	return 0;
 }
 
 static int
@@ -518,9 +524,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -618,10 +626,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
@@ -629,12 +639,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -695,8 +710,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close MVTVM ML Device");
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -748,10 +765,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -770,10 +789,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -800,7 +821,12 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+
+	return 0;
 }
 
 static int
@@ -813,6 +839,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1145,6 +1174,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1384,6 +1418,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 20534d0b00..0680a0faa5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -62,6 +62,7 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 0000000000..dcac7b7273
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Not creating ml_mvtvm vdev!");
+		return 0;
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 0000000000..6922c19337
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 6b88491371..e825c3fb23 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -97,6 +97,22 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return value;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -127,6 +143,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -237,6 +262,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index cb4b219743..0232c5ead5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -55,8 +55,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 19af1d2703..126a954c91 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -67,6 +67,15 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return 0;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(dev_info);
+
+	return -ENOTSUP;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -84,6 +93,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3fd1f04c35..4220a963f2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -14,8 +14,10 @@ struct cnxk_ml_model;
 struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 00/34] Implementation of revised ml/cnxk driver
  2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
                   ` (40 preceding siblings ...)
  2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-10-26 12:43 ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
                     ` (34 more replies)
  41 siblings, 35 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

This patch series is an implementation of revised ml/cnxk driver
to support models compiled with TVM compiler framework. TVM models
use a hybrid mode for execution, with regions of the model executing
on the ML accelerator and the rest executing on CPU cores.

This series of commits reorganizes the ml/cnxk driver and adds support
to execute multiple regions with-in a TVM model.

v9:
  - Fixed incorrect IO layout for TVM Marvell models
  - Set byte offset to zero for I/O tensors
  - Updated max layers macro definition. Set to TVMDP max layers.
  - Fixed TVM model IO type to RTE IO type map

v8:
  - Updated CMake dependency resolution of external dependencies
  - Updated mldevs/cnxk documentation
  - Updated meson config files for cn9k and cn10k to include cmake

v7:
  - Updated steps to build dependencies in cnxk mldev documentation
  - Replace str functions with rte_str functions
  - Drop use of rte_exit in ml/cnxk driver

v6:
  - Added depends info for series. This series depends on patch-132887
  - Fix merge conflicts with dpdk-23.11-rc1
  - Fix issues with ml/cnxk driver release notes
  - Added build dependency information for dlpack headers

v5:
  - Fix build failures for individual patches in the series
  - Finished build testing with devtools/test-meson-builds.sh script

v4:
  - Squashed release notes
  - Updated external build dependency info in documentation

v3:
  - Reduced use of RTE_MLDEV_CNXK_ENABLE_MVTVM macro
  - Added stubs file with dummy functions to use when TVM is disabled
  - Dropped patch with internal function to read firmware
  - Updated ML CNXK PMD documentation
  - Added external library dependency info in documentation
  - Added release notes for 23.11

v2:
  - Fix xstats reporting
  - Fix issues reported by klocwork static analysis tool
  - Update external header inclusions

v1:
  - Initial changes

Anup Prabhu (2):
  ml/cnxk: enable OCM check for multilayer TVM model
  ml/cnxk: enable fast-path ops for TVM models

Prince Takkar (2):
  ml/cnxk: update internal TVM model info structure
  ml/cnxk: support quantize and dequantize callback

Srikanth Yalavarthi (30):
  ml/cnxk: drop support for register polling
  ml/cnxk: add generic cnxk device structure
  ml/cnxk: add generic model and layer structures
  ml/cnxk: add generic cnxk request structure
  ml/cnxk: add generic cnxk xstats structures
  ml/cnxk: rename cnxk ops function pointers struct
  ml/cnxk: update device handling functions
  ml/cnxk: update queue-pair handling functions
  ml/cnxk: update model load and unload functions
  ml/cnxk: update model start and stop functions
  ml/cnxk: update model utility functions
  ml/cnxk: update data quantization functions
  ml/cnxk: update device debug functions
  ml/cnxk: update device stats functions
  ml/cnxk: update device and model xstats functions
  ml/cnxk: update fast path functions
  ml/cnxk: move error handling to cnxk layer
  ml/cnxk: support config and close of tvmdp library
  ml/cnxk: add structures to support TVM model type
  ml/cnxk: add support for identify model type
  ml/cnxk: add support to parse TVM model objects
  ml/cnxk: fetch layer info and load TVM model
  ml/cnxk: update internal info for TVM model
  ml/cnxk: enable model unload in tvmdp library
  ml/cnxk: support start and stop for TVM models
  ml/cnxk: support device dump for TVM models
  ml/cnxk: enable reporting model runtime as xstats
  ml/cnxk: implement I/O alloc and free callbacks
  ml/cnxk: add generic ML malloc and free callback
  ml/cnxk: enable creation of mvtvm virtual device

 config/arm/arm64_cn10k_linux_gcc       |    1 +
 config/arm/arm64_cn9k_linux_gcc        |    1 +
 doc/guides/mldevs/cnxk.rst             |  223 +-
 doc/guides/rel_notes/release_23_11.rst |    3 +
 drivers/ml/cnxk/cn10k_ml_dev.c         |  416 ++--
 drivers/ml/cnxk/cn10k_ml_dev.h         |  457 +---
 drivers/ml/cnxk/cn10k_ml_model.c       |  403 ++--
 drivers/ml/cnxk/cn10k_ml_model.h       |  151 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c         |  111 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h         |   15 +-
 drivers/ml/cnxk/cn10k_ml_ops.c         | 2828 ++++++++----------------
 drivers/ml/cnxk/cn10k_ml_ops.h         |  358 ++-
 drivers/ml/cnxk/cnxk_ml_dev.c          |   22 +
 drivers/ml/cnxk/cnxk_ml_dev.h          |  120 +
 drivers/ml/cnxk/cnxk_ml_io.c           |   95 +
 drivers/ml/cnxk/cnxk_ml_io.h           |   90 +
 drivers/ml/cnxk/cnxk_ml_model.c        |   94 +
 drivers/ml/cnxk/cnxk_ml_model.h        |  192 ++
 drivers/ml/cnxk/cnxk_ml_ops.c          | 1690 ++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h          |   87 +
 drivers/ml/cnxk/cnxk_ml_utils.c        |   15 +
 drivers/ml/cnxk/cnxk_ml_utils.h        |   17 +
 drivers/ml/cnxk/cnxk_ml_xstats.h       |  152 ++
 drivers/ml/cnxk/meson.build            |   70 +
 drivers/ml/cnxk/mvtvm_ml_dev.c         |  196 ++
 drivers/ml/cnxk/mvtvm_ml_dev.h         |   40 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  409 ++++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   90 +
 drivers/ml/cnxk/mvtvm_ml_ops.c         |  652 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |   82 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c       |  141 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.h       |   36 +
 32 files changed, 6298 insertions(+), 2959 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 01/34] ml/cnxk: drop support for register polling
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
                     ` (33 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Dropped support for the device argument "poll_mem" in the cnxk
ML driver. Support for using registers for polling is removed;
DDR addresses are used for polling instead.
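
Not part of the patch -- a minimal sketch of the DDR-based completion poll this
change standardizes on, assuming the plt_read64()/plt_tsc_cycles() helpers and
the ML_CNXK_POLL_JOB_FINISH status value used elsewhere in the driver; the
helper itself is illustrative:

/* With register polling gone, every request carries a status word in DDR:
 * the completion path writes ML_CNXK_POLL_JOB_FINISH there and dequeue
 * simply spins on the same address until it is set or the timeout expires. */
static inline bool
ddr_poll_sketch(volatile uint64_t *status, uint64_t timeout_cycle)
{
	do {
		if (plt_read64(status) == ML_CNXK_POLL_JOB_FINISH)
			return true;	/* job completed */
	} while (plt_tsc_cycles() < timeout_cycle);

	return false;	/* caller flags the op as timed out */
}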

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst     |  16 -----
 drivers/ml/cnxk/cn10k_ml_dev.c |  36 +----------
 drivers/ml/cnxk/cn10k_ml_dev.h |  13 +---
 drivers/ml/cnxk/cn10k_ml_ops.c | 111 ++++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   6 --
 5 files changed, 18 insertions(+), 164 deletions(-)

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index b79bc540d9..1834b1f905 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -180,22 +180,6 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
-**Polling memory location** (default ``ddr``)
-
-  ML cnxk driver provides the option to select the memory location to be used
-  for polling to check the inference request completion.
-  Driver supports using either the DDR address space (``ddr``)
-  or ML registers (``register``) as polling locations.
-  The parameter ``poll_mem`` is used to specify the poll location.
-
-  For example::
-
-     -a 0000:00:10.0,poll_mem="register"
-
-  With the above configuration, ML cnxk driver is configured to use ML registers
-  for polling in fastpath requests.
-
-
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 983138a7f2..e3c2badcef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -23,7 +23,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA	"cache_model_data"
 #define CN10K_ML_OCM_ALLOC_MODE		"ocm_alloc_mode"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK	"hw_queue_lock"
-#define CN10K_ML_FW_POLL_MEM		"poll_mem"
 #define CN10K_ML_OCM_PAGE_SIZE		"ocm_page_size"
 
 #define CN10K_ML_FW_PATH_DEFAULT		"/lib/firmware/mlip-fw.bin"
@@ -32,7 +31,6 @@
 #define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT	1
 #define CN10K_ML_OCM_ALLOC_MODE_DEFAULT		"lowest"
 #define CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT	1
-#define CN10K_ML_FW_POLL_MEM_DEFAULT		"ddr"
 #define CN10K_ML_OCM_PAGE_SIZE_DEFAULT		16384
 
 /* ML firmware macros */
@@ -54,7 +52,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 					 CN10K_ML_DEV_CACHE_MODEL_DATA,
 					 CN10K_ML_OCM_ALLOC_MODE,
 					 CN10K_ML_DEV_HW_QUEUE_LOCK,
-					 CN10K_ML_FW_POLL_MEM,
 					 CN10K_ML_OCM_PAGE_SIZE,
 					 NULL};
 
@@ -103,9 +100,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	bool hw_queue_lock_set = false;
 	bool ocm_page_size_set = false;
 	char *ocm_alloc_mode = NULL;
-	bool poll_mem_set = false;
 	bool fw_path_set = false;
-	char *poll_mem = NULL;
 	char *fw_path = NULL;
 	int ret = 0;
 	bool found;
@@ -189,17 +184,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 		hw_queue_lock_set = true;
 	}
 
-	if (rte_kvargs_count(kvlist, CN10K_ML_FW_POLL_MEM) == 1) {
-		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_POLL_MEM, &parse_string_arg,
-					 &poll_mem);
-		if (ret < 0) {
-			plt_err("Error processing arguments, key = %s\n", CN10K_ML_FW_POLL_MEM);
-			ret = -EINVAL;
-			goto exit;
-		}
-		poll_mem_set = true;
-	}
-
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
 					 &mldev->ocm_page_size);
@@ -280,18 +264,6 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 	}
 	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
 
-	if (!poll_mem_set) {
-		mldev->fw.poll_mem = CN10K_ML_FW_POLL_MEM_DEFAULT;
-	} else {
-		if (!((strcmp(poll_mem, "ddr") == 0) || (strcmp(poll_mem, "register") == 0))) {
-			plt_err("Invalid argument, %s = %s\n", CN10K_ML_FW_POLL_MEM, poll_mem);
-			ret = -EINVAL;
-			goto exit;
-		}
-		mldev->fw.poll_mem = poll_mem;
-	}
-	plt_info("ML: %s = %s", CN10K_ML_FW_POLL_MEM, mldev->fw.poll_mem);
-
 	if (!ocm_page_size_set) {
 		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
@@ -450,10 +422,7 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 	if (fw->report_dpe_warnings)
 		flags = flags | FW_REPORT_DPE_WARNING_BITMASK;
 
-	if (strcmp(fw->poll_mem, "ddr") == 0)
-		flags = flags | FW_USE_DDR_POLL_ADDR_FP;
-	else if (strcmp(fw->poll_mem, "register") == 0)
-		flags = flags & ~FW_USE_DDR_POLL_ADDR_FP;
+	flags = flags | FW_USE_DDR_POLL_ADDR_FP;
 
 	return flags;
 }
@@ -863,5 +832,4 @@ RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_CN10K_PMD, CN10K_ML_FW_PATH
 			      "=<0|1>" CN10K_ML_DEV_CACHE_MODEL_DATA
 			      "=<0|1>" CN10K_ML_OCM_ALLOC_MODE
 			      "=<lowest|largest>" CN10K_ML_DEV_HW_QUEUE_LOCK
-			      "=<0|1>" CN10K_ML_FW_POLL_MEM "=<ddr|register>" CN10K_ML_OCM_PAGE_SIZE
-			      "=<1024|2048|4096|8192|16384>");
+			      "=<0|1>" CN10K_ML_OCM_PAGE_SIZE "=<1024|2048|4096|8192|16384>");
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index c73bf7d001..4aaeecff03 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -390,9 +390,6 @@ struct cn10k_ml_fw {
 	/* Report DPE warnings */
 	int report_dpe_warnings;
 
-	/* Memory to be used for polling in fast-path requests */
-	const char *poll_mem;
-
 	/* Data buffer */
 	uint8_t *data;
 
@@ -525,13 +522,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx);
-	void (*set_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct roc_ml *roc_ml, struct cn10k_ml_req *req);
-
-	/* Memory barrier function pointers to handle synchronization */
-	void (*set_enq_barrier)(void);
-	void (*set_deq_barrier)(void);
+	void (*set_poll_addr)(struct cn10k_ml_req *req);
+	void (*set_poll_ptr)(struct cn10k_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4abf4ae0d3..11531afd8c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -23,11 +23,6 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Scratch register range for poll mode requests */
-#define ML_POLL_REGISTER_SYNC  1023
-#define ML_POLL_REGISTER_START 1024
-#define ML_POLL_REGISTER_END   2047
-
 /* Error message length */
 #define ERRMSG_LEN 32
 
@@ -82,79 +77,23 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr_ddr(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
+cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(qp);
-	PLT_SET_USED(idx);
-
 	req->compl_W1 = PLT_U64_CAST(&req->status);
 }
 
 static inline void
-cn10k_ml_set_poll_addr_reg(struct cn10k_ml_qp *qp, struct cn10k_ml_req *req, uint64_t idx)
-{
-	req->compl_W1 = ML_SCRATCH(qp->block_start + idx % qp->block_size);
-}
-
-static inline void
-cn10k_ml_set_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
 }
 
-static inline void
-cn10k_ml_set_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	roc_ml_reg_write64(roc_ml, ML_CN10K_POLL_JOB_START, req->compl_W1);
-}
-
 static inline uint64_t
-cn10k_ml_get_poll_ptr_ddr(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
 {
-	PLT_SET_USED(roc_ml);
-
 	return plt_read64(req->compl_W1);
 }
 
-static inline uint64_t
-cn10k_ml_get_poll_ptr_reg(struct roc_ml *roc_ml, struct cn10k_ml_req *req)
-{
-	return roc_ml_reg_read64(roc_ml, req->compl_W1);
-}
-
-static inline void
-cn10k_ml_set_sync_addr(struct cn10k_ml_dev *mldev, struct cn10k_ml_req *req)
-{
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		req->compl_W1 = PLT_U64_CAST(&req->status);
-	else if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		req->compl_W1 = ML_SCRATCH(ML_POLL_REGISTER_SYNC);
-}
-
-static inline void
-cn10k_ml_enq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_deq_barrier_ddr(void)
-{
-}
-
-static inline void
-cn10k_ml_enq_barrier_register(void)
-{
-	dmb_st;
-}
-
-static inline void
-cn10k_ml_deq_barrier_register(void)
-{
-	dsb_st;
-}
-
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -242,9 +181,6 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->stats.dequeued_count = 0;
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
-	qp->block_size =
-		(ML_POLL_REGISTER_END - ML_POLL_REGISTER_START + 1) / dev->data->nb_queue_pairs;
-	qp->block_start = ML_POLL_REGISTER_START + qp_id * qp->block_size;
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -933,11 +869,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
 
-	if (strcmp(mldev->fw.poll_mem, "register") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP / dev_info->max_queue_pairs;
-	else if (strcmp(mldev->fw.poll_mem, "ddr") == 0)
-		dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
-
+	dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP;
 	dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT;
 	dev_info->max_segments = ML_CN10K_MAX_SEGMENTS;
 	dev_info->align_size = ML_CN10K_ALIGN_SIZE;
@@ -1118,24 +1050,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_ddr;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_ddr;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_poll_addr = cn10k_ml_set_poll_addr_reg;
-		mldev->set_poll_ptr = cn10k_ml_set_poll_ptr_reg;
-		mldev->get_poll_ptr = cn10k_ml_get_poll_ptr_reg;
-	}
-
-	/* Set barrier function pointers */
-	if (strcmp(mldev->fw.poll_mem, "ddr") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_ddr;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_ddr;
-	} else if (strcmp(mldev->fw.poll_mem, "register") == 0) {
-		mldev->set_enq_barrier = cn10k_ml_enq_barrier_register;
-		mldev->set_deq_barrier = cn10k_ml_deq_barrier_register;
-	}
+	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
@@ -2390,15 +2307,14 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(qp, req, head);
+	mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
-	mldev->set_enq_barrier();
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
@@ -2445,7 +2361,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(&mldev->roc, req);
+	status = mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
@@ -2453,7 +2369,6 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
 	}
 
-	mldev->set_deq_barrier();
 	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
 	ops[count] = req->op;
 
@@ -2515,14 +2430,14 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
-	cn10k_ml_set_sync_addr(mldev, req);
+	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(&mldev->roc, req);
+	mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
@@ -2542,7 +2457,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(&mldev->roc, req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d64a9f27e6..005b093e45 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -67,12 +67,6 @@ struct cn10k_ml_qp {
 
 	/* Statistics per queue-pair */
 	struct rte_ml_dev_stats stats;
-
-	/* Register block start for polling */
-	uint32_t block_start;
-
-	/* Register block end for polling */
-	uint32_t block_size;
 };
 
 /* Device ops */
-- 
2.42.0



* [PATCH v9 02/34] ml/cnxk: add generic cnxk device structure
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
                     ` (32 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce a generic cnxk device structure. This is the top-level
device structure for the driver and encapsulates the target /
platform-specific device structure.
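
A rough sketch of the resulting layering is shown below; the field and
enum names are simplified from what the driver headers define and
should be read as illustrative, not authoritative.

/*
 * Sketch of the generic cnxk device wrapping the cn10k-specific device;
 * names are simplified and may not match the headers exactly.
 */
struct rte_ml_dev;			/* opaque, from the common mldev layer */

enum cnxk_ml_dev_state {
	ML_CNXK_DEV_STATE_PROBED = 0,
	ML_CNXK_DEV_STATE_CONFIGURED,
	ML_CNXK_DEV_STATE_STARTED,
	ML_CNXK_DEV_STATE_CLOSED,
};

struct cn10k_ml_dev {
	/* Platform-specific state: ROC handle, firmware, OCM info, ... */
	int placeholder;
};

struct cnxk_ml_dev {
	struct rte_ml_dev *mldev;		/* back-pointer to the common ML device */
	enum cnxk_ml_dev_state state;		/* device state, tracked at the cnxk level */
	struct cn10k_ml_dev cn10k_mldev;	/* embedded platform-specific device */
};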

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   | 316 ++++++++++----------
 drivers/ml/cnxk/cn10k_ml_dev.h   |  47 +--
 drivers/ml/cnxk/cn10k_ml_model.c |  15 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  60 ++--
 drivers/ml/cnxk/cn10k_ml_ops.c   | 495 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_dev.c    |  11 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  58 ++++
 drivers/ml/cnxk/meson.build      |   1 +
 9 files changed, 562 insertions(+), 449 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_dev.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index e3c2badcef..3bc61443d8 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -10,13 +10,14 @@
 #include <rte_mldev_pmd.h>
 #include <rte_pci.h>
 
-#include <roc_api.h>
-
 #include <eal_firmware.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
 #define CN10K_ML_FW_REPORT_DPE_WARNINGS "report_dpe_warnings"
@@ -58,9 +59,6 @@ static const char *const valid_args[] = {CN10K_ML_FW_PATH,
 /* Supported OCM page sizes: 1KB, 2KB, 4KB, 8KB and 16KB */
 static const int valid_ocm_page_size[] = {1024, 2048, 4096, 8192, 16384};
 
-/* Dummy operations for ML device */
-struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
-
 static int
 parse_string_arg(const char *key __rte_unused, const char *value, void *extra_args)
 {
@@ -90,7 +88,7 @@ parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_a
 }
 
 static int
-cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mldev)
+cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *cn10k_mldev)
 {
 	bool enable_dpe_warnings_set = false;
 	bool report_dpe_warnings_set = false;
@@ -127,7 +125,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.enable_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.enable_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_ENABLE_DPE_WARNINGS);
@@ -139,7 +137,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_FW_REPORT_DPE_WARNINGS,
-					 &parse_integer_arg, &mldev->fw.report_dpe_warnings);
+					 &parse_integer_arg, &cn10k_mldev->fw.report_dpe_warnings);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_FW_REPORT_DPE_WARNINGS);
@@ -151,7 +149,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
-					 &mldev->cache_model_data);
+					 &cn10k_mldev->cache_model_data);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_CACHE_MODEL_DATA);
@@ -174,7 +172,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_DEV_HW_QUEUE_LOCK, &parse_integer_arg,
-					 &mldev->hw_queue_lock);
+					 &cn10k_mldev->hw_queue_lock);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n",
 				CN10K_ML_DEV_HW_QUEUE_LOCK);
@@ -186,7 +184,7 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 	if (rte_kvargs_count(kvlist, CN10K_ML_OCM_PAGE_SIZE) == 1) {
 		ret = rte_kvargs_process(kvlist, CN10K_ML_OCM_PAGE_SIZE, &parse_integer_arg,
-					 &mldev->ocm_page_size);
+					 &cn10k_mldev->ocm_page_size);
 		if (ret < 0) {
 			plt_err("Error processing arguments, key = %s\n", CN10K_ML_OCM_PAGE_SIZE);
 			ret = -EINVAL;
@@ -197,49 +195,53 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 
 check_args:
 	if (!fw_path_set)
-		mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
+		cn10k_mldev->fw.path = CN10K_ML_FW_PATH_DEFAULT;
 	else
-		mldev->fw.path = fw_path;
-	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, mldev->fw.path);
+		cn10k_mldev->fw.path = fw_path;
+	plt_info("ML: %s = %s", CN10K_ML_FW_PATH, cn10k_mldev->fw.path);
 
 	if (!enable_dpe_warnings_set) {
-		mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.enable_dpe_warnings = CN10K_ML_FW_ENABLE_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.enable_dpe_warnings < 0) || (mldev->fw.enable_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.enable_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.enable_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
-				mldev->fw.enable_dpe_warnings);
+				cn10k_mldev->fw.enable_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS, mldev->fw.enable_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_ENABLE_DPE_WARNINGS,
+		 cn10k_mldev->fw.enable_dpe_warnings);
 
 	if (!report_dpe_warnings_set) {
-		mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
+		cn10k_mldev->fw.report_dpe_warnings = CN10K_ML_FW_REPORT_DPE_WARNINGS_DEFAULT;
 	} else {
-		if ((mldev->fw.report_dpe_warnings < 0) || (mldev->fw.report_dpe_warnings > 1)) {
+		if ((cn10k_mldev->fw.report_dpe_warnings < 0) ||
+		    (cn10k_mldev->fw.report_dpe_warnings > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_FW_REPORT_DPE_WARNINGS,
-				mldev->fw.report_dpe_warnings);
+				cn10k_mldev->fw.report_dpe_warnings);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS, mldev->fw.report_dpe_warnings);
+	plt_info("ML: %s = %d", CN10K_ML_FW_REPORT_DPE_WARNINGS,
+		 cn10k_mldev->fw.report_dpe_warnings);
 
 	if (!cache_model_data_set) {
-		mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+		cn10k_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
 	} else {
-		if ((mldev->cache_model_data < 0) || (mldev->cache_model_data > 1)) {
+		if ((cn10k_mldev->cache_model_data < 0) || (cn10k_mldev->cache_model_data > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_CACHE_MODEL_DATA,
-				mldev->cache_model_data);
+				cn10k_mldev->cache_model_data);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, mldev->cache_model_data);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_CACHE_MODEL_DATA, cn10k_mldev->cache_model_data);
 
 	if (!ocm_alloc_mode_set) {
-		mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
+		cn10k_mldev->ocm.alloc_mode = CN10K_ML_OCM_ALLOC_MODE_DEFAULT;
 	} else {
 		if (!((strcmp(ocm_alloc_mode, "lowest") == 0) ||
 		      (strcmp(ocm_alloc_mode, "largest") == 0))) {
@@ -248,47 +250,47 @@ cn10k_mldev_parse_devargs(struct rte_devargs *devargs, struct cn10k_ml_dev *mlde
 			ret = -EINVAL;
 			goto exit;
 		}
-		mldev->ocm.alloc_mode = ocm_alloc_mode;
+		cn10k_mldev->ocm.alloc_mode = ocm_alloc_mode;
 	}
-	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, mldev->ocm.alloc_mode);
+	plt_info("ML: %s = %s", CN10K_ML_OCM_ALLOC_MODE, cn10k_mldev->ocm.alloc_mode);
 
 	if (!hw_queue_lock_set) {
-		mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
+		cn10k_mldev->hw_queue_lock = CN10K_ML_DEV_HW_QUEUE_LOCK_DEFAULT;
 	} else {
-		if ((mldev->hw_queue_lock < 0) || (mldev->hw_queue_lock > 1)) {
+		if ((cn10k_mldev->hw_queue_lock < 0) || (cn10k_mldev->hw_queue_lock > 1)) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_DEV_HW_QUEUE_LOCK,
-				mldev->hw_queue_lock);
+				cn10k_mldev->hw_queue_lock);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, mldev->hw_queue_lock);
+	plt_info("ML: %s = %d", CN10K_ML_DEV_HW_QUEUE_LOCK, cn10k_mldev->hw_queue_lock);
 
 	if (!ocm_page_size_set) {
-		mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
+		cn10k_mldev->ocm_page_size = CN10K_ML_OCM_PAGE_SIZE_DEFAULT;
 	} else {
-		if (mldev->ocm_page_size < 0) {
+		if (cn10k_mldev->ocm_page_size < 0) {
 			plt_err("Invalid argument, %s = %d\n", CN10K_ML_OCM_PAGE_SIZE,
-				mldev->ocm_page_size);
+				cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 
 		found = false;
 		for (i = 0; i < PLT_DIM(valid_ocm_page_size); i++) {
-			if (mldev->ocm_page_size == valid_ocm_page_size[i]) {
+			if (cn10k_mldev->ocm_page_size == valid_ocm_page_size[i]) {
 				found = true;
 				break;
 			}
 		}
 
 		if (!found) {
-			plt_err("Unsupported ocm_page_size = %d\n", mldev->ocm_page_size);
+			plt_err("Unsupported ocm_page_size = %d\n", cn10k_mldev->ocm_page_size);
 			ret = -EINVAL;
 			goto exit;
 		}
 	}
-	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, mldev->ocm_page_size);
+	plt_info("ML: %s = %d", CN10K_ML_OCM_PAGE_SIZE, cn10k_mldev->ocm_page_size);
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -300,7 +302,8 @@ static int
 cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
 {
 	struct rte_ml_dev_pmd_init_params init_params;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -308,7 +311,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	PLT_SET_USED(pci_drv);
 
 	init_params = (struct rte_ml_dev_pmd_init_params){
-		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cn10k_ml_dev)};
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
 	ret = roc_plt_init();
 	if (ret < 0) {
@@ -324,18 +327,20 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	}
 
 	/* Get private data space allocated */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev->roc.pci_dev = pci_dev;
+		cn10k_mldev->roc.pci_dev = pci_dev;
 
-		ret = cn10k_mldev_parse_devargs(dev->device->devargs, mldev);
+		ret = cn10k_mldev_parse_devargs(dev->device->devargs, cn10k_mldev);
 		if (ret) {
 			plt_err("Failed to parse devargs ret = %d", ret);
 			goto pmd_destroy;
 		}
 
-		ret = roc_ml_dev_init(&mldev->roc);
+		ret = roc_ml_dev_init(&cn10k_mldev->roc);
 		if (ret) {
 			plt_err("Failed to initialize ML ROC, ret = %d", ret);
 			goto pmd_destroy;
@@ -351,7 +356,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
-	mldev->state = ML_CN10K_DEV_STATE_PROBED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
 
@@ -368,7 +373,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 static int
 cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	char name[RTE_ML_STR_MAX];
 	struct rte_ml_dev *dev;
 	int ret;
@@ -383,8 +388,8 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 		return -ENODEV;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		mldev = dev->data->dev_private;
-		ret = roc_ml_dev_fini(&mldev->roc);
+		cnxk_mldev = dev->data->dev_private;
+		ret = roc_ml_dev_fini(&cnxk_mldev->cn10k_mldev.roc);
 		if (ret)
 			return ret;
 	}
@@ -430,45 +435,45 @@ cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw)
 static int
 cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	bool timeout;
 	int ret = 0;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* Set ML_MLR_BASE to base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = rte_eal_get_baseaddr();
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -480,11 +485,11 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -498,14 +503,14 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	}
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	return ret;
 }
@@ -515,7 +520,7 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 {
 	union ml_a35_0_rst_vector_base_s a35_0_rst_vector_base;
 	union ml_a35_0_rst_vector_base_s a35_1_rst_vector_base;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 	uint64_t timeout_cycle;
 	uint64_t reg_val64;
 	uint32_t reg_val32;
@@ -524,24 +529,24 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	int ret = 0;
 	uint8_t i;
 
-	mldev = fw->mldev;
+	cn10k_mldev = fw->cn10k_mldev;
 
 	/* Reset HEAD and TAIL debug pointer registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_EXCEPTION_SP_C1);
 
 	/* (1) Write firmware images for ACC's two A35 cores to the ML region in LLC / DRAM. */
 	rte_memcpy(PLT_PTR_ADD(fw->data, FW_LINKER_OFFSET), buffer, size);
 
 	/* (2) Set ML(0)_MLR_BASE = Base IOVA of the ML region in LLC/DRAM. */
 	reg_val64 = PLT_PTR_SUB_U64_CAST(fw->data, rte_eal_get_baseaddr());
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
-	roc_ml_reg_save(&mldev->roc, ML_MLR_BASE);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* (3) Set ML(0)_AXI_BRIDGE_CTRL(1) = 0x184003 to remove back-pressure check on DMA AXI
 	 * bridge.
@@ -549,9 +554,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	reg_val64 = (ROC_ML_AXI_BRIDGE_CTRL_AXI_RESP_CTRL |
 		     ROC_ML_AXI_BRIDGE_CTRL_BRIDGE_CTRL_MODE | ROC_ML_AXI_BRIDGE_CTRL_NCB_WR_BLK |
 		     ROC_ML_AXI_BRIDGE_CTRL_FORCE_WRESP_OK | ROC_ML_AXI_BRIDGE_CTRL_FORCE_RRESP_OK);
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_AXI_BRIDGE_CTRL(1));
 	plt_ml_dbg("ML_AXI_BRIDGE_CTRL(1) => 0x%016lx",
-		   roc_ml_reg_read64(&mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_AXI_BRIDGE_CTRL(1)));
 
 	/* (4) Set ML(0)_ANB(0..2)_BACKP_DISABLE = 0x3 to remove back-pressure on the AXI to NCB
 	 * bridges.
@@ -559,9 +564,9 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_B_BACKP_DISABLE |
 			     ROC_ML_ANBX_BACKP_DISABLE_EXTMSTR_R_BACKP_DISABLE);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_BACKP_DISABLE(i));
 		plt_ml_dbg("ML_ANBX_BACKP_DISABLE(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_BACKP_DISABLE(i)));
 	}
 
 	/* (5) Set ML(0)_ANB(0..2)_NCBI_P_OVR = 0x3000 and ML(0)_ANB(0..2)_NCBI_NP_OVR = 0x3000 to
@@ -570,39 +575,40 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	for (i = 0; i < ML_ANBX_NR; i++) {
 		reg_val64 = (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR |
 			     ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_P_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i)));
 
 		reg_val64 |= (ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR |
 			      ML_ANBX_NCBI_NP_OVR_ANB_NCBI_NP_NS_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_NP_OVR(i));
 		plt_ml_dbg("ML_ANBX_NCBI_NP_OVR(%u) => 0x%016lx", i,
-			   roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
+			   roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_NP_OVR(i)));
 	}
 
 	/* (6) Set ML(0)_CFG[MLIP_CLK_FORCE] = 1, to force turning on the MLIP clock. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (7) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0, to make sure the boot request is accepted
 	 * when there is no job in the command queue.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 &= ~ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (8) Set ML(0)_CFG[ENA] = 0 and ML(0)_CFG[MLIP_ENA] = 1 to bring MLIP out of reset while
 	 * keeping the job manager disabled.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_MLIP_ENA;
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (9) Wait at least 70 coprocessor clock cycles. */
 	plt_delay_us(FW_WAIT_CYCLES);
@@ -613,53 +619,57 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	 * AXI outbound address divided by 4. Read after write.
 	 */
 	offset = PLT_PTR_ADD_U64_CAST(
-		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+		fw->data, FW_LINKER_OFFSET - roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 	a35_0_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 	a35_1_rst_vector_base.s.addr = (offset + ML_AXI_START_ADDR) / 4;
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w0, ML_A35_0_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w0,
+			   ML_A35_0_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_0_rst_vector_base.w.w1, ML_A35_0_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_0_rst_vector_base.w.w1,
+			   ML_A35_0_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_0_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_0_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w0, ML_A35_1_RST_VECTOR_BASE_W(0));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w0,
+			   ML_A35_1_RST_VECTOR_BASE_W(0));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(0));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(0) => 0x%08x", reg_val32);
 
-	roc_ml_reg_write32(&mldev->roc, a35_1_rst_vector_base.w.w1, ML_A35_1_RST_VECTOR_BASE_W(1));
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
+	roc_ml_reg_write32(&cn10k_mldev->roc, a35_1_rst_vector_base.w.w1,
+			   ML_A35_1_RST_VECTOR_BASE_W(1));
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_A35_1_RST_VECTOR_BASE_W(1));
 	plt_ml_dbg("ML_A35_1_RST_VECTOR_BASE_W(1) => 0x%08x", reg_val32);
 
 	/* (11) Clear MLIP's ML(0)_SW_RST_CTRL[ACC_RST]. This will bring the ACC cores and other
 	 * MLIP components out of reset. The cores will execute firmware from the ML region as
 	 * written in step 1.
 	 */
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	reg_val32 &= ~ROC_ML_SW_RST_CTRL_ACC_RST;
-	roc_ml_reg_write32(&mldev->roc, reg_val32, ML_SW_RST_CTRL);
-	reg_val32 = roc_ml_reg_read32(&mldev->roc, ML_SW_RST_CTRL);
+	roc_ml_reg_write32(&cn10k_mldev->roc, reg_val32, ML_SW_RST_CTRL);
+	reg_val32 = roc_ml_reg_read32(&cn10k_mldev->roc, ML_SW_RST_CTRL);
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
 	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
 	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &fw->req->result);
+	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
 	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &fw->req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &fw->req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -671,11 +681,11 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	} else {
 		/* Set ML to disable new jobs */
 		reg_val64 = (ROC_ML_CFG_JD_SIZE | ROC_ML_CFG_MLIP_ENA);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 
 		/* Clear scratch registers */
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-		roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+		roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
 
 		if (timeout) {
 			plt_err("Firmware load timeout");
@@ -691,49 +701,51 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	/* (13) Set ML(0)_JOB_MGR_CTRL[STALL_ON_IDLE] = 0x1; this is needed to shut down the MLIP
 	 * clock when there are no more jobs to process.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL);
 	reg_val64 |= ROC_ML_JOB_MGR_CTRL_STALL_ON_IDLE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
-	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_JOB_MGR_CTRL));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_JOB_MGR_CTRL);
+	plt_ml_dbg("ML_JOB_MGR_CTRL => 0x%016lx",
+		   roc_ml_reg_read64(&cn10k_mldev->roc, ML_JOB_MGR_CTRL));
 
 	/* (14) Set ML(0)_CFG[MLIP_CLK_FORCE] = 0; the MLIP clock will be turned on/off based on job
 	 * activities.
 	 */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_MLIP_CLK_FORCE;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* (15) Set ML(0)_CFG[ENA] to enable ML job execution. */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Reset scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
 
 	/* Disable job execution, to be enabled in start */
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
 	/* Additional fixes: Set RO bit to fix O2D DMA bandwidth issue on cn10ka */
 	for (i = 0; i < ML_ANBX_NR; i++) {
-		reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_ANBX_NCBI_P_OVR(i));
+		reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_ANBX_NCBI_P_OVR(i));
 		reg_val64 |= (ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR |
 			      ML_ANBX_NCBI_P_OVR_ANB_NCBI_P_RO_OVR_VLD);
-		roc_ml_reg_write64(&mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
+		roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_ANBX_NCBI_P_OVR(i));
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	struct cn10k_ml_fw *fw;
 	void *fw_buffer = NULL;
@@ -741,8 +753,9 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	uint64_t fw_size = 0;
 	int ret = 0;
 
-	fw = &mldev->fw;
-	fw->mldev = mldev;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
+	fw->cn10k_mldev = cn10k_mldev;
 
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
 		/* Read firmware image to a buffer */
@@ -773,8 +786,8 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
-	if (roc_ml_mlip_is_enabled(&mldev->roc))
-		roc_ml_mlip_reset(&mldev->roc, true);
+	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
+		roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
@@ -787,22 +800,25 @@ cn10k_ml_fw_load(struct cn10k_ml_dev *mldev)
 	}
 
 	if (ret < 0)
-		cn10k_ml_fw_unload(mldev);
+		cn10k_ml_fw_unload(cnxk_mldev);
 
 	return ret;
 }
 
 void
-cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev)
+cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
 	const struct plt_memzone *mz;
 	uint64_t reg_val;
 
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	/* Disable and reset device */
-	reg_val = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val &= ~ROC_ML_CFG_MLIP_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val, ML_CFG);
-	roc_ml_mlip_reset(&mldev->roc, true);
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val, ML_CFG);
+	roc_ml_mlip_reset(&cn10k_mldev->roc, true);
 
 	mz = plt_memzone_lookup(FW_MEMZONE_NAME);
 	if (mz != NULL)
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 4aaeecff03..f9da1548c4 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,9 @@
 
 #include "cn10k_ml_ocm.h"
 
+/* Dummy Device ops */
+extern struct rte_ml_dev_ops ml_dev_dummy_ops;
+
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
@@ -36,17 +39,10 @@
 /* Maximum number of segments for IO data */
 #define ML_CN10K_MAX_SEGMENTS 1
 
-/* ML command timeout in seconds */
-#define ML_CN10K_CMD_TIMEOUT 5
-
 /* ML slow-path job flags */
 #define ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE BIT(0)
 #define ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD   BIT(1)
 
-/* Poll mode job state */
-#define ML_CN10K_POLL_JOB_START	 0
-#define ML_CN10K_POLL_JOB_FINISH 1
-
 /* Memory barrier macros */
 #if defined(RTE_ARCH_ARM)
 #define dmb_st ({ asm volatile("dmb st" : : : "memory"); })
@@ -56,6 +52,7 @@
 #define dsb_st
 #endif
 
+struct cnxk_ml_dev;
 struct cn10k_ml_req;
 struct cn10k_ml_qp;
 
@@ -68,21 +65,6 @@ enum cn10k_ml_job_type {
 	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
 };
 
-/* Device configuration state enum */
-enum cn10k_ml_dev_state {
-	/* Probed and not configured */
-	ML_CN10K_DEV_STATE_PROBED = 0,
-
-	/* Configured */
-	ML_CN10K_DEV_STATE_CONFIGURED,
-
-	/* Started */
-	ML_CN10K_DEV_STATE_STARTED,
-
-	/* Closed */
-	ML_CN10K_DEV_STATE_CLOSED
-};
-
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
 	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
@@ -379,7 +361,7 @@ struct cn10k_ml_jd {
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
 
 	/* Firmware file path */
 	const char *path;
@@ -485,27 +467,12 @@ struct cn10k_ml_dev {
 	/* Device ROC */
 	struct roc_ml roc;
 
-	/* Configuration state */
-	enum cn10k_ml_dev_state state;
-
 	/* Firmware */
 	struct cn10k_ml_fw fw;
 
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Number of models loaded */
-	uint16_t nb_models_loaded;
-
-	/* Number of models unloaded */
-	uint16_t nb_models_unloaded;
-
-	/* Number of models started */
-	uint16_t nb_models_started;
-
-	/* Number of models stopped */
-	uint16_t nb_models_stopped;
-
 	/* Extended stats data */
 	struct cn10k_ml_xstats xstats;
 
@@ -528,7 +495,7 @@ struct cn10k_ml_dev {
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
-int cn10k_ml_fw_load(struct cn10k_ml_dev *mldev);
-void cn10k_ml_fw_unload(struct cn10k_ml_dev *mldev);
+int cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev);
+void cn10k_ml_fw_unload(struct cnxk_ml_dev *cnxk_mldev);
 
 #endif /* _CN10K_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index e0b750cd8e..cc46ca2efd 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_dev.h"
+
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
 {
@@ -461,7 +462,7 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 }
 
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
+cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
 			       uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -470,7 +471,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -494,11 +495,11 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			mldev->ocm.num_pages);
+			cn10k_mldev->ocm.num_pages);
 		return -ENOMEM;
 	}
 
@@ -506,8 +507,8 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, ui
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages =
-			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(mldev->ocm.num_pages));
+		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
+					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 4cc0744891..3128b28db7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,6 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
+struct cnxk_ml_dev;
+
 /* Model state */
 enum cn10k_ml_model_state {
 	ML_CN10K_MODEL_STATE_LOADED,
@@ -489,7 +491,7 @@ struct cn10k_ml_model_stats {
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *mldev;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
@@ -537,8 +539,8 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *mldev, uint16_t model_id, uint8_t *buffer,
-				   uint16_t *wb_pages, uint16_t *scratch_pages);
+int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
 void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 6fb0bb620e..8094a0fab1 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -4,11 +4,12 @@
 
 #include <rte_mldev_pmd.h>
 
-#include "cn10k_ml_dev.h"
+#include <roc_api.h>
+
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
-#include "roc_api.h"
+#include "cnxk_ml_dev.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -217,7 +218,8 @@ int
 cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -236,8 +238,9 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
 	if (num_tiles > ML_CN10K_OCM_NUMTILES) {
 		plt_err("Invalid num_tiles = %u (> %u)", num_tiles, ML_CN10K_OCM_NUMTILES);
@@ -254,8 +257,8 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	tile_start = 0;
 	search_end_tile = ocm->num_tiles - num_tiles;
 
-	/* allocate for local ocm mask */
-	local_ocm_mask = rte_zmalloc("local_ocm_mask", mldev->ocm.mask_words, RTE_CACHE_LINE_SIZE);
+	/* Allocate for local ocm mask */
+	local_ocm_mask = rte_zmalloc("local_ocm_mask", ocm->mask_words, RTE_CACHE_LINE_SIZE);
 	if (local_ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for local_ocm_mask");
 		return -1;
@@ -271,7 +274,7 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 			PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, used_last_wb_page_max);
 	}
 
-	memset(local_ocm_mask, 0, mldev->ocm.mask_words);
+	memset(local_ocm_mask, 0, ocm->mask_words);
 	for (tile_id = tile_start; tile_id < tile_start + num_tiles; tile_id++) {
 		for (word_id = 0; word_id < ocm->mask_words; word_id++)
 			local_ocm_mask[word_id] |= ocm->tile_ocm_info[tile_id].ocm_mask[word_id];
@@ -333,8 +336,9 @@ void
 cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
 			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -345,8 +349,9 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	int tile_id;
 	int page_id;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Get first set bit, tile_start */
@@ -391,8 +396,9 @@ void
 cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_model *local_model;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -404,8 +410,9 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int page_id;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Update OCM info for WB memory */
@@ -453,35 +460,37 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 	char *p = str;
 	int word;
 
-	/* add prefix 0x */
+	/* Add prefix 0x */
 	*p++ = '0';
 	*p++ = 'x';
 
-	/* build one word at a time */
+	/* Build hex string */
 	for (word = nwords - 1; word >= 0; word--) {
 		sprintf(p, "%02X", tile_info->ocm_mask[word]);
 		p += 2;
 	}
 
-	/* terminate */
+	/* Terminate */
 	*p++ = 0;
 }
 
 void
 cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 {
-	char *str;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
+	char *str;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 
-	/* nibbles + prefix '0x' */
-	str = rte_zmalloc("ocm_mask_str", mldev->ocm.num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
+	/* Nibbles + prefix '0x' */
+	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
 	if (str == NULL) {
 		plt_err("Unable to allocate memory for ocm_mask_str");
 		return;
@@ -492,9 +501,8 @@ cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
 		cn10k_ml_ocm_pagemask_to_str(&ocm->tile_ocm_info[tile_id], ocm->mask_words, str);
 
 		wb_pages = 0 - ocm->tile_ocm_info[tile_id].scratch_pages;
-		for (word_id = 0; word_id < mldev->ocm.mask_words; word_id++)
-			wb_pages +=
-				rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
+		for (word_id = 0; word_id < ocm->mask_words; word_id++)
+			wb_pages += rte_popcount32(ocm->tile_ocm_info[tile_id].ocm_mask[word_id]);
 
 		fprintf(fp,
 			"tile = %2u, scratch_pages = %4u,"
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 11531afd8c..dc747cf534 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,11 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
+#include "cnxk_ml_dev.h"
+
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
@@ -85,7 +86,7 @@ cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
 static inline void
 cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
 {
-	plt_write64(ML_CN10K_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
 }
 
 static inline uint64_t
@@ -175,7 +176,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	qp->queue.reqs = (struct cn10k_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	qp->nb_desc = nb_desc;
 	qp->stats.enqueued_count = 0;
 	qp->stats.dequeued_count = 0;
@@ -199,16 +200,17 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 static void
 cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
-
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
 	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	/* Print debug info */
@@ -249,7 +251,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
 			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * mldev->ocm.page_size);
+			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
@@ -325,7 +327,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
@@ -340,7 +342,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 	req->jd.hdr.model_id = model->model_id;
 	req->jd.hdr.job_type = job_type;
 	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->metadata.model.ocm_relocatable)
@@ -350,9 +352,9 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 
 		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
 		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, &req->extended_args));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
 		req->jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
 		req->jd.model_start.model_init_offset = 0x0;
 		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->jd.model_start.model_finish_offset =
@@ -372,7 +374,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
 		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
 		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&mldev->roc,
+			&cn10k_mldev->roc,
 			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
 		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
 		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
@@ -383,7 +385,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -392,24 +394,20 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *mldev, struct cn10k_ml_mode
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
 				struct rte_ml_op *op)
 {
-	struct cn10k_ml_dev *mldev;
-
-	mldev = dev->data->dev_private;
-
 	req->jd.hdr.jce.w0.u64 = 0;
 	req->jd.hdr.jce.w1.u64 = req->compl_W1;
 	req->jd.hdr.model_id = op->model_id;
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 	req->jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
 	req->jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
 	req->jd.model_run.num_batches = op->nb_batches;
 }
 
@@ -436,66 +434,69 @@ static const struct xstat_info model_stats[] = {
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint16_t model;
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
-	if (mldev->xstats.entries == NULL)
-		mldev->xstats.entries = rte_zmalloc("cn10k_ml_xstats",
-						    sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
-						    PLT_CACHE_LINE_SIZE);
+	if (cn10k_mldev->xstats.entries == NULL)
+		cn10k_mldev->xstats.entries = rte_zmalloc(
+			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
 
-	if (mldev->xstats.entries == NULL)
+	if (cn10k_mldev->xstats.entries == NULL)
 		return -ENOMEM;
 
 	/* Initialize device xstats */
 	stat_id = 0;
 	for (i = 0; i < RTE_DIM(device_stats); i++) {
-		mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s",
+		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
 			 device_stats[i].name);
 
-		mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
-		mldev->xstats.entries[stat_id].obj_idx = 0;
-		mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
 		stat_id++;
 	}
-	mldev->xstats.count_mode_device = stat_id;
+	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
 	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
-		mldev->xstats.offset_for_model[model] = stat_id;
+		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
-			mldev->xstats.entries[stat_id].map.id = stat_id;
-			mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
-			mldev->xstats.entries[stat_id].obj_idx = model;
-			mldev->xstats.entries[stat_id].reset_allowed = model_stats[i].reset_allowed;
+			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
+			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
+				model_stats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
-			snprintf(mldev->xstats.entries[stat_id].map.name,
-				 sizeof(mldev->xstats.entries[stat_id].map.name), "Model-%u-%s",
-				 model, model_stats[i].name);
+			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_stats[i].name);
 
 			stat_id++;
 		}
 
-		mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
 	}
 
-	mldev->xstats.count_mode_model = stat_id - mldev->xstats.count_mode_device;
-	mldev->xstats.count = stat_id;
+	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
+	cn10k_mldev->xstats.count = stat_id;
 
 	return 0;
 }
@@ -503,28 +504,32 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 static void
 cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	rte_free(mldev->xstats.entries);
-	mldev->xstats.entries = NULL;
+	rte_free(cn10k_mldev->xstats.entries);
+	cn10k_mldev->xstats.entries = NULL;
 
-	mldev->xstats.count = 0;
+	cn10k_mldev->xstats.count = 0;
 }
 
 static void
 cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
 	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
 
@@ -536,8 +541,8 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update xstat name based on model name and sclk availability */
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
-		snprintf(mldev->xstats.entries[stat_id].map.name,
-			 sizeof(mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
 			 model->metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
@@ -547,19 +552,19 @@ static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		       enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	switch (type) {
 	case nb_models_loaded:
-		return mldev->nb_models_loaded;
+		return cnxk_mldev->nb_models_loaded;
 	case nb_models_unloaded:
-		return mldev->nb_models_unloaded;
+		return cnxk_mldev->nb_models_unloaded;
 	case nb_models_started:
-		return mldev->nb_models_started;
+		return cnxk_mldev->nb_models_started;
 	case nb_models_stopped:
-		return mldev->nb_models_stopped;
+		return cnxk_mldev->nb_models_stopped;
 	default:
 		return -1;
 	}
@@ -651,15 +656,17 @@ static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	if (stat_ids == NULL)
-		nb_stats = mldev->xstats.count_mode_device;
+		nb_stats = cn10k_mldev->xstats.count_mode_device;
 	else
 		nb_stats = nb_ids;
 
@@ -669,10 +676,10 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 		else
 			stat_id = stat_ids[i];
 
-		if (stat_id >= mldev->xstats.count_mode_device)
+		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
 			return -EINVAL;
 
-		xs = &mldev->xstats.entries[stat_id];
+		xs = &cn10k_mldev->xstats.entries[stat_id];
 		if (!xs->reset_allowed)
 			continue;
 
@@ -740,15 +747,17 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			    uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
 	int32_t i;
 	int32_t j;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
@@ -765,12 +774,13 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 			}
 		}
 
-		start_id = mldev->xstats.offset_for_model[i];
-		end_id = mldev->xstats.offset_for_model[i] + mldev->xstats.count_per_model[i] - 1;
+		start_id = cn10k_mldev->xstats.offset_for_model[i];
+		end_id = cn10k_mldev->xstats.offset_for_model[i] +
+			 cn10k_mldev->xstats.count_per_model[i] - 1;
 
 		if (stat_ids == NULL) {
 			for (j = start_id; j <= end_id; j++) {
-				xs = &mldev->xstats.entries[j];
+				xs = &cn10k_mldev->xstats.entries[j];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		} else {
@@ -780,7 +790,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 						stat_ids[j], lcl_model_id);
 					return -EINVAL;
 				}
-				xs = &mldev->xstats.entries[stat_ids[j]];
+				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
 				cn10k_ml_reset_model_stat(dev, i, xs->type);
 			}
 		}
@@ -854,17 +864,19 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 static int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
 	if (dev_info == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CN10K_MAX_MODELS;
-	if (mldev->hw_queue_lock)
+	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_LF;
@@ -881,8 +893,9 @@ static int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -895,7 +908,8 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 
 	/* Get CN10K device handle */
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	cn10k_ml_dev_info_get(dev, &dev_info);
 	if (conf->nb_models > dev_info.max_models) {
@@ -908,21 +922,21 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		return -EINVAL;
 	}
 
-	if (mldev->state == ML_CN10K_DEV_STATE_PROBED) {
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
 		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(mldev);
+		ret = cn10k_ml_fw_load(cnxk_mldev);
 		if (ret != 0)
 			return ret;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CONFIGURED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (mldev->state == ML_CN10K_DEV_STATE_STARTED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
 		plt_err("Device can't be reconfigured in started state\n");
 		return -ENOTSUP;
-	} else if (mldev->state == ML_CN10K_DEV_STATE_CLOSED) {
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
 		plt_err("Device can't be reconfigured after close\n");
 		return -ENOTSUP;
 	}
@@ -1013,10 +1027,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ocm = &mldev->ocm;
+	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
-	ocm->page_size = mldev->ocm_page_size;
+	ocm->page_size = cn10k_mldev->ocm_page_size;
 	ocm->num_pages = ocm->size_per_tile / ocm->page_size;
 	ocm->mask_words = ocm->num_pages / (8 * sizeof(uint8_t));
 
@@ -1044,25 +1058,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	}
 
 	/* Set JCMDQ enqueue function */
-	if (mldev->hw_queue_lock == 1)
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
+	if (cn10k_mldev->hw_queue_lock == 1)
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
 	else
-		mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
+		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
 	/* Set polling function pointers */
-	mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
+	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
+	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
+	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
 	dev->enqueue_burst = cn10k_ml_enqueue_burst;
 	dev->dequeue_burst = cn10k_ml_dequeue_burst;
 	dev->op_error_get = cn10k_ml_op_error_get;
 
-	mldev->nb_models_loaded = 0;
-	mldev->nb_models_started = 0;
-	mldev->nb_models_stopped = 0;
-	mldev->nb_models_unloaded = 0;
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 
@@ -1077,8 +1091,9 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 static int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1086,10 +1101,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	if (dev == NULL)
 		return -EINVAL;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
-	rte_free(mldev->ocm.ocm_mask);
+	rte_free(cn10k_mldev->ocm.ocm_mask);
 
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1125,21 +1141,21 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	cn10k_ml_xstats_uninit(dev);
 
 	/* Unload firmware */
-	cn10k_ml_fw_unload(mldev);
+	cn10k_ml_fw_unload(cnxk_mldev);
 
 	/* Clear scratch registers */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_WORK_PTR);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_FW_CTRL);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-	roc_ml_reg_write64(&mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_WORK_PTR);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_FW_CTRL);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 
 	/* Reset ML_MLR_BASE */
-	roc_ml_reg_write64(&mldev->roc, 0, ML_MLR_BASE);
-	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_MLR_BASE));
+	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
+	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	mldev->state = ML_CN10K_DEV_STATE_CLOSED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
 
 	/* Remove PCI device */
 	return rte_dev_remove(dev->device);
@@ -1148,17 +1164,19 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 |= ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_STARTED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
 
 	return 0;
 }
@@ -1166,17 +1184,19 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 static int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	reg_val64 = roc_ml_reg_read64(&mldev->roc, ML_CFG);
+	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
 	reg_val64 &= ~ROC_ML_CFG_ENA;
-	roc_ml_reg_write64(&mldev->roc, reg_val64, ML_CFG);
-	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&mldev->roc, ML_CFG));
+	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
+	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	mldev->state = ML_CN10K_DEV_STATE_CONFIGURED;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
 
 	return 0;
 }
@@ -1259,22 +1279,24 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
 {
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	uint32_t idx = 0;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	xstats_mode_count = 0;
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			break;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1283,16 +1305,17 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	if (xstats_mode_count > size || xstats_map == NULL)
 		return xstats_mode_count;
 
-	for (i = 0; i < mldev->xstats.count && idx < size; i++) {
-		if (mldev->xstats.entries[i].mode != mode)
+	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
+		if (cn10k_mldev->xstats.entries[i].mode != mode)
 			continue;
 
 		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != mldev->xstats.entries[i].obj_idx)
+		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
 			continue;
 
-		strncpy(xstats_map[idx].name, mldev->xstats.entries[i].map.name, RTE_ML_STR_MAX);
-		xstats_map[idx].id = mldev->xstats.entries[i].map.id;
+		rte_strscpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
+			    RTE_ML_STR_MAX);
+		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
 		idx++;
 	}
 
@@ -1304,13 +1327,15 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				uint64_t *value)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	cn10k_ml_xstats_fn fn;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
-	for (i = 0; i < mldev->xstats.count; i++) {
-		xs = &mldev->xstats.entries[i];
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
+		xs = &cn10k_mldev->xstats.entries[i];
 		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
 			if (stat_id != NULL)
 				*stat_id = xs->map.id;
@@ -1344,24 +1369,26 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
 	struct cn10k_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *mldev;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
 	cn10k_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	xstats_mode_count = 0;
 
 	switch (mode) {
 	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = mldev->xstats.count_mode_device;
+		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CN10K_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = mldev->xstats.count_per_model[model_id];
+		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
 	default:
 		return -EINVAL;
@@ -1369,8 +1396,8 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 
 	idx = 0;
 	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > mldev->xstats.count || xs->mode != mode)
+		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
 			continue;
 
 		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
@@ -1418,8 +1445,9 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 static int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1432,8 +1460,9 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	if (roc_env_is_asim())
 		return 0;
 
-	mldev = dev->data->dev_private;
-	fw = &mldev->fw;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	fw = &cn10k_mldev->fw;
 
 	/* Dump model info */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
@@ -1451,15 +1480,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
-			head_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
-			tail_loc = roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
+			head_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
+			tail_loc =
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
 			fprintf(fp, "%.*s\n", tail_loc - head_loc, &head_ptr[head_loc]);
@@ -1473,18 +1506,18 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	for (core_id = 0; core_id <= 1; core_id++) {
 		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
-		    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
+		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
-		} else if ((core_id == 1) &&
-			   (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
+		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
+								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
 			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
-				roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
-			head_ptr = roc_ml_addr_mlip2ap(&mldev->roc, head_ptr);
+				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
+			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		}
 	}
@@ -1495,14 +1528,16 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 static int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
@@ -1515,20 +1550,20 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
 	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&mldev->fw);
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
-	timeout_cycle = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 
 	plt_rmb();
 	do {
-		if (roc_ml_scratch_is_done_bit_set(&mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH)) {
+		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
+		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1552,8 +1587,8 @@ int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
 	struct cn10k_ml_model_metadata *metadata;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1574,7 +1609,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	if (ret != 0)
 		return ret;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 
 	/* Find model ID */
 	found = false;
@@ -1591,7 +1626,8 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(mldev, idx, params->addr, &wb_pages, &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
+					     &scratch_pages);
 	if (ret < 0)
 		return ret;
 
@@ -1623,7 +1659,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = mldev;
+	model->mldev = cnxk_mldev;
 	model->model_id = idx;
 
 	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
@@ -1680,7 +1716,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-	mldev->nb_models_loaded++;
+	cnxk_mldev->nb_models_loaded++;
 
 	/* Update xstats names */
 	cn10k_ml_xstats_model_name_update(dev, idx);
@@ -1695,9 +1731,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1711,7 +1747,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	dev->data->models[model_id] = NULL;
-	mldev->nb_models_unloaded++;
+	cnxk_mldev->nb_models_unloaded++;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
 	return plt_memzone_free(plt_memzone_lookup(str));
@@ -1720,8 +1756,9 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1735,8 +1772,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1746,11 +1784,11 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
@@ -1815,26 +1853,26 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else { /* Reset scratch registers */
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
@@ -1843,7 +1881,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
 				model->state = ML_CN10K_MODEL_STATE_STARTED;
-				mldev->nb_models_started++;
+				cnxk_mldev->nb_models_started++;
 			} else {
 				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
 			}
@@ -1867,7 +1905,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
 		rte_ml_model_stop(dev->data->dev_id, model_id);
 	} else {
-		if (mldev->cache_model_data && roc_model_is_cn10ka())
+		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
 			ret = cn10k_ml_cache_model_data(dev, model_id);
 	}
 
@@ -1877,8 +1915,9 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 int
 cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1887,8 +1926,9 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
-	ocm = &mldev->ocm;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
 
 	if (model == NULL) {
@@ -1898,11 +1938,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Prepare JD */
 	req = model->req;
-	cn10k_ml_prep_sp_job_descriptor(mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
 
-	plt_write64(ML_CN10K_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
 	locked = false;
@@ -1941,33 +1981,33 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	job_dequeued = false;
 	do {
 		if (!job_enqueued) {
-			req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&mldev->roc, &req->jd);
+			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&mldev->roc, &req->jd);
+			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CN10K_POLL_JOB_FINISH) {
+		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
 			if (req->result.error_code.u64 == 0x0)
 				ret = 0;
 			else
 				ret = -1;
 		}
 	} else {
-		roc_ml_scratch_queue_reset(&mldev->roc);
+		roc_ml_scratch_queue_reset(&cn10k_mldev->roc);
 		ret = -ETIME;
 	}
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			mldev->nb_models_stopped++;
+			cnxk_mldev->nb_models_stopped++;
 			model->state = ML_CN10K_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
@@ -2211,8 +2251,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		       struct rte_ml_op *op)
 {
 	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2258,14 +2299,16 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 
 		/* Handle driver error */
 		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
-			mldev = dev->data->dev_private;
+			cnxk_mldev = dev->data->dev_private;
+			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
-			if ((roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0) ||
-			    (roc_ml_reg_read64(&mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
+			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
+			     0) ||
+			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
-			else if ((roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_LO) != 0) ||
-				 (roc_ml_reg_read64(&mldev->roc, ML_CORE_INT_HI) != 0))
+			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
+				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
 				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
 				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
@@ -2282,8 +2325,9 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 	struct rte_ml_op *op;
@@ -2292,7 +2336,8 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint64_t head;
 	bool enqueued;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2307,15 +2352,15 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	op = ops[count];
 	req = &queue->reqs[head];
 
-	mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_mldev->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
-	enqueued = mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd);
+	cn10k_mldev->set_poll_ptr(req);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2339,8 +2384,9 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	struct cn10k_ml_qp *qp;
 
@@ -2348,7 +2394,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	uint16_t count;
 	uint64_t tail;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
 
@@ -2361,8 +2408,8 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 
 dequeue_req:
 	req = &queue->reqs[tail];
-	status = mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CN10K_POLL_JOB_FINISH)) {
+	status = cn10k_mldev->get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
@@ -2420,30 +2467,32 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_model *model;
-	struct cn10k_ml_dev *mldev;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
-	mldev = dev->data->dev_private;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
 	req = model->req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(dev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
 	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
 	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
 	req->result.user_ptr = op->user_ptr;
 
-	mldev->set_poll_ptr(req);
+	cn10k_mldev->set_poll_ptr(req);
 	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
 
 	timeout = true;
-	req->timeout = plt_tsc_cycles() + ML_CN10K_CMD_TIMEOUT * plt_tsc_hz();
+	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (mldev->ml_jcmdq_enqueue(&mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2457,7 +2506,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (mldev->get_poll_ptr(req) == ML_CN10K_POLL_JOB_FINISH) {
+		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
new file mode 100644
index 0000000000..2a5c17c973
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+
+/* Dummy operations for ML device */
+struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
new file mode 100644
index 0000000000..51315de622
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_DEV_H_
+#define _CNXK_ML_DEV_H_
+
+#include <roc_api.h>
+
+#include "cn10k_ml_dev.h"
+
+/* ML command timeout in seconds */
+#define ML_CNXK_CMD_TIMEOUT 5
+
+/* Poll mode job state */
+#define ML_CNXK_POLL_JOB_START	0
+#define ML_CNXK_POLL_JOB_FINISH 1
+
+/* Device configuration state enum */
+enum cnxk_ml_dev_state {
+	/* Probed and not configured */
+	ML_CNXK_DEV_STATE_PROBED = 0,
+
+	/* Configured */
+	ML_CNXK_DEV_STATE_CONFIGURED,
+
+	/* Started */
+	ML_CNXK_DEV_STATE_STARTED,
+
+	/* Closed */
+	ML_CNXK_DEV_STATE_CLOSED
+};
+
+/* Device private data */
+struct cnxk_ml_dev {
+	/* RTE device */
+	struct rte_ml_dev *mldev;
+
+	/* Configuration state */
+	enum cnxk_ml_dev_state state;
+
+	/* Number of models loaded */
+	uint16_t nb_models_loaded;
+
+	/* Number of models unloaded */
+	uint16_t nb_models_unloaded;
+
+	/* Number of models started */
+	uint16_t nb_models_started;
+
+	/* Number of models stopped */
+	uint16_t nb_models_stopped;
+
+	/* CN10K device structure */
+	struct cn10k_ml_dev cn10k_mldev;
+};
+
+#endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5bf17d8ae3..e006fdfe0e 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -12,6 +12,7 @@ sources = files(
         'cn10k_ml_ops.c',
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
+        'cnxk_ml_dev.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 03/34] ml/cnxk: add generic model and layer structures
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
                     ` (31 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduce generic cnxk model and layer structures. These
structures would enable support for models with multiple
layers, where a model is a collection of multiple independent
layers with flow dependencies between the layers.
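
A minimal illustrative sketch of this layer/model relationship is
shown below; the names and fields (sketch_model, sketch_layer,
SKETCH_MAX_LAYERS) are hypothetical placeholders and are not the
structures added by this patch:

#include <stdint.h>

#define SKETCH_MAX_LAYERS 8

/* One independent layer; input_layer records the flow dependency. */
struct sketch_layer {
	uint16_t index;       /* position of this layer within the model */
	uint16_t input_layer; /* index of the layer whose output feeds this one */
};

/* A model is a collection of such layers. */
struct sketch_model {
	uint16_t nb_layers;
	struct sketch_layer layer[SKETCH_MAX_LAYERS];
};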

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_model.c | 247 ++++++++--------
 drivers/ml/cnxk/cn10k_ml_model.h | 122 ++------
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  50 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   9 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 488 +++++++++++++++++--------------
 drivers/ml/cnxk/cnxk_ml_io.h     |  79 +++++
 drivers/ml/cnxk/cnxk_ml_model.c  |   7 +
 drivers/ml/cnxk/cnxk_ml_model.h  | 111 +++++++
 drivers/ml/cnxk/meson.build      |   1 +
 10 files changed, 653 insertions(+), 470 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.h
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index f9da1548c4..99ff0a344a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
@@ -21,9 +23,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
-/* Maximum number of models per device */
-#define ML_CN10K_MAX_MODELS 16
-
 /* Maximum number of queue-pairs per device, spinlock version */
 #define ML_CN10K_MAX_QP_PER_DEVICE_SL 16
 
@@ -455,8 +454,8 @@ struct cn10k_ml_xstats {
 	struct cn10k_ml_xstats_entry *entries;
 
 	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CN10K_MAX_MODELS];
-	uint16_t offset_for_model[ML_CN10K_MAX_MODELS];
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
 	uint16_t count_mode_device;
 	uint16_t count_mode_model;
 	uint16_t count;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index cc46ca2efd..d033d6deff 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -6,10 +6,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -311,19 +311,17 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 }
 
 void
-cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t *base_dma_addr)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 	size_t model_data_size;
 	uint8_t *dma_addr_load;
 	uint8_t *dma_addr_run;
-	uint8_t i;
-	uint8_t j;
 	int fpos;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
 			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
@@ -361,102 +359,138 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_
 	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
 	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
 	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+}
+
+void
+cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	uint8_t i;
+	uint8_t j;
+
+	metadata = &layer->glow.metadata;
 
 	/* Inputs */
-	addr->total_input_sz_d = 0;
-	addr->total_input_sz_q = 0;
+	layer->info.nb_inputs = metadata->model.num_input;
+	layer->info.total_input_sz_d = 0;
+	layer->info.total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input1[i].shape.w;
-			addr->input[i].shape[1] = metadata->input1[i].shape.x;
-			addr->input[i].shape[2] = metadata->input1[i].shape.y;
-			addr->input[i].shape[3] = metadata->input1[i].shape.z;
-
-			addr->input[i].nb_elements =
+			rte_strscpy(layer->info.input[i].name,
+				    (char *)metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input1[i].input_type;
+			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
+			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
+			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
+			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input1[i].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, i, metadata->input1[i].shape.w,
+				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->input[i].nb_dims = 4;
-			addr->input[i].shape[0] = metadata->input2[j].shape.w;
-			addr->input[i].shape[1] = metadata->input2[j].shape.x;
-			addr->input[i].shape[2] = metadata->input2[j].shape.y;
-			addr->input[i].shape[3] = metadata->input2[j].shape.z;
-
-			addr->input[i].nb_elements =
+			rte_strscpy(layer->info.input[i].name,
+				    (char *)metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN);
+			layer->info.input[i].dtype = metadata->input2[j].input_type;
+			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
+			layer->info.input[i].nb_dims = 4;
+			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
+			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
+			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
+			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
+			layer->info.input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			addr->input[i].sz_d =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_d =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			addr->input[i].sz_q =
-				addr->input[i].nb_elements *
+			layer->info.input[i].sz_q =
+				layer->info.input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			addr->total_input_sz_d += addr->input[i].sz_d;
-			addr->total_input_sz_q += addr->input[i].sz_q;
+			layer->info.input[i].scale = metadata->input2[j].qscale;
+
+			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
+			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
 
 			plt_ml_dbg(
-				"model_id = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				model->model_id, j, metadata->input2[j].shape.w,
+				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				layer->index, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, addr->input[i].sz_d,
-				addr->input[i].sz_q);
+				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
+				layer->info.input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	addr->total_output_sz_q = 0;
-	addr->total_output_sz_d = 0;
+	layer->info.nb_outputs = metadata->model.num_output;
+	layer->info.total_output_sz_q = 0;
+	layer->info.total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output1[i].size;
-			addr->output[i].nb_elements = metadata->output1[i].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			rte_strscpy(layer->info.output[i].name,
+				    (char *)metadata->output1[i].output_name,
+				    MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output1[i].output_type;
+			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output1[i].size;
+			layer->info.output[i].nb_elements = metadata->output1[i].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output1[i].dscale;
 
-			plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q);
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+
+			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			addr->output[i].nb_dims = 1;
-			addr->output[i].shape[0] = metadata->output2[j].size;
-			addr->output[i].nb_elements = metadata->output2[j].size;
-			addr->output[i].sz_d =
-				addr->output[i].nb_elements *
+			rte_strscpy(layer->info.output[i].name,
+				    (char *)metadata->output2[j].output_name,
+				    MRVL_ML_OUTPUT_NAME_LEN);
+			layer->info.output[i].dtype = metadata->output2[j].output_type;
+			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
+			layer->info.output[i].nb_dims = 1;
+			layer->info.output[i].shape[0] = metadata->output2[j].size;
+			layer->info.output[i].nb_elements = metadata->output2[j].size;
+			layer->info.output[i].sz_d =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			addr->output[i].sz_q =
-				addr->output[i].nb_elements *
+			layer->info.output[i].sz_q =
+				layer->info.output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			addr->total_output_sz_q += addr->output[i].sz_q;
-			addr->total_output_sz_d += addr->output[i].sz_d;
+			layer->info.output[i].scale = metadata->output2[j].dscale;
+
+			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
+			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
 
-			plt_ml_dbg("model_id = %u, output2[%u] - sz_d = %u, sz_q = %u",
-				   model->model_id, j, addr->output[i].sz_d, addr->output[i].sz_q);
+			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
+				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
 		}
 	}
 }
@@ -514,23 +548,23 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
+cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cn10k_ml_dev *mldev;
+	struct cnxk_ml_layer *layer;
 	uint8_t i;
-	uint8_t j;
 
-	mldev = dev->data->dev_private;
-	metadata = &model->metadata;
+	cnxk_mldev = dev->data->dev_private;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
 	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
-	addr = &model->addr;
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -542,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->device_id = dev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
-	info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+	info->max_batches =
+		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
@@ -550,56 +585,26 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model)
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(input[i].name, metadata->input1[i].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input1[i].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(input[i].name, metadata->input2[j].input_name,
-				   MRVL_ML_INPUT_NAME_LEN);
-			input[i].nb_dims = addr->input[i].nb_dims;
-			input[i].shape = addr->input[i].shape;
-			input[i].type = metadata->input2[j].model_input_type;
-			input[i].nb_elements = addr->input[i].nb_elements;
-			input[i].size =
-				addr->input[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-		}
+		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = layer->info.input[i].nb_dims;
+		input[i].shape = &layer->info.input[i].shape[0];
+		input[i].type = layer->info.input[i].qtype;
+		input[i].nb_elements = layer->info.input[i].nb_elements;
+		input[i].size = layer->info.input[i].nb_elements *
+				rte_ml_io_type_size_get(layer->info.input[i].qtype);
 	}
 
 	/* Set output info */
+	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_memcpy(output[i].name, metadata->output1[i].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output1[i].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			rte_memcpy(output[i].name, metadata->output2[j].output_name,
-				   MRVL_ML_OUTPUT_NAME_LEN);
-			output[i].nb_dims = addr->output[i].nb_dims;
-			output[i].shape = addr->output[i].shape;
-			output[i].type = metadata->output2[j].model_output_type;
-			output[i].nb_elements = addr->output[i].nb_elements;
-			output[i].size =
-				addr->output[i].nb_elements *
-				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-		}
+		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = layer->info.output[i].nb_dims;
+		output[i].shape = &layer->info.output[i].shape[0];
+		output[i].type = layer->info.output[i].qtype;
+		output[i].nb_elements = layer->info.output[i].nb_elements;
+		output[i].size = layer->info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 3128b28db7..206a369ca7 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -13,15 +13,8 @@
 #include "cn10k_ml_ocm.h"
 #include "cn10k_ml_ops.h"
 
-struct cnxk_ml_dev;
-
-/* Model state */
-enum cn10k_ml_model_state {
-	ML_CN10K_MODEL_STATE_LOADED,
-	ML_CN10K_MODEL_STATE_JOB_ACTIVE,
-	ML_CN10K_MODEL_STATE_STARTED,
-	ML_CN10K_MODEL_STATE_UNKNOWN,
-};
+struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -369,7 +362,7 @@ struct cn10k_ml_model_metadata {
 };
 
 /* Model address structure */
-struct cn10k_ml_model_addr {
+struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
@@ -408,58 +401,10 @@ struct cn10k_ml_model_addr {
 
 	/* End tile */
 	uint8_t tile_end;
-
-	/* Input address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantized input size */
-		uint32_t sz_d;
-
-		/* Quantized input size */
-		uint32_t sz_q;
-	} input[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Output address and size */
-	struct {
-		/* Number of dimensions in shape */
-		uint32_t nb_dims;
-
-		/* Shape of input */
-		uint32_t shape[4];
-
-		/* Number of elements */
-		uint32_t nb_elements;
-
-		/* Dequantize output size */
-		uint32_t sz_d;
-
-		/* Quantized output size */
-		uint32_t sz_q;
-	} output[MRVL_ML_NUM_INPUT_OUTPUT];
-
-	/* Total size of quantized input */
-	uint32_t total_input_sz_q;
-
-	/* Total size of dequantized input */
-	uint32_t total_input_sz_d;
-
-	/* Total size of quantized output */
-	uint32_t total_output_sz_q;
-
-	/* Total size of dequantized output */
-	uint32_t total_output_sz_d;
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_model_stats {
+struct cn10k_ml_layer_stats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -488,59 +433,38 @@ struct cn10k_ml_model_stats {
 	uint64_t fw_reset_count;
 };
 
-/* Model Object */
-struct cn10k_ml_model {
-	/* Device reference */
-	struct cnxk_ml_dev *mldev;
-
-	/* Name */
-	char name[RTE_ML_STR_MAX];
-
-	/* ID */
-	uint16_t model_id;
-
-	/* Batch size */
-	uint32_t batch_size;
-
-	/* Metadata */
+struct cn10k_ml_layer_data {
+	/* Model / Layer: metadata */
 	struct cn10k_ml_model_metadata metadata;
 
-	/* Address structure */
-	struct cn10k_ml_model_addr addr;
+	/* Layer: address structure */
+	struct cn10k_ml_layer_addr addr;
 
-	/* Tile and memory information object */
-	struct cn10k_ml_ocm_model_map model_mem_map;
+	/* Layer: Tile and memory information object */
+	struct cn10k_ml_ocm_layer_map ocm_map;
 
-	/* Internal model information structure
-	 * Size of the buffer = sizeof(struct rte_ml_model_info)
-	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
-	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
-	 * Structures would be arranged in the same order in the buffer.
-	 */
-	uint8_t *info;
-
-	/* Spinlock, used to update model state */
-	plt_spinlock_t lock;
-
-	/* State */
-	enum cn10k_ml_model_state state;
-
-	/* Slow-path operations request pointer */
+	/* Layer: Slow-path operations request pointer */
 	struct cn10k_ml_req *req;
 
-	/* Stats for burst ops */
-	struct cn10k_ml_model_stats *burst_stats;
+	/* Layer: Stats for burst ops */
+	struct cn10k_ml_layer_stats *burst_stats;
 
-	/* Stats for sync ops */
-	struct cn10k_ml_model_stats *sync_stats;
+	/* Layer: Stats for sync ops */
+	struct cn10k_ml_layer_stats *sync_stats;
+};
+
+struct cn10k_ml_model_data {
+	/* Model / Layer: metadata */
+	struct cn10k_ml_model_metadata metadata;
 };
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
-void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
+void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
 int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model);
+void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 8094a0fab1..d71c36eae6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -6,10 +6,10 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* OCM macros */
 #define BYTE_LEN	   8
@@ -333,12 +333,14 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-			   int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages)
+cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+			   uint16_t scratch_pages)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_page_start;
@@ -353,6 +355,7 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
 	tile_start = 0;
@@ -382,8 +385,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 				PLT_MAX(ocm->tile_ocm_info[tile_id].last_wb_page, wb_page_end);
 	}
 
-	model->addr.tile_start = tile_start;
-	model->addr.tile_end = tile_end;
+	layer->glow.addr.tile_start = tile_start;
+	layer->glow.addr.tile_end = tile_end;
 
 	plt_ml_dbg("model_id = %u, tilemask = 0x%016lx", model_id, tilemask);
 	plt_ml_dbg("model_id = %u, wb_page_start = %d, wb_page_end = %d", model_id, wb_page_start,
@@ -393,12 +396,14 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t t
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
 {
-	struct cn10k_ml_model *local_model;
+	struct cnxk_ml_model *local_model;
+	struct cnxk_ml_layer *local_layer;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 
 	int scratch_resize_pages;
@@ -409,16 +414,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 	int tile_id;
 	int page_id;
 	uint16_t i;
+	uint16_t j;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 	model = dev->data->models[model_id];
+	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
-	wb_page_start = model->model_mem_map.wb_page_start;
-	wb_page_end = wb_page_start + model->model_mem_map.wb_pages - 1;
-	for (tile_id = model->addr.tile_start; tile_id <= model->addr.tile_end; tile_id++) {
+	wb_page_start = layer->glow.ocm_map.wb_page_start;
+	wb_page_end = wb_page_start + layer->glow.ocm_map.wb_pages - 1;
+	for (tile_id = layer->glow.addr.tile_start; tile_id <= layer->glow.addr.tile_end;
+	     tile_id++) {
 		for (page_id = wb_page_start; page_id <= wb_page_end; page_id++) {
 			CLEAR_BIT(ocm->tile_ocm_info[tile_id].ocm_mask[page_id / OCM_MAP_WORD_SIZE],
 				  page_id % OCM_MAP_WORD_SIZE);
@@ -432,11 +440,19 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id)
 		scratch_resize_pages = 0;
 		for (i = 0; i < dev->data->nb_models; i++) {
 			local_model = dev->data->models[i];
-			if ((i != model_id) && (local_model != NULL)) {
-				if (IS_BIT_SET(local_model->model_mem_map.tilemask, tile_id))
-					scratch_resize_pages = PLT_MAX(
-						(int)local_model->model_mem_map.scratch_pages,
-						scratch_resize_pages);
+			if (local_model == NULL)
+				continue;
+
+			for (j = 0; j < local_model->nb_layers; j++) {
+				local_layer = &local_model->layer[j];
+				if (local_layer != layer &&
+				    local_layer->glow.ocm_map.ocm_reserved) {
+					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
+						scratch_resize_pages =
+							PLT_MAX((int)local_layer->glow.ocm_map
+									.scratch_pages,
+								scratch_resize_pages);
+				}
 			}
 		}
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 3404e7fd65..720f8caf76 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -27,7 +27,7 @@ struct cn10k_ml_ocm_tile_info {
 };
 
 /* Model OCM map structure */
-struct cn10k_ml_ocm_model_map {
+struct cn10k_ml_ocm_layer_map {
 	/* Status of OCM reservation */
 	bool ocm_reserved;
 
@@ -77,9 +77,10 @@ struct cn10k_ml_ocm {
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
 int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint64_t tilemask,
-				int wb_page_start, uint16_t wb_pages, uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id);
+void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
+				uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index dc747cf534..b226a9b5a2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,10 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_model.h"
 #include "cn10k_ml_ops.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -202,7 +202,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	char str[STR_LEN];
 	uint8_t i;
@@ -215,77 +215,80 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 	/* Print debug info */
 	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->metadata.model.name);
+	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
 	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version", model->metadata.model.version[0],
-		model->metadata.model.version[1], model->metadata.model.version[2],
-		model->metadata.model.version[3]);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
+		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
 	if (strlen(model->name) != 0)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->metadata.model.num_layers);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
 
 	/* Print model state */
-	if (model->state == ML_CN10K_MODEL_STATE_LOADED)
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE)
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED)
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
 		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
 
 	/* Print OCM status */
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->metadata.model.ocm_wb_range_end - model->metadata.model.ocm_wb_range_start +
-			1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->model_mem_map.wb_pages);
+		model->glow.metadata.model.ocm_wb_range_end -
+			model->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
 	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", model->model_mem_map.scratch_pages);
+		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
+		model->layer[0].glow.ocm_map.scratch_pages);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->metadata.model.tile_end - model->metadata.model.tile_start + 1);
+		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
 
-	if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->model_mem_map.tilemask);
+			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
 		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->model_mem_map.wb_page_start * cn10k_mldev->ocm.page_size);
+			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
 	}
 
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->metadata.model.num_output);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
 	fprintf(fp, "\n");
 
 	print_line(fp, LINE_LEN);
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
 		"model_input_type", "quantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->metadata.input1[i].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input1[i].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input1[i].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN);
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
 			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.input2[j].model_input_type, str,
+			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
 					      STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.input2[j].quantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -295,29 +298,31 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
 		"model_output_type", "dequantize");
 	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->metadata.output1[i].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output1[i].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->metadata.output2[j].output_type, str, STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->metadata.output2[j].model_output_type, str,
+			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
 					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
 			fprintf(fp, "%*s  ", 18, str);
 			fprintf(fp, "%*s", 12,
-				(model->metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
+				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
 			fprintf(fp, "\n");
 		}
 	}
@@ -327,14 +332,14 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
 				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_model_addr *addr;
+	struct cn10k_ml_layer_addr *addr;
 
-	metadata = &model->metadata;
-	addr = &model->addr;
+	metadata = &model->glow.metadata;
+	addr = &model->layer[0].glow.addr;
 
 	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
 	req->jd.hdr.jce.w0.u64 = 0;
@@ -345,7 +350,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->metadata.model.ocm_relocatable)
+		if (!model->glow.metadata.model.ocm_relocatable)
 			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->jd.hdr.sp_flags = 0x0;
@@ -385,7 +390,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_m
 		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
 
 		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
-			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, model->addr.scratch_base_addr));
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
 		req->extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
 		req->extended_args.start.ddr_scratch_range_end =
@@ -445,7 +450,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CN10K_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
 			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
@@ -472,7 +477,7 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev->xstats.count_mode_device = stat_id;
 
 	/* Initialize model xstats */
-	for (model = 0; model < ML_CN10K_MAX_MODELS; model++) {
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
 		for (i = 0; i < RTE_DIM(model_stats); i++) {
@@ -521,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
@@ -543,7 +548,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	for (i = 0; i < RTE_DIM(model_stats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
 		stat_id++;
 	}
 }
@@ -576,9 +581,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->burst_stats[qp_id].str##_latency_tot;                      \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -588,9 +593,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(value, model->burst_stats[qp_id].str##_latency_min);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MIN(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -600,9 +606,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(value, model->burst_stats[qp_id].str##_latency_max);       \
-			count += model->burst_stats[qp_id].dequeued_count -                        \
-				 model->burst_stats[qp_id].str##_reset_count;                      \
+			value = PLT_MAX(                                                           \
+				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
+			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
+				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -611,7 +618,7 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 static uint64_t
 cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
 	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
@@ -692,28 +699,28 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->burst_stats[qp_id].str##_latency_tot = 0;                           \
-			model->burst_stats[qp_id].str##_reset_count =                              \
-				model->burst_stats[qp_id].dequeued_count;                          \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
+			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
+				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_min = UINT64_MAX;                  \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->burst_stats[qp_id].str##_latency_max = 0;                           \
+			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
 	} while (0)
 
 static void
 cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint32_t qp_id;
 
 	model = dev->data->models[model_id];
@@ -749,7 +756,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
 	uint16_t start_id;
 	uint16_t end_id;
@@ -758,7 +765,7 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CN10K_MAX_MODELS; i++) {
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
 		if (model_id == -1) {
 			model = dev->data->models[i];
 			if (model == NULL) /* Skip inactive models */
@@ -803,7 +810,7 @@ static int
 cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct rte_ml_model_info *info;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -854,7 +861,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -875,7 +882,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 
 	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
 	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CN10K_MAX_MODELS;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -895,7 +902,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
@@ -1001,11 +1008,11 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 			model = dev->data->models[model_id];
 			if (model != NULL) {
-				if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 					if (cn10k_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
-				if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 					if (cn10k_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
@@ -1093,7 +1100,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
@@ -1111,11 +1118,11 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
 		if (model != NULL) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				if (cn10k_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				if (cn10k_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
@@ -1294,7 +1301,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1386,7 +1393,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
 		break;
 	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CN10K_MAX_MODELS)
+		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
 		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
 		break;
@@ -1447,7 +1454,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
@@ -1588,7 +1595,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
@@ -1643,9 +1650,9 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_model_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
 
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
 		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
@@ -1659,62 +1666,85 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	model = mz->addr;
-	model->mldev = cnxk_mldev;
+	model->cnxk_mldev = cnxk_mldev;
 	model->model_id = idx;
+	dev->data->models[idx] = model;
 
-	rte_memcpy(&model->metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->metadata);
+	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->glow.metadata);
+
+	/* Set model name */
+	rte_memcpy(model->name, (char *)model->glow.metadata.model.name, 64);
 
 	/* Enable support for batch_size of 256 */
-	if (model->metadata.model.batch_size == 0)
+	if (model->glow.metadata.model.batch_size == 0)
 		model->batch_size = 256;
 	else
-		model->batch_size = model->metadata.model.batch_size;
+		model->batch_size = model->glow.metadata.model.batch_size;
+
+	/* The driver always handles glow models as a single layer, so treat the
+	 * entire model as a model with one layer. This ignores num_layers from
+	 * the metadata.
+	 */
+	model->nb_layers = 1;
+
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
+		   sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
+	model->layer[0].model = model;
 
 	/* Set DMA base address */
 	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
-	model->addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
+	model->layer[0].glow.addr.scratch_base_addr =
+		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_info_update(&model->layer[0]);
 
 	/* Initialize model_mem_map */
-	memset(&model->model_mem_map, 0, sizeof(struct cn10k_ml_ocm_model_map));
-	model->model_mem_map.ocm_reserved = false;
-	model->model_mem_map.tilemask = 0;
-	model->model_mem_map.wb_page_start = -1;
-	model->model_mem_map.wb_pages = wb_pages;
-	model->model_mem_map.scratch_pages = scratch_pages;
+	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	model->layer[0].glow.ocm_map.ocm_reserved = false;
+	model->layer[0].glow.ocm_map.tilemask = 0;
+	model->layer[0].glow.ocm_map.wb_page_start = -1;
+	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
+	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
 
 	/* Set model info */
-	model->info = PLT_PTR_ADD(model->addr.scratch_base_addr, model_scratch_size);
+	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
 	cn10k_ml_model_info_set(dev, model);
 
 	/* Set slow-path request address and state */
-	model->req = PLT_PTR_ADD(model->info, model_info_size);
+	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->burst_stats = PLT_PTR_ADD(
-		model->req, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+	model->layer[0].glow.burst_stats =
+		PLT_PTR_ADD(model->layer[0].glow.req,
+			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->burst_stats[qp_id].hw_latency_tot = 0;
-		model->burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].hw_latency_max = 0;
-		model->burst_stats[qp_id].fw_latency_tot = 0;
-		model->burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->burst_stats[qp_id].fw_latency_max = 0;
-		model->burst_stats[qp_id].hw_reset_count = 0;
-		model->burst_stats[qp_id].fw_reset_count = 0;
-		model->burst_stats[qp_id].dequeued_count = 0;
-	}
-	model->sync_stats =
-		PLT_PTR_ADD(model->burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_model_stats));
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+	}
+
+	model->layer[0].glow.sync_stats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
 
 	plt_spinlock_init(&model->lock);
-	model->state = ML_CN10K_MODEL_STATE_LOADED;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
 	cnxk_mldev->nb_models_loaded++;
 
@@ -1730,7 +1760,7 @@ int
 cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	char str[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1741,7 +1771,7 @@ cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CN10K_MODEL_STATE_LOADED) {
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
 		plt_err("Cannot unload. Model in use.");
 		return -EBUSY;
 	}
@@ -1758,7 +1788,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1783,7 +1813,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1791,63 +1821,66 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
 	plt_wmb();
 
-	num_tiles = model->metadata.model.tile_end - model->metadata.model.tile_start + 1;
+	num_tiles = model->layer[0].glow.metadata.model.tile_end -
+		    model->layer[0].glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_STARTED) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
 				plt_ml_dbg("Model already started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->model_mem_map.ocm_reserved) {
+	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages, &tilemask);
+				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
+				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
 				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model), model->metadata.model.name);
+					PLT_U64_CAST(model),
+					model->layer[0].glow.metadata.model.name);
 
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->model_mem_map.tilemask = tilemask;
-			model->model_mem_map.wb_page_start = wb_page_start;
+			model->layer[0].glow.ocm_map.tilemask = tilemask;
+			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(
-				dev, model->model_id, model->model_mem_map.tilemask,
-				model->model_mem_map.wb_page_start, model->model_mem_map.wb_pages,
-				model->model_mem_map.scratch_pages);
-			model->model_mem_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
+						   model->layer[0].glow.ocm_map.tilemask,
+						   model->layer[0].glow.ocm_map.wb_page_start,
+						   model->layer[0].glow.ocm_map.wb_pages,
+						   model->layer[0].glow.ocm_map.scratch_pages);
+			model->layer[0].glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->model_mem_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->jd.model_start.ocm_wb_base_address =
-		model->model_mem_map.wb_page_start * ocm->page_size;
+		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1880,10 +1913,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			if (ret == 0) {
-				model->state = ML_CN10K_MODEL_STATE_STARTED;
+				model->state = ML_CNXK_MODEL_STATE_STARTED;
 				cnxk_mldev->nb_models_started++;
 			} else {
-				model->state = ML_CN10K_MODEL_STATE_UNKNOWN;
+				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
 			}
 
 			plt_spinlock_unlock(&model->lock);
@@ -1891,12 +1924,12 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		}
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN) {
-		while (model->model_mem_map.ocm_reserved) {
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
+		while (model->layer[0].glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id);
-				model->model_mem_map.ocm_reserved = false;
-				model->model_mem_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+				model->layer[0].glow.ocm_map.ocm_reserved = false;
+				model->layer[0].glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
@@ -1917,7 +1950,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
 	struct cn10k_ml_req *req;
 
@@ -1937,7 +1970,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	}
 
 	/* Prepare JD */
-	req = model->req;
+	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->result.error_code.u64 = 0x0;
 	req->result.user_ptr = NULL;
@@ -1948,31 +1981,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CN10K_MODEL_STATE_LOADED) {
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
 				plt_ml_dbg("Model not started, model = 0x%016lx",
 					   PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CN10K_MODEL_STATE_JOB_ACTIVE) {
+			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
 				plt_err("A slow-path job is active for the model = 0x%016lx",
 					PLT_U64_CAST(model));
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CN10K_MODEL_STATE_JOB_ACTIVE;
+			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->model_mem_map.ocm_reserved) {
+	while (model->layer[0].glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id);
-			model->model_mem_map.ocm_reserved = false;
-			model->model_mem_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
+			model->layer[0].glow.ocm_map.ocm_reserved = false;
+			model->layer[0].glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -2008,7 +2041,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
 			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CN10K_MODEL_STATE_LOADED;
+			model->state = ML_CNXK_MODEL_STATE_LOADED;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -2021,7 +2054,7 @@ static int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 
 	model = dev->data->models[model_id];
 
@@ -2040,7 +2073,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 static int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	size_t size;
 
 	model = dev->data->models[model_id];
@@ -2050,19 +2083,23 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 		return -EINVAL;
 	}
 
-	if (model->state == ML_CN10K_MODEL_STATE_UNKNOWN)
+	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
-	else if (model->state != ML_CN10K_MODEL_STATE_LOADED)
+	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->metadata.init_model.file_size + model->metadata.main_model.file_size +
-	       model->metadata.finish_model.file_size + model->metadata.weights_bias.file_size;
+	size = model->layer[0].glow.metadata.init_model.file_size +
+	       model->layer[0].glow.metadata.main_model.file_size +
+	       model->layer[0].glow.metadata.finish_model.file_size +
+	       model->layer[0].glow.metadata.weights_bias.file_size;
 
 	/* Update model weights & bias */
-	rte_memcpy(model->addr.wb_load_addr, buffer, model->metadata.weights_bias.file_size);
+	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
+		   model->layer[0].glow.metadata.weights_bias.file_size);
 
 	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, size);
+	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
+		   model->layer[0].glow.addr.base_dma_addr_load, size);
 
 	return 0;
 }
@@ -2071,7 +2108,7 @@ static int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_input_type;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
@@ -2091,57 +2128,58 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_input; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->metadata.input1[i].input_type;
-			model_input_type = model->metadata.input1[i].model_input_type;
-			qscale = model->metadata.input1[i].qscale;
+			input_type = model->layer[0].glow.metadata.input1[i].input_type;
+			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
+			qscale = model->layer[0].glow.metadata.input1[i].qscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->metadata.input2[j].input_type;
-			model_input_type = model->metadata.input2[j].model_input_type;
-			qscale = model->metadata.input2[j].qscale;
+			input_type = model->layer[0].glow.metadata.input2[j].input_type;
+			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
+			qscale = model->layer[0].glow.metadata.input2[j].qscale;
 		}
 
 		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->addr.input[i].sz_d);
+			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
 		} else {
-			switch (model->metadata.input1[i].model_input_type) {
+			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(qscale,
-								model->addr.input[i].nb_elements,
-								lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint8(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(qscale,
-								 model->addr.input[i].nb_elements,
-								 lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_int16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(qscale,
-								  model->addr.input[i].nb_elements,
-								  lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_uint16(
+					qscale, model->layer[0].info.input[i].nb_elements,
+					lcl_dbuffer, lcl_qbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(model->addr.input[i].nb_elements,
-								   lcl_dbuffer, lcl_qbuffer);
+				ret = rte_ml_io_float32_to_float16(
+					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
+					lcl_qbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->metadata.input1[i].model_input_type);
+					model->layer[0].glow.metadata.input1[i].model_input_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_dbuffer += model->addr.input[i].sz_d;
-		lcl_qbuffer += model->addr.input[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
 	}
 
 	return 0;
@@ -2151,7 +2189,7 @@ static int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	uint8_t model_output_type;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
@@ -2171,58 +2209,60 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_
 	lcl_dbuffer = dbuffer[0]->addr;
 	lcl_qbuffer = qbuffer[0]->addr;
 
-	for (i = 0; i < model->metadata.model.num_output; i++) {
+	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->metadata.output1[i].output_type;
-			model_output_type = model->metadata.output1[i].model_output_type;
-			dscale = model->metadata.output1[i].dscale;
+			output_type = model->layer[0].glow.metadata.output1[i].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output1[i].model_output_type;
+			dscale = model->layer[0].glow.metadata.output1[i].dscale;
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->metadata.output2[j].output_type;
-			model_output_type = model->metadata.output2[j].model_output_type;
-			dscale = model->metadata.output2[j].dscale;
+			output_type = model->layer[0].glow.metadata.output2[j].output_type;
+			model_output_type =
+				model->layer[0].glow.metadata.output2[j].model_output_type;
+			dscale = model->layer[0].glow.metadata.output2[j].dscale;
 		}
 
 		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->addr.output[i].sz_q);
+			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
 		} else {
-			switch (model->metadata.output1[i].model_output_type) {
+			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
 			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(dscale,
-								model->addr.output[i].nb_elements,
-								lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint8_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(dscale,
-								 model->addr.output[i].nb_elements,
-								 lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_int16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(dscale,
-								  model->addr.output[i].nb_elements,
-								  lcl_qbuffer, lcl_dbuffer);
+				ret = rte_ml_io_uint16_to_float32(
+					dscale, model->layer[0].info.output[i].nb_elements,
+					lcl_qbuffer, lcl_dbuffer);
 				break;
 			case RTE_ML_IO_TYPE_FP16:
 				ret = rte_ml_io_float16_to_float32(
-					model->addr.output[i].nb_elements, lcl_qbuffer,
+					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
 					lcl_dbuffer);
 				break;
 			default:
 				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->metadata.output1[i].model_output_type);
+					model->layer[0].glow.metadata.output1[i].model_output_type);
 				ret = -ENOTSUP;
 			}
 			if (ret < 0)
 				return ret;
 		}
 
-		lcl_qbuffer += model->addr.output[i].sz_q;
-		lcl_dbuffer += model->addr.output[i].sz_d;
+		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
+		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
 	}
 
 	return 0;
@@ -2250,10 +2290,10 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
 		       struct rte_ml_op *op)
 {
-	struct cn10k_ml_model_stats *stats;
+	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_qp *qp;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
@@ -2263,9 +2303,9 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->burst_stats[qp_id];
+			stats = &model->layer[0].glow.burst_stats[qp_id];
 		} else {
-			stats = model->sync_stats;
+			stats = model->layer[0].glow.sync_stats;
 		}
 
 		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
@@ -2469,7 +2509,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_model *model;
+	struct cnxk_ml_model *model;
 	struct cn10k_ml_req *req;
 	bool timeout;
 	int ret = 0;
@@ -2477,7 +2517,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[op->model_id];
-	req = model->req;
+	req = model->layer[0].glow.req;
 
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
new file mode 100644
index 0000000000..1fa965a232
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_IO_H_
+#define _CNXK_ML_IO_H_
+
+#include <rte_mldev.h>
+
+/* Maximum number of models per device */
+#define ML_CNXK_MAX_MODELS 16
+
+/* Maximum number of layers per model */
+#define ML_CNXK_MODEL_MAX_LAYERS 1
+
+/* Maximum number of inputs or outputs per layer or model */
+#define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of dimensions per I/O shape */
+#define ML_CNXK_MODEL_MAX_DIMS 8
+
+/* Input / Output structure */
+struct cnxk_ml_io {
+	/* name */
+	char name[RTE_ML_STR_MAX];
+
+	/* dequantized data type */
+	enum rte_ml_io_type dtype;
+
+	/* quantized data type */
+	enum rte_ml_io_type qtype;
+
+	/* Number of dimensions in shape */
+	uint32_t nb_dims;
+
+	/* Shape of input */
+	uint32_t shape[ML_CNXK_MODEL_MAX_DIMS];
+
+	/* Number of elements */
+	uint32_t nb_elements;
+
+	/* Dequantized input size */
+	uint32_t sz_d;
+
+	/* Quantized input size */
+	uint32_t sz_q;
+
+	/* Scale */
+	float scale;
+};
+
+/* Model / Layer IO structure */
+struct cnxk_ml_io_info {
+	/* Number of inputs */
+	uint16_t nb_inputs;
+
+	/* Model / Layer inputs */
+	struct cnxk_ml_io input[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Number of outputs */
+	uint16_t nb_outputs;
+
+	/* Model / Layer outputs */
+	struct cnxk_ml_io output[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
+#endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
new file mode 100644
index 0000000000..3d735ced3e
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_model.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
new file mode 100644
index 0000000000..a2994dbb71
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_MODEL_H_
+#define _CNXK_ML_MODEL_H_
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_model.h"
+
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
+struct cnxk_ml_model;
+
+/* Model state */
+enum cnxk_ml_model_state {
+	/* Unknown state */
+	ML_CNXK_MODEL_STATE_UNKNOWN,
+
+	/* Model loaded */
+	ML_CNXK_MODEL_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_MODEL_STATE_JOB_ACTIVE,
+
+	/* Model started */
+	ML_CNXK_MODEL_STATE_STARTED,
+};
+
+/* Layer state */
+enum cnxk_ml_layer_state {
+	/* Unknown state */
+	ML_CNXK_LAYER_STATE_UNKNOWN,
+
+	/* Layer loaded */
+	ML_CNXK_LAYER_STATE_LOADED,
+
+	/* A slow-path job is active, start or stop */
+	ML_CNXK_LAYER_STATE_JOB_ACTIVE,
+
+	/* Layer started */
+	ML_CNXK_LAYER_STATE_STARTED,
+};
+
+/* Layer object */
+struct cnxk_ml_layer {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model handle */
+	struct cnxk_ml_model *model;
+
+	/* Index mapped with firmware's model_id */
+	uint16_t index;
+
+	/* Input / Output */
+	struct cnxk_ml_io_info info;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* State */
+	enum cnxk_ml_layer_state state;
+
+	/* Glow layer specific data */
+	struct cn10k_ml_layer_data glow;
+};
+
+/* Model Object */
+struct cnxk_ml_model {
+	/* Device reference */
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	/* ID */
+	uint16_t model_id;
+
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Model specific data - glow */
+	struct cn10k_ml_model_data glow;
+
+	/* Batch size */
+	uint32_t batch_size;
+
+	/* Number of layers */
+	uint16_t nb_layers;
+
+	/* Layer info */
+	struct cnxk_ml_layer layer[ML_CNXK_MODEL_MAX_LAYERS];
+
+	/* State */
+	enum cnxk_ml_model_state state;
+
+	/* Internal model information structure
+	 * Size of the buffer = sizeof(struct rte_ml_model_info)
+	 *                    + num_inputs * sizeof(struct rte_ml_io_info)
+	 *                    + num_outputs * sizeof(struct rte_ml_io_info).
+	 * Structures would be arranged in the same order in the buffer.
+	 */
+	uint8_t *info;
+
+	/* Spinlock, used to update model state */
+	plt_spinlock_t lock;
+};
+
+#endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index e006fdfe0e..a70956cceb 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_model.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 04/34] ml/cnxk: add generic cnxk request structure
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (2 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
                     ` (30 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added a generic cnxk request structure. Moved common fields
from the cn10k structures to the cnxk structure. Moved job-related
structures and enumerations to the ops headers.

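For reference while reading the hunks below, a minimal sketch of the
request wrapper this patch introduces. The member names (cn10k_req,
status, timeout, op) follow their usage in the diff and assume the
cn10k_ml_req and rte_ml_op definitions visible via the ops headers;
the exact types, qualifiers and field order are assumptions and not
part of the patch:

	/* Sketch only -- layout assumed from the usages in this patch */
	struct cnxk_ml_req {
		struct cn10k_ml_req cn10k_req; /* arch-specific request */
		volatile uint64_t *status;     /* poll address for fast path */
		uint64_t timeout;              /* slow-path timeout (TSC cycles) */
		struct rte_ml_op *op;          /* op tied to this request */
	};

The poll helpers (cn10k_ml_set_poll_addr, cn10k_ml_set_poll_ptr and
cn10k_ml_get_poll_ptr) read and write only through req->status, so the
common enqueue/dequeue path stays independent of where the
per-architecture status word actually lives.
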
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c   |  72 +++----
 drivers/ml/cnxk/cn10k_ml_dev.h   | 269 +------------------------
 drivers/ml/cnxk/cn10k_ml_model.c |   6 +-
 drivers/ml/cnxk/cn10k_ml_model.h |   4 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 331 +++++++++++++++++--------------
 drivers/ml/cnxk/cn10k_ml_ops.h   | 296 +++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 +
 drivers/ml/cnxk/cnxk_ml_ops.h    |  63 ++++++
 drivers/ml/cnxk/meson.build      |   1 +
 9 files changed, 557 insertions(+), 492 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_ops.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 3bc61443d8..fc6f78d414 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -14,9 +14,8 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
 
 #define CN10K_ML_FW_PATH		"fw_path"
 #define CN10K_ML_FW_ENABLE_DPE_WARNINGS "enable_dpe_warnings"
@@ -400,20 +399,23 @@ cn10k_ml_pci_remove(struct rte_pci_device *pci_dev)
 static void
 cn10k_ml_fw_print_info(struct cn10k_ml_fw *fw)
 {
-	plt_info("ML Firmware Version = %s", fw->req->jd.fw_load.version);
-
-	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->jd.fw_load.cap.u64);
-	plt_ml_dbg("Version = %s", fw->req->jd.fw_load.version);
-	plt_ml_dbg("core0_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core0_debug_ptr);
-	plt_ml_dbg("core1_debug_ptr = 0x%016lx", fw->req->jd.fw_load.debug.core1_debug_ptr);
-	plt_ml_dbg("debug_buffer_size = %u bytes", fw->req->jd.fw_load.debug.debug_buffer_size);
+	plt_info("ML Firmware Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+
+	plt_ml_dbg("Firmware capabilities = 0x%016lx", fw->req->cn10k_req.jd.fw_load.cap.u64);
+	plt_ml_dbg("Version = %s", fw->req->cn10k_req.jd.fw_load.version);
+	plt_ml_dbg("core0_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
+	plt_ml_dbg("core1_debug_ptr = 0x%016lx",
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
+	plt_ml_dbg("debug_buffer_size = %u bytes",
+		   fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size);
 	plt_ml_dbg("core0_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core0_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 	plt_ml_dbg("core1_exception_buffer = 0x%016lx",
-		   fw->req->jd.fw_load.debug.core1_exception_buffer);
+		   fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 	plt_ml_dbg("exception_state_size = %u bytes",
-		   fw->req->jd.fw_load.debug.exception_state_size);
-	plt_ml_dbg("flags = 0x%016lx", fw->req->jd.fw_load.flags);
+		   fw->req->cn10k_req.jd.fw_load.debug.exception_state_size);
+	plt_ml_dbg("flags = 0x%016lx", fw->req->cn10k_req.jd.fw_load.flags);
 }
 
 uint64_t
@@ -458,29 +460,30 @@ cn10k_ml_fw_load_asim(struct cn10k_ml_fw *fw)
 	roc_ml_reg_save(&cn10k_mldev->roc, ML_MLR_BASE);
 
 	/* Update FW load completion structure */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -654,29 +657,30 @@ cn10k_ml_fw_load_cn10ka(struct cn10k_ml_fw *fw, void *buffer, uint64_t size)
 	plt_ml_dbg("ML_SW_RST_CTRL => 0x%08x", reg_val32);
 
 	/* (12) Wait for notification from firmware that ML is ready for job execution. */
-	fw->req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->status);
-	fw->req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
-	fw->req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->result);
-	fw->req->jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->status);
+	fw->req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&fw->req->cn10k_req.status);
+	fw->req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_LOAD;
+	fw->req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &fw->req->cn10k_req.result);
+	fw->req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &fw->req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue FW load through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &fw->req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&fw->req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&fw->req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
 	} while (plt_tsc_cycles() < timeout_cycle);
 
 	/* Check firmware load status, clean-up and exit on failure. */
-	if ((!timeout) && (fw->req->result.error_code.u64 == 0)) {
+	if ((!timeout) && (fw->req->cn10k_req.result.error_code == 0)) {
 		cn10k_ml_fw_print_info(fw);
 	} else {
 		/* Set ML to disable new jobs */
@@ -766,11 +770,11 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 		}
 
 		/* Reserve memzone for firmware load completion and data */
-		mz_size = sizeof(struct cn10k_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
+		mz_size = sizeof(struct cnxk_ml_req) + fw_size + FW_STACK_BUFFER_SIZE +
 			  FW_DEBUG_BUFFER_SIZE + FW_EXCEPTION_BUFFER_SIZE;
 	} else if (roc_env_is_asim()) {
 		/* Reserve memzone for firmware load completion */
-		mz_size = sizeof(struct cn10k_ml_req);
+		mz_size = sizeof(struct cnxk_ml_req);
 	}
 
 	mz = plt_memzone_reserve_aligned(FW_MEMZONE_NAME, mz_size, 0, ML_CN10K_ALIGN_SIZE);
@@ -782,8 +786,8 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 	fw->req = mz->addr;
 
 	/* Reset firmware load completion structure */
-	memset(&fw->req->jd, 0, sizeof(struct cn10k_ml_jd));
-	memset(&fw->req->jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
+	memset(&fw->req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	memset(&fw->req->cn10k_req.jd.fw_load.version[0], '\0', MLDEV_FIRMWARE_VERSION_LENGTH);
 
 	/* Reset device, if in active state */
 	if (roc_ml_mlip_is_enabled(&cn10k_mldev->roc))
@@ -791,7 +795,7 @@ cn10k_ml_fw_load(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Load firmware */
 	if (roc_env_is_emulator() || roc_env_is_hw()) {
-		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cn10k_ml_req));
+		fw->data = PLT_PTR_ADD(mz->addr, sizeof(struct cnxk_ml_req));
 		ret = cn10k_ml_fw_load_cn10ka(fw, fw_buffer, fw_size);
 		free(fw_buffer);
 	} else if (roc_env_is_asim()) {
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 99ff0a344a..1852d4f6c9 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -17,9 +17,6 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 /* Marvell OCTEON CN10K ML PMD device name */
 #define MLDEV_NAME_CN10K_PMD ml_cn10k
 
-/* Firmware version string length */
-#define MLDEV_FIRMWARE_VERSION_LENGTH 32
-
 /* Device alignment size */
 #define ML_CN10K_ALIGN_SIZE 128
 
@@ -52,17 +49,8 @@ extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 #endif
 
 struct cnxk_ml_dev;
-struct cn10k_ml_req;
-struct cn10k_ml_qp;
-
-/* Job types */
-enum cn10k_ml_job_type {
-	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
-	ML_CN10K_JOB_TYPE_MODEL_STOP,
-	ML_CN10K_JOB_TYPE_MODEL_START,
-	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
-	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
-};
+struct cnxk_ml_req;
+struct cnxk_ml_qp;
 
 /* Error types enumeration */
 enum cn10k_ml_error_etype {
@@ -112,251 +100,6 @@ union cn10k_ml_error_code {
 	uint64_t u64;
 };
 
-/* Firmware stats */
-struct cn10k_ml_fw_stats {
-	/* Firmware start cycle */
-	uint64_t fw_start;
-
-	/* Firmware end cycle */
-	uint64_t fw_end;
-
-	/* Hardware start cycle */
-	uint64_t hw_start;
-
-	/* Hardware end cycle */
-	uint64_t hw_end;
-};
-
-/* Result structure */
-struct cn10k_ml_result {
-	/* Job error code */
-	union cn10k_ml_error_code error_code;
-
-	/* Firmware stats */
-	struct cn10k_ml_fw_stats stats;
-
-	/* User context pointer */
-	void *user_ptr;
-};
-
-/* Firmware capability structure */
-union cn10k_ml_fw_cap {
-	uint64_t u64;
-
-	struct {
-		/* CMPC completion support */
-		uint64_t cmpc_completions : 1;
-
-		/* Poll mode completion support */
-		uint64_t poll_completions : 1;
-
-		/* SSO completion support */
-		uint64_t sso_completions : 1;
-
-		/* Support for model side loading */
-		uint64_t side_load_model : 1;
-
-		/* Batch execution */
-		uint64_t batch_run : 1;
-
-		/* Max number of models to be loaded in parallel */
-		uint64_t max_models : 8;
-
-		/* Firmware statistics */
-		uint64_t fw_stats : 1;
-
-		/* Hardware statistics */
-		uint64_t hw_stats : 1;
-
-		/* Max number of batches */
-		uint64_t max_num_batches : 16;
-
-		uint64_t rsvd : 33;
-	} s;
-};
-
-/* Firmware debug info structure */
-struct cn10k_ml_fw_debug {
-	/* ACC core 0 debug buffer */
-	uint64_t core0_debug_ptr;
-
-	/* ACC core 1 debug buffer */
-	uint64_t core1_debug_ptr;
-
-	/* ACC core 0 exception state buffer */
-	uint64_t core0_exception_buffer;
-
-	/* ACC core 1 exception state buffer */
-	uint64_t core1_exception_buffer;
-
-	/* Debug buffer size per core */
-	uint32_t debug_buffer_size;
-
-	/* Exception state dump size */
-	uint32_t exception_state_size;
-};
-
-/* Job descriptor header (32 bytes) */
-struct cn10k_ml_jd_header {
-	/* Job completion structure */
-	struct ml_jce_s jce;
-
-	/* Model ID */
-	uint64_t model_id : 8;
-
-	/* Job type */
-	uint64_t job_type : 8;
-
-	/* Flags for fast-path jobs */
-	uint64_t fp_flags : 16;
-
-	/* Flags for slow-path jobs */
-	uint64_t sp_flags : 16;
-	uint64_t rsvd : 16;
-
-	/* Job result pointer */
-	uint64_t *result;
-};
-
-/* Extra arguments for job descriptor */
-union cn10k_ml_jd_extended_args {
-	struct cn10k_ml_jd_extended_args_section_start {
-		/** DDR Scratch base address */
-		uint64_t ddr_scratch_base_address;
-
-		/** DDR Scratch range start */
-		uint64_t ddr_scratch_range_start;
-
-		/** DDR Scratch range end */
-		uint64_t ddr_scratch_range_end;
-
-		uint8_t rsvd[104];
-	} start;
-};
-
-/* Job descriptor structure */
-struct cn10k_ml_jd {
-	/* Job descriptor header (32 bytes) */
-	struct cn10k_ml_jd_header hdr;
-
-	union {
-		struct cn10k_ml_jd_section_fw_load {
-			/* Firmware capability structure (8 bytes) */
-			union cn10k_ml_fw_cap cap;
-
-			/* Firmware version (32 bytes) */
-			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
-
-			/* Debug capability structure (40 bytes) */
-			struct cn10k_ml_fw_debug debug;
-
-			/* Flags to control error handling */
-			uint64_t flags;
-
-			uint8_t rsvd[8];
-		} fw_load;
-
-		struct cn10k_ml_jd_section_model_start {
-			/* Extended arguments */
-			uint64_t extended_args;
-
-			/* Destination model start address in DDR relative to ML_MLR_BASE */
-			uint64_t model_dst_ddr_addr;
-
-			/* Offset to model init section in the model */
-			uint64_t model_init_offset : 32;
-
-			/* Size of init section in the model */
-			uint64_t model_init_size : 32;
-
-			/* Offset to model main section in the model */
-			uint64_t model_main_offset : 32;
-
-			/* Size of main section in the model */
-			uint64_t model_main_size : 32;
-
-			/* Offset to model finish section in the model */
-			uint64_t model_finish_offset : 32;
-
-			/* Size of finish section in the model */
-			uint64_t model_finish_size : 32;
-
-			/* Offset to WB in model bin */
-			uint64_t model_wb_offset : 32;
-
-			/* Number of model layers */
-			uint64_t num_layers : 8;
-
-			/* Number of gather entries, 0 means linear input mode (= no gather) */
-			uint64_t num_gather_entries : 8;
-
-			/* Number of scatter entries 0 means linear input mode (= no scatter) */
-			uint64_t num_scatter_entries : 8;
-
-			/* Tile mask to load model */
-			uint64_t tilemask : 8;
-
-			/* Batch size of model  */
-			uint64_t batch_size : 32;
-
-			/* OCM WB base address */
-			uint64_t ocm_wb_base_address : 32;
-
-			/* OCM WB range start */
-			uint64_t ocm_wb_range_start : 32;
-
-			/* OCM WB range End */
-			uint64_t ocm_wb_range_end : 32;
-
-			/* DDR WB address */
-			uint64_t ddr_wb_base_address;
-
-			/* DDR WB range start */
-			uint64_t ddr_wb_range_start : 32;
-
-			/* DDR WB range end */
-			uint64_t ddr_wb_range_end : 32;
-
-			union {
-				/* Points to gather list if num_gather_entries > 0 */
-				void *gather_list;
-				struct {
-					/* Linear input mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} input;
-
-			union {
-				/* Points to scatter list if num_scatter_entries > 0 */
-				void *scatter_list;
-				struct {
-					/* Linear output mode */
-					uint64_t ddr_range_start : 32;
-					uint64_t ddr_range_end : 32;
-				} s;
-			} output;
-		} model_start;
-
-		struct cn10k_ml_jd_section_model_stop {
-			uint8_t rsvd[96];
-		} model_stop;
-
-		struct cn10k_ml_jd_section_model_run {
-			/* Address of the input for the run relative to ML_MLR_BASE */
-			uint64_t input_ddr_addr;
-
-			/* Address of the output for the run relative to ML_MLR_BASE */
-			uint64_t output_ddr_addr;
-
-			/* Number of batches to run in variable batch processing */
-			uint16_t num_batches;
-
-			uint8_t rsvd[78];
-		} model_run;
-	};
-};
-
 /* ML firmware structure */
 struct cn10k_ml_fw {
 	/* Device reference */
@@ -375,7 +118,7 @@ struct cn10k_ml_fw {
 	uint8_t *data;
 
 	/* Firmware load / handshake request structure */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 };
 
 /* Extended stats types enum */
@@ -488,9 +231,9 @@ struct cn10k_ml_dev {
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
 
 	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cn10k_ml_req *req);
-	void (*set_poll_ptr)(struct cn10k_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cn10k_ml_req *req);
+	void (*set_poll_addr)(struct cnxk_ml_req *req);
+	void (*set_poll_ptr)(struct cnxk_ml_req *req);
+	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d033d6deff..d2f1c761be 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -10,6 +10,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -551,7 +552,6 @@ void
 cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 {
 	struct cn10k_ml_model_metadata *metadata;
-	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
@@ -560,7 +560,6 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	uint8_t i;
 
 	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
@@ -577,7 +576,8 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
-		cn10k_mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size;
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
+		model->batch_size;
 	info->nb_inputs = metadata->model.num_input;
 	info->input_info = input;
 	info->nb_outputs = metadata->model.num_output;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 206a369ca7..74ada1531a 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -11,10 +11,10 @@
 
 #include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
-#include "cn10k_ml_ops.h"
 
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Model Metadata : v 2.3.0.1 */
 #define MRVL_ML_MODEL_MAGIC_STRING "MRVL"
@@ -444,7 +444,7 @@ struct cn10k_ml_layer_data {
 	struct cn10k_ml_ocm_layer_map ocm_map;
 
 	/* Layer: Slow-path operations request pointer */
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
 	struct cn10k_ml_layer_stats *burst_stats;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b226a9b5a2..25ebb28993 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -7,10 +7,9 @@
 
 #include <mldev_utils.h>
 
-#include "cn10k_ml_ops.h"
-
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_ops.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -78,31 +77,31 @@ print_line(FILE *fp, int len)
 }
 
 static inline void
-cn10k_ml_set_poll_addr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
-	req->compl_W1 = PLT_U64_CAST(&req->status);
+	req->status = &req->cn10k_req.status;
 }
 
 static inline void
-cn10k_ml_set_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
 {
-	plt_write64(ML_CNXK_POLL_JOB_START, req->compl_W1);
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
 }
 
 static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cn10k_ml_req *req)
+cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 {
-	return plt_read64(req->compl_W1);
+	return plt_read64(req->status);
 }
 
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
-	snprintf(name, size, "cn10k_ml_qp_mem_%u:%u", dev_id, qp_id);
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
 static int
-cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
@@ -122,14 +121,14 @@ cn10k_ml_qp_destroy(const struct rte_ml_dev *dev, struct cn10k_ml_qp *qp)
 static int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int ret;
 
 	qp = dev->data->queue_pairs[queue_pair_id];
 	if (qp == NULL)
 		return -EINVAL;
 
-	ret = cn10k_ml_qp_destroy(dev, qp);
+	ret = cnxk_ml_qp_destroy(dev, qp);
 	if (ret) {
 		plt_err("Could not destroy queue pair %u", queue_pair_id);
 		return ret;
@@ -140,18 +139,18 @@ cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 	return 0;
 }
 
-static struct cn10k_ml_qp *
-cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
 {
 	const struct rte_memzone *qp_mem;
 	char name[RTE_MEMZONE_NAMESIZE];
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t len;
 	uint8_t *va;
 	uint64_t i;
 
 	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cn10k_ml_qp), ROC_ALIGN,
+	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
 				socket_id);
 	if (qp == NULL) {
 		plt_err("Could not allocate queue pair");
@@ -159,7 +158,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 	}
 
 	/* For request queue */
-	len = nb_desc * sizeof(struct cn10k_ml_req);
+	len = nb_desc * sizeof(struct cnxk_ml_req);
 	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
 	qp_mem = rte_memzone_reserve_aligned(
 		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
@@ -173,7 +172,7 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize Request queue */
 	qp->id = qp_id;
-	qp->queue.reqs = (struct cn10k_ml_req *)va;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
 	qp->queue.head = 0;
 	qp->queue.tail = 0;
 	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
@@ -185,8 +184,9 @@ cn10k_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_des
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
-		memset(&qp->queue.reqs[i].jd, 0, sizeof(struct cn10k_ml_jd));
-		qp->queue.reqs[i].jcmd.w1.s.jobptr = PLT_U64_CAST(&qp->queue.reqs[i].jd);
+		memset(&qp->queue.reqs[i].cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
+			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
 
 	return qp;
@@ -333,7 +333,7 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
-				struct cn10k_ml_req *req, enum cn10k_ml_job_type job_type)
+				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
@@ -341,79 +341,88 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	metadata = &model->glow.metadata;
 	addr = &model->layer[0].glow.addr;
 
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.model_id = model->model_id;
-	req->jd.hdr.job_type = job_type;
-	req->jd.hdr.fp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.job_type = job_type;
+	req->cn10k_req.jd.hdr.fp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
 		if (!model->glow.metadata.model.ocm_relocatable)
-			req->jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
+			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
-			req->jd.hdr.sp_flags = 0x0;
+			req->cn10k_req.jd.hdr.sp_flags = 0x0;
 
-		req->jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
-		req->jd.model_start.extended_args =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->extended_args));
-		req->jd.model_start.model_dst_ddr_addr =
+		req->cn10k_req.jd.hdr.sp_flags |= ML_CN10K_SP_FLAGS_EXTENDED_LOAD_JD;
+		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
+			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
+		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
 			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
-		req->jd.model_start.model_init_offset = 0x0;
-		req->jd.model_start.model_main_offset = metadata->init_model.file_size;
-		req->jd.model_start.model_finish_offset =
+		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
+		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_offset =
 			metadata->init_model.file_size + metadata->main_model.file_size;
-		req->jd.model_start.model_init_size = metadata->init_model.file_size;
-		req->jd.model_start.model_main_size = metadata->main_model.file_size;
-		req->jd.model_start.model_finish_size = metadata->finish_model.file_size;
-		req->jd.model_start.model_wb_offset = metadata->init_model.file_size +
-						      metadata->main_model.file_size +
-						      metadata->finish_model.file_size;
-		req->jd.model_start.num_layers = metadata->model.num_layers;
-		req->jd.model_start.num_gather_entries = 0;
-		req->jd.model_start.num_scatter_entries = 0;
-		req->jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->jd.model_start.batch_size = model->batch_size;
-		req->jd.model_start.ocm_wb_base_address = 0; /* Updated after reserving pages */
-		req->jd.model_start.ocm_wb_range_start = metadata->model.ocm_wb_range_start;
-		req->jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
-		req->jd.model_start.ddr_wb_base_address = PLT_U64_CAST(roc_ml_addr_ap2mlip(
-			&cn10k_mldev->roc,
-			PLT_PTR_ADD(addr->finish_load_addr, metadata->finish_model.file_size)));
-		req->jd.model_start.ddr_wb_range_start = metadata->model.ddr_wb_range_start;
-		req->jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
-		req->jd.model_start.input.s.ddr_range_start = metadata->model.ddr_input_range_start;
-		req->jd.model_start.input.s.ddr_range_end = metadata->model.ddr_input_range_end;
-		req->jd.model_start.output.s.ddr_range_start =
+		req->cn10k_req.jd.model_start.model_init_size = metadata->init_model.file_size;
+		req->cn10k_req.jd.model_start.model_main_size = metadata->main_model.file_size;
+		req->cn10k_req.jd.model_start.model_finish_size = metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.model_wb_offset = metadata->init_model.file_size +
+								metadata->main_model.file_size +
+								metadata->finish_model.file_size;
+		req->cn10k_req.jd.model_start.num_layers = metadata->model.num_layers;
+		req->cn10k_req.jd.model_start.num_gather_entries = 0;
+		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
+		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.ocm_wb_base_address =
+			0; /* Updated after reserving pages */
+		req->cn10k_req.jd.model_start.ocm_wb_range_start =
+			metadata->model.ocm_wb_range_start;
+		req->cn10k_req.jd.model_start.ocm_wb_range_end = metadata->model.ocm_wb_range_end;
+		req->cn10k_req.jd.model_start.ddr_wb_base_address =
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(
+				&cn10k_mldev->roc, PLT_PTR_ADD(addr->finish_load_addr,
+							       metadata->finish_model.file_size)));
+		req->cn10k_req.jd.model_start.ddr_wb_range_start =
+			metadata->model.ddr_wb_range_start;
+		req->cn10k_req.jd.model_start.ddr_wb_range_end = metadata->model.ddr_wb_range_end;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_start =
+			metadata->model.ddr_input_range_start;
+		req->cn10k_req.jd.model_start.input.s.ddr_range_end =
+			metadata->model.ddr_input_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_start =
 			metadata->model.ddr_output_range_start;
-		req->jd.model_start.output.s.ddr_range_end = metadata->model.ddr_output_range_end;
+		req->cn10k_req.jd.model_start.output.s.ddr_range_end =
+			metadata->model.ddr_output_range_end;
 
-		req->extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
+		req->cn10k_req.extended_args.start.ddr_scratch_base_address = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->scratch_base_addr));
-		req->extended_args.start.ddr_scratch_range_start =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_start =
 			metadata->model.ddr_scratch_range_start;
-		req->extended_args.start.ddr_scratch_range_end =
+		req->cn10k_req.extended_args.start.ddr_scratch_range_end =
 			metadata->model.ddr_scratch_range_end;
 	}
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cn10k_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
-	req->jd.hdr.jce.w0.u64 = 0;
-	req->jd.hdr.jce.w1.u64 = req->compl_W1;
-	req->jd.hdr.model_id = op->model_id;
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
-	req->jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
-	req->jd.hdr.sp_flags = 0x0;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.model_run.input_ddr_addr =
+	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
+	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
+	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
+	req->cn10k_req.jd.hdr.sp_flags = 0x0;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.model_run.input_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
-	req->jd.model_run.output_ddr_addr =
+	req->cn10k_req.jd.model_run.output_ddr_addr =
 		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->jd.model_run.num_batches = op->nb_batches;
+	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
 struct xstat_info {
@@ -861,7 +870,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cn10k_ml_req));
+	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
 	ret = cn10k_ml_inference_sync(dev, &op);
 	plt_memzone_free(mz);
 
@@ -904,7 +913,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t tile_id;
@@ -1101,7 +1110,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint16_t model_id;
 	uint16_t qp_id;
 
@@ -1136,7 +1145,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 		qp = dev->data->queue_pairs[qp_id];
 		if (qp != NULL) {
-			if (cn10k_ml_qp_destroy(dev, qp) != 0)
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
 				plt_err("Could not destroy queue pair %u", qp_id);
 			dev->data->queue_pairs[qp_id] = NULL;
 		}
@@ -1213,7 +1222,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
 	struct rte_ml_dev_info dev_info;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	uint32_t nb_desc;
 
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
@@ -1239,7 +1248,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	 */
 	nb_desc =
 		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cn10k_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
 	if (qp == NULL) {
 		plt_err("Could not create queue pair %u", queue_pair_id);
 		return -ENOMEM;
@@ -1252,7 +1261,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 static int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1269,7 +1278,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 static void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
 	int qp_id;
 
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
@@ -1485,20 +1494,22 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.debug_buffer_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.debug_buffer_size;
 		if (core_id == 0) {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C0);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C0);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core0_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		} else {
 			head_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_HEAD_C1);
 			tail_loc =
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_DBG_BUFFER_TAIL_C1);
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_debug_ptr);
+			head_ptr =
+				PLT_PTR_CAST(fw->req->cn10k_req.jd.fw_load.debug.core1_debug_ptr);
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 		}
 		if (head_loc < tail_loc) {
@@ -1511,17 +1522,19 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 
 	/* Dump exception info */
 	for (core_id = 0; core_id <= 1; core_id++) {
-		bufsize = fw->req->jd.fw_load.debug.exception_state_size;
+		bufsize = fw->req->cn10k_req.jd.fw_load.debug.exception_state_size;
 		if ((core_id == 0) &&
 		    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core0_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core0_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C0 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
 			fprintf(fp, "%.*s", bufsize, head_ptr);
 		} else if ((core_id == 1) && (roc_ml_reg_read64(&cn10k_mldev->roc,
 								ML_SCRATCH_EXCEPTION_SP_C1) != 0)) {
-			head_ptr = PLT_PTR_CAST(fw->req->jd.fw_load.debug.core1_exception_buffer);
+			head_ptr = PLT_PTR_CAST(
+				fw->req->cn10k_req.jd.fw_load.debug.core1_exception_buffer);
 			fprintf(fp, "ML_SCRATCH_EXCEPTION_SP_C1 = 0x%016lx",
 				roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1));
 			head_ptr = roc_ml_addr_mlip2ap(&cn10k_mldev->roc, head_ptr);
@@ -1538,14 +1551,14 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cn10k_ml_req), 0,
+	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL) {
 		plt_err("Could not allocate reserved memzone");
@@ -1554,23 +1567,24 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	req = mz->addr;
 
 	/* Prepare load completion structure */
-	memset(&req->jd, 0, sizeof(struct cn10k_ml_jd));
-	req->jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->status);
-	req->jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
-	req->jd.hdr.result = roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->result);
-	req->jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
+	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
+	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST;
+	req->cn10k_req.jd.hdr.result =
+		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
+	req->cn10k_req.jd.fw_load.flags = cn10k_ml_fw_flags_get(&cn10k_mldev->fw);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	/* Enqueue firmware selftest request through scratch registers */
 	timeout = true;
 	timeout_cycle = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+	roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 	plt_rmb();
 	do {
 		if (roc_ml_scratch_is_done_bit_set(&cn10k_mldev->roc) &&
-		    (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH)) {
+		    (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH)) {
 			timeout = false;
 			break;
 		}
@@ -1581,7 +1595,7 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 	if (timeout) {
 		ret = -ETIME;
 	} else {
-		if (req->result.error_code.u64 != 0)
+		if (req->cn10k_req.result.error_code != 0)
 			ret = -1;
 	}
 
@@ -1654,7 +1668,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
 		  model_stats_size;
 
 	/* Allocate memzone for model object and model data */
@@ -1726,7 +1740,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	/* Reset burst and sync stats */
 	model->layer[0].glow.burst_stats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_req), ML_CN10K_ALIGN_SIZE));
+			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
 		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
@@ -1790,7 +1804,7 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1815,10 +1829,10 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	num_tiles = model->layer[0].glow.metadata.model.tile_end -
@@ -1878,8 +1892,8 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 
 	/* Update JD */
 	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
-	req->jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
-	req->jd.model_start.ocm_wb_base_address =
+	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
+	req->cn10k_req.jd.model_start.ocm_wb_base_address =
 		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
@@ -1887,19 +1901,21 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0)
 				ret = 0;
 			else
 				ret = -1;
@@ -1952,7 +1968,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 
 	bool job_enqueued;
 	bool job_dequeued;
@@ -1972,10 +1988,10 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	/* Prepare JD */
 	req = model->layer[0].glow.req;
 	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
-	req->result.error_code.u64 = 0x0;
-	req->result.user_ptr = NULL;
+	req->cn10k_req.result.error_code = 0x0;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	plt_write64(ML_CNXK_POLL_JOB_START, &req->status);
+	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
 	locked = false;
@@ -2015,19 +2031,21 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	do {
 		if (!job_enqueued) {
 			req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-			job_enqueued = roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->jd);
+			job_enqueued =
+				roc_ml_scratch_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 		}
 
 		if (job_enqueued && !job_dequeued)
-			job_dequeued = roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->jd);
+			job_dequeued =
+				roc_ml_scratch_dequeue(&cn10k_mldev->roc, &req->cn10k_req.jd);
 
 		if (job_dequeued)
 			break;
 	} while (plt_tsc_cycles() < req->timeout);
 
 	if (job_dequeued) {
-		if (plt_read64(&req->status) == ML_CNXK_POLL_JOB_FINISH) {
-			if (req->result.error_code.u64 == 0x0)
+		if (plt_read64(&req->cn10k_req.status) == ML_CNXK_POLL_JOB_FINISH) {
+			if (req->cn10k_req.result.error_code == 0x0)
 				ret = 0;
 			else
 				ret = -1;
@@ -2287,18 +2305,23 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result *result,
-		       struct rte_ml_op *op)
+cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_stats *stats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
 
-	if (likely(result->error_code.u64 == 0)) {
+	result = &req->cn10k_req.result;
+	op = req->op;
+
+	if (likely(result->error_code == 0)) {
 		model = dev->data->models[op->model_id];
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
@@ -2329,7 +2352,7 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
 		stats->dequeued_count++;
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
@@ -2338,7 +2361,8 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 		}
 
 		/* Handle driver error */
-		if (result->error_code.s.etype == ML_ETYPE_DRIVER) {
+		error_code = (union cn10k_ml_error_code *)&result->error_code;
+		if (error_code->s.etype == ML_ETYPE_DRIVER) {
 			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
@@ -2346,15 +2370,15 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cn10k_ml_result
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				result->error_code.s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
 			else
-				result->error_code.s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
 		}
 
-		op->impl_opaque = result->error_code.u64;
+		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_ERROR;
 	}
 
@@ -2365,11 +2389,12 @@ __rte_hot uint16_t
 cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 
 	uint16_t count;
@@ -2395,12 +2420,13 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	cn10k_mldev->set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd);
+	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
 	if (unlikely(!enqueued))
 		goto jcmdq_full;
 
@@ -2424,11 +2450,12 @@ __rte_hot uint16_t
 cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
 		       uint16_t nb_ops)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	struct cn10k_ml_queue *queue;
-	struct cn10k_ml_req *req;
-	struct cn10k_ml_qp *qp;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
 
 	uint64_t status;
 	uint16_t count;
@@ -2450,13 +2477,15 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[tail];
 	status = cn10k_mldev->get_poll_ptr(req);
 	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout)
+		if (plt_tsc_cycles() < req->timeout) {
 			goto empty_or_active;
-		else /* Timeout, set indication of driver error */
-			req->result.error_code.s.etype = ML_ETYPE_DRIVER;
+		} else { /* Timeout, set indication of driver error */
+			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+			error_code->s.etype = ML_ETYPE_DRIVER;
+		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, &req->result, req->op);
+	cn10k_ml_result_update(dev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2507,10 +2536,11 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 __rte_hot int
 cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 {
+	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
-	struct cn10k_ml_req *req;
+	struct cnxk_ml_req *req;
 	bool timeout;
 	int ret = 0;
 
@@ -2522,17 +2552,18 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	cn10k_ml_set_poll_addr(req);
 	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
 
-	memset(&req->result, 0, sizeof(struct cn10k_ml_result));
-	req->result.error_code.s.etype = ML_ETYPE_UNKNOWN;
-	req->result.user_ptr = op->user_ptr;
+	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cn10k_mldev->set_poll_ptr(req);
-	req->jcmd.w1.s.jobptr = PLT_U64_CAST(&req->jd);
+	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
-		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->jcmd)) {
+		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
 			req->op = op;
 			timeout = false;
 			break;
@@ -2555,7 +2586,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, &req->result, req->op);
+		cn10k_ml_result_update(dev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 005b093e45..fd5992e192 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,63 +10,279 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
+/* Firmware version string length */
+#define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
-/* Request structure */
-struct cn10k_ml_req {
-	/* Job descriptor */
-	struct cn10k_ml_jd jd;
+/* Job types */
+enum cn10k_ml_job_type {
+	ML_CN10K_JOB_TYPE_MODEL_RUN = 0,
+	ML_CN10K_JOB_TYPE_MODEL_STOP,
+	ML_CN10K_JOB_TYPE_MODEL_START,
+	ML_CN10K_JOB_TYPE_FIRMWARE_LOAD,
+	ML_CN10K_JOB_TYPE_FIRMWARE_SELFTEST,
+};
 
-	/* Job descriptor extra arguments */
-	union cn10k_ml_jd_extended_args extended_args;
+/* Firmware stats */
+struct cn10k_ml_stats {
+	/* Firmware start cycle */
+	uint64_t fw_start;
 
-	/* Job result */
-	struct cn10k_ml_result result;
+	/* Firmware end cycle */
+	uint64_t fw_end;
 
-	/* Status field for poll mode requests */
-	volatile uint64_t status;
+	/* Hardware start cycle */
+	uint64_t hw_start;
 
-	/* Job command */
-	struct ml_job_cmd_s jcmd;
+	/* Hardware end cycle */
+	uint64_t hw_end;
+};
+
+/* Result structure */
+struct cn10k_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Stats */
+	struct cn10k_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* Firmware capability structure */
+union cn10k_ml_fw_cap {
+	uint64_t u64;
+
+	struct {
+		/* CMPC completion support */
+		uint64_t cmpc_completions : 1;
+
+		/* Poll mode completion support */
+		uint64_t poll_completions : 1;
+
+		/* SSO completion support */
+		uint64_t sso_completions : 1;
+
+		/* Support for model side loading */
+		uint64_t side_load_model : 1;
 
-	/* Job completion W1 */
-	uint64_t compl_W1;
+		/* Batch execution */
+		uint64_t batch_run : 1;
 
-	/* Timeout cycle */
-	uint64_t timeout;
+		/* Max number of models to be loaded in parallel */
+		uint64_t max_models : 8;
 
-	/* Op */
-	struct rte_ml_op *op;
-} __rte_aligned(ROC_ALIGN);
+		/* Firmware statistics */
+		uint64_t fw_stats : 1;
 
-/* Request queue */
-struct cn10k_ml_queue {
-	/* Array of requests */
-	struct cn10k_ml_req *reqs;
+		/* Hardware statistics */
+		uint64_t hw_stats : 1;
 
-	/* Head of the queue, used for enqueue */
-	uint64_t head;
+		/* Max number of batches */
+		uint64_t max_num_batches : 16;
 
-	/* Tail of the queue, used for dequeue */
-	uint64_t tail;
+		uint64_t rsvd : 33;
+	} s;
+};
+
+/* Firmware debug info structure */
+struct cn10k_ml_fw_debug {
+	/* ACC core 0 debug buffer */
+	uint64_t core0_debug_ptr;
+
+	/* ACC core 1 debug buffer */
+	uint64_t core1_debug_ptr;
+
+	/* ACC core 0 exception state buffer */
+	uint64_t core0_exception_buffer;
+
+	/* ACC core 1 exception state buffer */
+	uint64_t core1_exception_buffer;
+
+	/* Debug buffer size per core */
+	uint32_t debug_buffer_size;
 
-	/* Wait cycles before timeout */
-	uint64_t wait_cycles;
+	/* Exception state dump size */
+	uint32_t exception_state_size;
 };
 
-/* Queue-pair structure */
-struct cn10k_ml_qp {
-	/* ID */
-	uint32_t id;
+/* Job descriptor header (32 bytes) */
+struct cn10k_ml_jd_header {
+	/* Job completion structure */
+	struct ml_jce_s jce;
+
+	/* Model ID */
+	uint64_t model_id : 8;
+
+	/* Job type */
+	uint64_t job_type : 8;
+
+	/* Flags for fast-path jobs */
+	uint64_t fp_flags : 16;
+
+	/* Flags for slow-path jobs */
+	uint64_t sp_flags : 16;
+	uint64_t rsvd : 16;
+
+	/* Job result pointer */
+	uint64_t *result;
+};
+
+/* Extra arguments for job descriptor */
+union cn10k_ml_jd_extended_args {
+	struct cn10k_ml_jd_extended_args_section_start {
+		/* DDR Scratch base address */
+		uint64_t ddr_scratch_base_address;
+
+		/* DDR Scratch range start */
+		uint64_t ddr_scratch_range_start;
+
+		/* DDR Scratch range end */
+		uint64_t ddr_scratch_range_end;
+
+		uint8_t rsvd[104];
+	} start;
+};
+
+/* Job descriptor structure */
+struct cn10k_ml_jd {
+	/* Job descriptor header (32 bytes) */
+	struct cn10k_ml_jd_header hdr;
+
+	union {
+		struct cn10k_ml_jd_section_fw_load {
+			/* Firmware capability structure (8 bytes) */
+			union cn10k_ml_fw_cap cap;
+
+			/* Firmware version (32 bytes) */
+			uint8_t version[MLDEV_FIRMWARE_VERSION_LENGTH];
+
+			/* Debug capability structure (40 bytes) */
+			struct cn10k_ml_fw_debug debug;
 
-	/* Number of descriptors */
-	uint64_t nb_desc;
+			/* Flags to control error handling */
+			uint64_t flags;
 
-	/* Request queue */
-	struct cn10k_ml_queue queue;
+			uint8_t rsvd[8];
+		} fw_load;
 
-	/* Statistics per queue-pair */
-	struct rte_ml_dev_stats stats;
+		struct cn10k_ml_jd_section_model_start {
+			/* Extended arguments */
+			uint64_t extended_args;
+
+			/* Destination model start address in DDR relative to ML_MLR_BASE */
+			uint64_t model_dst_ddr_addr;
+
+			/* Offset to model init section in the model */
+			uint64_t model_init_offset : 32;
+
+			/* Size of init section in the model */
+			uint64_t model_init_size : 32;
+
+			/* Offset to model main section in the model */
+			uint64_t model_main_offset : 32;
+
+			/* Size of main section in the model */
+			uint64_t model_main_size : 32;
+
+			/* Offset to model finish section in the model */
+			uint64_t model_finish_offset : 32;
+
+			/* Size of finish section in the model */
+			uint64_t model_finish_size : 32;
+
+			/* Offset to WB in model bin */
+			uint64_t model_wb_offset : 32;
+
+			/* Number of model layers */
+			uint64_t num_layers : 8;
+
+			/* Number of gather entries, 0 means linear input mode (= no gather) */
+			uint64_t num_gather_entries : 8;
+
+			/* Number of scatter entries, 0 means linear input mode (= no scatter) */
+			uint64_t num_scatter_entries : 8;
+
+			/* Tile mask to load model */
+			uint64_t tilemask : 8;
+
+			/* Batch size of model  */
+			uint64_t batch_size : 32;
+
+			/* OCM WB base address */
+			uint64_t ocm_wb_base_address : 32;
+
+			/* OCM WB range start */
+			uint64_t ocm_wb_range_start : 32;
+
+			/* OCM WB range End */
+			uint64_t ocm_wb_range_end : 32;
+
+			/* DDR WB address */
+			uint64_t ddr_wb_base_address;
+
+			/* DDR WB range start */
+			uint64_t ddr_wb_range_start : 32;
+
+			/* DDR WB range end */
+			uint64_t ddr_wb_range_end : 32;
+
+			union {
+				/* Points to gather list if num_gather_entries > 0 */
+				void *gather_list;
+				struct {
+					/* Linear input mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} input;
+
+			union {
+				/* Points to scatter list if num_scatter_entries > 0 */
+				void *scatter_list;
+				struct {
+					/* Linear output mode */
+					uint64_t ddr_range_start : 32;
+					uint64_t ddr_range_end : 32;
+				} s;
+			} output;
+		} model_start;
+
+		struct cn10k_ml_jd_section_model_stop {
+			uint8_t rsvd[96];
+		} model_stop;
+
+		struct cn10k_ml_jd_section_model_run {
+			/* Address of the input for the run relative to ML_MLR_BASE */
+			uint64_t input_ddr_addr;
+
+			/* Address of the output for the run relative to ML_MLR_BASE */
+			uint64_t output_ddr_addr;
+
+			/* Number of batches to run in variable batch processing */
+			uint16_t num_batches;
+
+			uint8_t rsvd[78];
+		} model_run;
+	};
+} __plt_aligned(ROC_ALIGN);
+
+/* CN10K specific request */
+struct cn10k_ml_req {
+	/* Job descriptor */
+	struct cn10k_ml_jd jd;
+
+	/* Job descriptor extra arguments */
+	union cn10k_ml_jd_extended_args extended_args;
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Job command */
+	struct ml_job_cmd_s jcmd;
+
+	/* Result */
+	struct cn10k_ml_result result;
 };
 
 /* Device ops */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
new file mode 100644
index 0000000000..f1872dcf7c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_ops.h"
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
new file mode 100644
index 0000000000..b953fb0f5f
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_OPS_H_
+#define _CNXK_ML_OPS_H_
+
+#include <rte_mldev.h>
+#include <rte_mldev_core.h>
+
+#include <roc_api.h>
+
+#include "cn10k_ml_ops.h"
+
+/* Request structure */
+struct cnxk_ml_req {
+	/* Device specific request */
+	union {
+		/* CN10K */
+		struct cn10k_ml_req cn10k_req;
+	};
+
+	/* Address of status field */
+	volatile uint64_t *status;
+
+	/* Timeout cycle */
+	uint64_t timeout;
+
+	/* Op */
+	struct rte_ml_op *op;
+} __rte_aligned(ROC_ALIGN);
+
+/* Request queue */
+struct cnxk_ml_queue {
+	/* Array of requests */
+	struct cnxk_ml_req *reqs;
+
+	/* Head of the queue, used for enqueue */
+	uint64_t head;
+
+	/* Tail of the queue, used for dequeue */
+	uint64_t tail;
+
+	/* Wait cycles before timeout */
+	uint64_t wait_cycles;
+};
+
+/* Queue-pair structure */
+struct cnxk_ml_qp {
+	/* ID */
+	uint32_t id;
+
+	/* Number of descriptors */
+	uint64_t nb_desc;
+
+	/* Request queue */
+	struct cnxk_ml_queue queue;
+
+	/* Statistics per queue-pair */
+	struct rte_ml_dev_stats stats;
+};
+
+#endif /* _CNXK_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index a70956cceb..d652543912 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -14,6 +14,7 @@ sources = files(
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
         'cnxk_ml_model.c',
+        'cnxk_ml_ops.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0



* [PATCH v9 05/34] ml/cnxk: add generic cnxk xstats structures
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (3 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
                     ` (29 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced generic cnxk xstats structures and renamed the cn10k xstats
enumerations with the cnxk prefix.
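
As a rough illustration of how these generic entries are meant to be
consumed, the standalone sketch below mimics the name-to-value lookup that
cn10k_ml_dev_xstats_by_name_get() performs over the entry table: match the
name, dispatch on the function type, and subtract the stored reset offset.
The struct fields and getter functions here are simplified stand-ins, not
the driver's real symbols.

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

enum fn_type { FN_DEVICE, FN_MODEL };              /* CNXK_ML_XSTATS_FN_* */

struct entry {
	char name[32];
	enum fn_type fn_id;
	uint16_t obj_idx;                          /* model ID for model stats */
	uint64_t reset_value;                      /* subtracted to emulate reset */
};

/* Stand-ins for cn10k_ml_dev_xstat_get() / cn10k_ml_model_xstat_get(). */
static uint64_t dev_xstat_get(uint16_t obj_idx)   { (void)obj_idx; return 4; }
static uint64_t model_xstat_get(uint16_t obj_idx) { (void)obj_idx; return 1200; }

static int
xstat_by_name(const struct entry *tbl, size_t n, const char *name, uint64_t *value)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) != 0)
			continue;
		*value = (tbl[i].fn_id == FN_DEVICE ? dev_xstat_get(tbl[i].obj_idx) :
						      model_xstat_get(tbl[i].obj_idx)) -
			 tbl[i].reset_value;
		return 0;
	}
	return -1; /* unknown stat name */
}

int
main(void)
{
	const struct entry tbl[] = {
		{"nb_models_loaded", FN_DEVICE, 0, 0},
		{"Model-0-Avg-HW-Latency", FN_MODEL, 0, 0},
	};
	uint64_t value;

	if (xstat_by_name(tbl, 2, "Model-0-Avg-HW-Latency", &value) == 0)
		printf("Avg-HW-Latency = %" PRIu64 "\n", value);
	return 0;
}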

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |  86 +---------------
 drivers/ml/cnxk/cn10k_ml_model.h |   6 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 169 ++++++++++++++-----------------
 drivers/ml/cnxk/cnxk_ml_xstats.h | 128 +++++++++++++++++++++++
 4 files changed, 209 insertions(+), 180 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_xstats.h

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 1852d4f6c9..be989e0a20 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,6 +10,7 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
+#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -121,89 +122,6 @@ struct cn10k_ml_fw {
 	struct cnxk_ml_req *req;
 };
 
-/* Extended stats types enum */
-enum cn10k_ml_xstats_type {
-	/* Number of models loaded */
-	nb_models_loaded,
-
-	/* Number of models unloaded */
-	nb_models_unloaded,
-
-	/* Number of models started */
-	nb_models_started,
-
-	/* Number of models stopped */
-	nb_models_stopped,
-
-	/* Average inference hardware latency */
-	avg_hw_latency,
-
-	/* Minimum hardware latency */
-	min_hw_latency,
-
-	/* Maximum hardware latency */
-	max_hw_latency,
-
-	/* Average firmware latency */
-	avg_fw_latency,
-
-	/* Minimum firmware latency */
-	min_fw_latency,
-
-	/* Maximum firmware latency */
-	max_fw_latency,
-};
-
-/* Extended stats function type enum. */
-enum cn10k_ml_xstats_fn_type {
-	/* Device function */
-	CN10K_ML_XSTATS_FN_DEVICE,
-
-	/* Model function */
-	CN10K_ML_XSTATS_FN_MODEL,
-};
-
-/* Function pointer to get xstats for a type */
-typedef uint64_t (*cn10k_ml_xstats_fn)(struct rte_ml_dev *dev, uint16_t obj_idx,
-				       enum cn10k_ml_xstats_type stat);
-
-/* Extended stats entry structure */
-struct cn10k_ml_xstats_entry {
-	/* Name-ID map */
-	struct rte_ml_dev_xstats_map map;
-
-	/* xstats mode, device or model */
-	enum rte_ml_dev_xstats_mode mode;
-
-	/* Type of xstats */
-	enum cn10k_ml_xstats_type type;
-
-	/* xstats function */
-	enum cn10k_ml_xstats_fn_type fn_id;
-
-	/* Object ID, model ID for model stat type */
-	uint16_t obj_idx;
-
-	/* Allowed to reset the stat */
-	uint8_t reset_allowed;
-
-	/* An offset to be taken away to emulate resets */
-	uint64_t reset_value;
-};
-
-/* Extended stats data */
-struct cn10k_ml_xstats {
-	/* Pointer to xstats entries */
-	struct cn10k_ml_xstats_entry *entries;
-
-	/* Store num stats and offset of the stats for each model */
-	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
-	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
-	uint16_t count_mode_device;
-	uint16_t count_mode_model;
-	uint16_t count;
-};
-
 /* Device private data */
 struct cn10k_ml_dev {
 	/* Device ROC */
@@ -216,7 +134,7 @@ struct cn10k_ml_dev {
 	struct cn10k_ml_ocm ocm;
 
 	/* Extended stats data */
-	struct cn10k_ml_xstats xstats;
+	struct cnxk_ml_xstats xstats;
 
 	/* Enable / disable model data caching */
 	int cache_model_data;
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 74ada1531a..5c32f48c68 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -404,7 +404,7 @@ struct cn10k_ml_layer_addr {
 };
 
 /* Model fast-path stats */
-struct cn10k_ml_layer_stats {
+struct cn10k_ml_layer_xstats {
 	/* Total hardware latency, sum of all inferences */
 	uint64_t hw_latency_tot;
 
@@ -447,10 +447,10 @@ struct cn10k_ml_layer_data {
 	struct cnxk_ml_req *req;
 
 	/* Layer: Stats for burst ops */
-	struct cn10k_ml_layer_stats *burst_stats;
+	struct cn10k_ml_layer_xstats *burst_xstats;
 
 	/* Layer: Stats for sync ops */
-	struct cn10k_ml_layer_stats *sync_stats;
+	struct cn10k_ml_layer_xstats *sync_xstats;
 };
 
 struct cn10k_ml_model_data {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 25ebb28993..b470955ffd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -10,6 +10,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
@@ -425,26 +426,6 @@ cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-struct xstat_info {
-	char name[32];
-	enum cn10k_ml_xstats_type type;
-	uint8_t reset_allowed;
-};
-
-/* Note: Device stats are not allowed to be reset. */
-static const struct xstat_info device_stats[] = {
-	{"nb_models_loaded", nb_models_loaded, 0},
-	{"nb_models_unloaded", nb_models_unloaded, 0},
-	{"nb_models_started", nb_models_started, 0},
-	{"nb_models_stopped", nb_models_stopped, 0},
-};
-
-static const struct xstat_info model_stats[] = {
-	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
-	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
-	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
-};
-
 static int
 cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 {
@@ -459,10 +440,10 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_stats) + ML_CNXK_MAX_MODELS * RTE_DIM(model_stats);
+	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
 	if (cn10k_mldev->xstats.entries == NULL)
 		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cn10k_ml_xstats_entry) * nb_stats,
+			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
 			PLT_CACHE_LINE_SIZE);
 
 	if (cn10k_mldev->xstats.entries == NULL)
@@ -470,17 +451,17 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 
 	/* Initialize device xstats */
 	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_stats); i++) {
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
 		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_stats[i].name);
+			 device_xstats[i].name);
 
 		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_stats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_DEVICE;
+		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
 		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_stats[i].reset_allowed;
+		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
 		stat_id++;
 	}
 	cn10k_mldev->xstats.count_mode_device = stat_id;
@@ -489,24 +470,24 @@ cn10k_ml_xstats_init(struct rte_ml_dev *dev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
 
-		for (i = 0; i < RTE_DIM(model_stats); i++) {
+		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
 			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = model_stats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CN10K_ML_XSTATS_FN_MODEL;
+			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
 			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
 			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				model_stats[i].reset_allowed;
+				layer_xstats[i].reset_allowed;
 
 			/* Name of xstat is updated during model load */
 			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, model_stats[i].name);
+				 "Model-%u-%s", model, layer_xstats[i].name);
 
 			stat_id++;
 		}
 
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(model_stats);
+		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
 	}
 
 	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
@@ -545,7 +526,7 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_stats) + model_id * RTE_DIM(model_stats);
+	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -554,17 +535,17 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 		strcpy(suffix, "ns");
 
 	/* Update xstat name based on model name and sclk availability */
-	for (i = 0; i < RTE_DIM(model_stats); i++) {
+	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
 		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
 			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, model_stats[i].name, suffix);
+			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
 static uint64_t
 cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cn10k_ml_xstats_type type)
+		       enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
 
@@ -590,9 +571,9 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 	do {                                                                                       \
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_stats[qp_id].str##_latency_tot;        \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
@@ -603,9 +584,10 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = UINT64_MAX;                                                                \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MIN(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_min); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
@@ -616,16 +598,17 @@ cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
 		value = 0;                                                                         \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
 			value = PLT_MAX(                                                           \
-				value, model->layer[0].glow.burst_stats[qp_id].str##_latency_max); \
-			count += model->layer[0].glow.burst_stats[qp_id].dequeued_count -          \
-				 model->layer[0].glow.burst_stats[qp_id].str##_reset_count;        \
+				value,                                                             \
+				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
+			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
+				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
 		}                                                                                  \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
 static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml_xstats_type type)
+cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint16_t rclk_freq; /* MHz */
@@ -671,8 +654,8 @@ cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cn10k_ml
 static int
 cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint16_t nb_stats;
 	uint16_t stat_id;
@@ -708,26 +691,26 @@ cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[],
 #define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_tot = 0;             \
-			model->layer[0].glow.burst_stats[qp_id].str##_reset_count =                \
-				model->layer[0].glow.burst_stats[qp_id].dequeued_count;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
+			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
+				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
 		}                                                                                  \
 	} while (0)
 
 #define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_min = UINT64_MAX;    \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
 	} while (0)
 
 #define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
 	do {                                                                                       \
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_stats[qp_id].str##_latency_max = 0;             \
+			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
 	} while (0)
 
 static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cn10k_ml_xstats_type type)
+cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
 {
 	struct cnxk_ml_model *model;
 	uint32_t qp_id;
@@ -762,8 +745,8 @@ static int
 cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
 			    uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	int32_t lcl_model_id = 0;
@@ -1342,10 +1325,10 @@ static int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
-	struct cn10k_ml_xstats_entry *xs;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint32_t i;
 
 	cnxk_mldev = dev->data->dev_private;
@@ -1357,10 +1340,10 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 				*stat_id = xs->map.id;
 
 			switch (xs->fn_id) {
-			case CN10K_ML_XSTATS_FN_DEVICE:
+			case CNXK_ML_XSTATS_FN_DEVICE:
 				fn = cn10k_ml_dev_xstat_get;
 				break;
-			case CN10K_ML_XSTATS_FN_MODEL:
+			case CNXK_ML_XSTATS_FN_MODEL:
 				fn = cn10k_ml_model_xstat_get;
 				break;
 			default:
@@ -1384,11 +1367,11 @@ static int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
-	struct cn10k_ml_xstats_entry *xs;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
 	uint32_t xstats_mode_count;
-	cn10k_ml_xstats_fn fn;
+	cnxk_ml_xstats_fn fn;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -1423,10 +1406,10 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 		}
 
 		switch (xs->fn_id) {
-		case CN10K_ML_XSTATS_FN_DEVICE:
+		case CNXK_ML_XSTATS_FN_DEVICE:
 			fn = cn10k_ml_dev_xstat_get;
 			break;
-		case CN10K_ML_XSTATS_FN_MODEL:
+		case CNXK_ML_XSTATS_FN_MODEL:
 			fn = cn10k_ml_model_xstat_get;
 			break;
 		default:
@@ -1664,7 +1647,7 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
 			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
 	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_stats);
+	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
 
 	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
 		  2 * model_data_size + model_scratch_size + model_info_size +
@@ -1738,24 +1721,24 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
 
 	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_stats =
+	model->layer[0].glow.burst_xstats =
 		PLT_PTR_ADD(model->layer[0].glow.req,
 			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
 	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_stats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_stats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_stats[qp_id].dequeued_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
+		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
+		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
-	model->layer[0].glow.sync_stats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_stats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_stats));
+	model->layer[0].glow.sync_xstats =
+		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
+			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
 
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
@@ -2308,7 +2291,7 @@ static __rte_always_inline void
 cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_layer_stats *stats;
+	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
@@ -2326,31 +2309,31 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		if (likely(qp_id >= 0)) {
 			qp = dev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			stats = &model->layer[0].glow.burst_stats[qp_id];
+			xstats = &model->layer[0].glow.burst_xstats[qp_id];
 		} else {
-			stats = model->layer[0].glow.sync_stats;
+			xstats = model->layer[0].glow.sync_xstats;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->hw_reset_count)) {
-			stats->hw_latency_min = UINT64_MAX;
-			stats->hw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
+			xstats->hw_latency_min = UINT64_MAX;
+			xstats->hw_latency_max = 0;
 		}
 
-		if (unlikely(stats->dequeued_count == stats->fw_reset_count)) {
-			stats->fw_latency_min = UINT64_MAX;
-			stats->fw_latency_max = 0;
+		if (unlikely(xstats->dequeued_count == xstats->fw_reset_count)) {
+			xstats->fw_latency_min = UINT64_MAX;
+			xstats->fw_latency_max = 0;
 		}
 
 		hw_latency = result->stats.hw_end - result->stats.hw_start;
 		fw_latency = result->stats.fw_end - result->stats.fw_start - hw_latency;
 
-		stats->hw_latency_tot += hw_latency;
-		stats->hw_latency_min = PLT_MIN(stats->hw_latency_min, hw_latency);
-		stats->hw_latency_max = PLT_MAX(stats->hw_latency_max, hw_latency);
-		stats->fw_latency_tot += fw_latency;
-		stats->fw_latency_min = PLT_MIN(stats->fw_latency_min, fw_latency);
-		stats->fw_latency_max = PLT_MAX(stats->fw_latency_max, fw_latency);
-		stats->dequeued_count++;
+		xstats->hw_latency_tot += hw_latency;
+		xstats->hw_latency_min = PLT_MIN(xstats->hw_latency_min, hw_latency);
+		xstats->hw_latency_max = PLT_MAX(xstats->hw_latency_max, hw_latency);
+		xstats->fw_latency_tot += fw_latency;
+		xstats->fw_latency_min = PLT_MIN(xstats->fw_latency_min, fw_latency);
+		xstats->fw_latency_max = PLT_MAX(xstats->fw_latency_max, fw_latency);
+		xstats->dequeued_count++;
 
 		op->impl_opaque = result->error_code;
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
new file mode 100644
index 0000000000..0d405679ca
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_XSTATS_H_
+#define _CNXK_ML_XSTATS_H_
+
+#include "cnxk_ml_io.h"
+
+/* Extended stats types enum */
+enum cnxk_ml_xstats_type {
+	/* Number of models loaded */
+	nb_models_loaded,
+
+	/* Number of models unloaded */
+	nb_models_unloaded,
+
+	/* Number of models started */
+	nb_models_started,
+
+	/* Number of models stopped */
+	nb_models_stopped,
+
+	/* Average inference hardware latency */
+	avg_hw_latency,
+
+	/* Minimum hardware latency */
+	min_hw_latency,
+
+	/* Maximum hardware latency */
+	max_hw_latency,
+
+	/* Average firmware latency */
+	avg_fw_latency,
+
+	/* Minimum firmware latency */
+	min_fw_latency,
+
+	/* Maximum firmware latency */
+	max_fw_latency,
+
+	/* Average runtime latency */
+	avg_rt_latency,
+
+	/* Minimum runtime latency */
+	min_rt_latency,
+
+	/* Maximum runtime latency */
+	max_rt_latency,
+};
+
+/* Extended stats function type enum. */
+enum cnxk_ml_xstats_fn_type {
+	/* Device function */
+	CNXK_ML_XSTATS_FN_DEVICE,
+
+	/* Model function */
+	CNXK_ML_XSTATS_FN_MODEL,
+};
+
+/* Function pointer to get xstats for a type */
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      enum cnxk_ml_xstats_type stat);
+
+/* Extended stats entry structure */
+struct cnxk_ml_xstats_entry {
+	/* Name-ID map */
+	struct rte_ml_dev_xstats_map map;
+
+	/* xstats mode, device or model */
+	enum rte_ml_dev_xstats_mode mode;
+
+	/* Type of xstats */
+	enum cnxk_ml_xstats_type type;
+
+	/* xstats function */
+	enum cnxk_ml_xstats_fn_type fn_id;
+
+	/* Object ID, model ID for model stat type */
+	uint16_t obj_idx;
+
+	/* Layer ID, valid for model stat type */
+	int32_t layer_id;
+
+	/* Allowed to reset the stat */
+	uint8_t reset_allowed;
+
+	/* An offset to be taken away to emulate resets */
+	uint64_t reset_value;
+};
+
+/* Extended stats data */
+struct cnxk_ml_xstats {
+	/* Pointer to xstats entries */
+	struct cnxk_ml_xstats_entry *entries;
+
+	/* Store num stats and offset of the stats for each model */
+	uint16_t count_per_model[ML_CNXK_MAX_MODELS];
+	uint16_t offset_for_model[ML_CNXK_MAX_MODELS];
+	uint16_t count_per_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t offset_for_layer[ML_CNXK_MAX_MODELS][ML_CNXK_MODEL_MAX_LAYERS];
+	uint16_t count_mode_device;
+	uint16_t count_mode_model;
+	uint16_t count;
+};
+
+struct cnxk_ml_xstat_info {
+	char name[32];
+	enum cnxk_ml_xstats_type type;
+	uint8_t reset_allowed;
+};
+
+/* Device xstats. Note: Device stats are not allowed to be reset. */
+static const struct cnxk_ml_xstat_info device_xstats[] = {
+	{"nb_models_loaded", nb_models_loaded, 0},
+	{"nb_models_unloaded", nb_models_unloaded, 0},
+	{"nb_models_started", nb_models_started, 0},
+	{"nb_models_stopped", nb_models_stopped, 0},
+};
+
+/* Layer xstats */
+static const struct cnxk_ml_xstat_info layer_xstats[] = {
+	{"Avg-HW-Latency", avg_hw_latency, 1}, {"Min-HW-Latency", min_hw_latency, 1},
+	{"Max-HW-Latency", max_hw_latency, 1}, {"Avg-FW-Latency", avg_fw_latency, 1},
+	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
+};
+
+#endif /* _CNXK_ML_XSTATS_H_ */
-- 
2.42.0



* [PATCH v9 06/34] ml/cnxk: rename cnxk ops function pointers struct
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (4 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
                     ` (28 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Renamed the cn10k ML ops structure with the cnxk prefix.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.c |  2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 73 +++++++++-------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h | 34 +++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h  |  2 +
 5 files changed, 91 insertions(+), 56 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index fc6f78d414..91813e9d0a 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -345,7 +345,7 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 			goto pmd_destroy;
 		}
 
-		dev->dev_ops = &cn10k_ml_ops;
+		dev->dev_ops = &cnxk_ml_ops;
 	} else {
 		plt_err("CN10K ML Ops are not supported on secondary process");
 		dev->dev_ops = &ml_dev_dummy_ops;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index b470955ffd..a44fb26215 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -119,7 +119,7 @@ cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
 {
 	struct cnxk_ml_qp *qp;
@@ -860,7 +860,7 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -888,7 +888,7 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
 {
 	struct rte_ml_dev_info dev_info;
@@ -1087,7 +1087,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	return ret;
 }
 
-static int
+int
 cn10k_ml_dev_close(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1160,7 +1160,7 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	return rte_dev_remove(dev->device);
 }
 
-static int
+int
 cn10k_ml_dev_start(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1180,7 +1180,7 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1200,7 +1200,7 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
 {
@@ -1241,7 +1241,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
 	struct cnxk_ml_qp *qp;
@@ -1258,7 +1258,7 @@ cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 	return 0;
 }
 
-static void
+void
 cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 {
 	struct cnxk_ml_qp *qp;
@@ -1273,7 +1273,7 @@ cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
-static int
+int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 			      uint32_t size)
@@ -1321,7 +1321,7 @@ cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
 				uint64_t *value)
 {
@@ -1363,7 +1363,7 @@ cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16
 	return -EINVAL;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
 			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
 {
@@ -1427,7 +1427,7 @@ cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode
 	return idx;
 }
 
-static int
+int
 cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
 {
@@ -1441,7 +1441,7 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -1528,7 +1528,7 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 	return 0;
 }
 
-static int
+int
 cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
@@ -2051,7 +2051,7 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
-static int
+int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
 {
@@ -2071,7 +2071,7 @@ cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 	return 0;
 }
 
-static int
+int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
@@ -2105,7 +2105,7 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
 		     struct rte_ml_buff_seg **qbuffer)
 {
@@ -2186,7 +2186,7 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_bu
 	return 0;
 }
 
-static int
+int
 cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
 		       struct rte_ml_buff_seg **dbuffer)
 {
@@ -2574,38 +2574,3 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 error_enqueue:
 	return ret;
 }
-
-struct rte_ml_dev_ops cn10k_ml_ops = {
-	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
-
-	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
-
-	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
-
-	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
-
-	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
-};
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index fd5992e192..16480b9ad8 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -286,7 +286,29 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-extern struct rte_ml_dev_ops cn10k_ml_ops;
+int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct rte_ml_dev *dev);
+int cn10k_ml_dev_start(struct rte_ml_dev *dev);
+int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
+int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
+int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
+
+int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
+void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
+int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+				  uint32_t size);
+int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+				    uint64_t *value);
+int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
+			    uint16_t nb_ids);
+int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
 int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
@@ -294,6 +316,16 @@ int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *para
 int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+			    struct rte_ml_model_info *model_info);
+int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+
+/* I/O ops */
+int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
+			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
+
+int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
+			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
 
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index f1872dcf7c..03402681c5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -3,5 +3,41 @@
  */
 
 #include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_ops.h"
+
+struct rte_ml_dev_ops cnxk_ml_ops = {
+	/* Device control ops */
+	.dev_info_get = cn10k_ml_dev_info_get,
+	.dev_configure = cn10k_ml_dev_configure,
+	.dev_close = cn10k_ml_dev_close,
+	.dev_start = cn10k_ml_dev_start,
+	.dev_stop = cn10k_ml_dev_stop,
+	.dev_dump = cn10k_ml_dev_dump,
+	.dev_selftest = cn10k_ml_dev_selftest,
+
+	/* Queue-pair handling ops */
+	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+
+	/* Stats ops */
+	.dev_stats_get = cn10k_ml_dev_stats_get,
+	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cn10k_ml_dev_xstats_get,
+	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+
+	/* Model ops */
+	.model_load = cn10k_ml_model_load,
+	.model_unload = cn10k_ml_model_unload,
+	.model_start = cn10k_ml_model_start,
+	.model_stop = cn10k_ml_model_stop,
+	.model_info_get = cn10k_ml_model_info_get,
+	.model_params_update = cn10k_ml_model_params_update,
+
+	/* I/O ops */
+	.io_quantize = cn10k_ml_io_quantize,
+	.io_dequantize = cn10k_ml_io_dequantize,
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b953fb0f5f..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -60,4 +60,6 @@ struct cnxk_ml_qp {
 	struct rte_ml_dev_stats stats;
 };
 
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 07/34] ml/cnxk: update device handling functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (5 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
                     ` (27 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implement CNXK wrapper functions for dev_info_get, dev_configure,
dev_close, dev_start and dev_stop. The wrapper functions allocate
and release the common resources for the ML driver and invoke the
device-specific functions.
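
Below is a minimal illustrative sketch (not part of this patch) of the
wrapper pattern described above, using simplified stand-in types rather
than the driver's real structures: the generic cnxk routine validates
arguments and owns the common state, while the cn10k callee is limited
to the hardware-specific work.

#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in for the shared device structure. */
struct ml_dev_sketch {
	int started;	/* common state owned by the cnxk layer */
};

/* Hardware-specific step; the real driver programs registers here. */
int
cn10k_start_sketch(struct ml_dev_sketch *mldev)
{
	(void)mldev;
	return 0;
}

/* Generic wrapper: validate, delegate, then update the common state. */
int
cnxk_start_sketch(struct ml_dev_sketch *mldev)
{
	int ret;

	if (mldev == NULL)
		return -EINVAL;

	ret = cn10k_start_sketch(mldev);
	if (ret != 0)
		return ret;

	mldev->started = 1;

	return 0;
}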

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 230 ++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h  |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  | 286 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 +
 5 files changed, 314 insertions(+), 224 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a44fb26215..f8c51ab394 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -101,7 +101,7 @@ qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
 }
 
-static int
+int
 cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
 {
 	const struct rte_memzone *qp_mem;
@@ -861,20 +861,12 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 }
 
 int
-cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 
-	if (dev_info == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
-	dev_info->driver_name = dev->device->driver->name;
-	dev_info->max_models = ML_CNXK_MAX_MODELS;
 	if (cn10k_mldev->hw_queue_lock)
 		dev_info->max_queue_pairs = ML_CN10K_MAX_QP_PER_DEVICE_SL;
 	else
@@ -889,143 +881,17 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 }
 
 int
-cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
-	struct rte_ml_dev_info dev_info;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_ocm *ocm;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint32_t mz_size;
 	uint16_t tile_id;
-	uint16_t qp_id;
 	int ret;
 
-	if (dev == NULL || conf == NULL)
-		return -EINVAL;
+	RTE_SET_USED(conf);
 
-	/* Get CN10K device handle */
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
-	if (conf->nb_models > dev_info.max_models) {
-		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
-		return -EINVAL;
-	}
-
-	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
-		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
-		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-
-		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
-		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
-			   conf->nb_queue_pairs, conf->nb_models);
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
-		plt_err("Device can't be reconfigured in started state\n");
-		return -ENOTSUP;
-	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
-		plt_err("Device can't be reconfigured after close\n");
-		return -ENOTSUP;
-	}
-
-	/* Configure queue-pairs */
-	if (dev->data->queue_pairs == NULL) {
-		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
-		dev->data->queue_pairs =
-			rte_zmalloc("cn10k_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
-				conf->nb_queue_pairs);
-			return -ENOMEM;
-		}
-	} else { /* Re-configure */
-		void **queue_pairs;
-
-		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-			qp = dev->data->queue_pairs[qp_id];
-			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
-				if (ret < 0)
-					return ret;
-			}
-		}
-
-		queue_pairs = dev->data->queue_pairs;
-		queue_pairs =
-			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
-				    RTE_CACHE_LINE_SIZE);
-		if (queue_pairs == NULL) {
-			dev->data->nb_queue_pairs = 0;
-			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
-				conf->nb_queue_pairs);
-			ret = -ENOMEM;
-			goto error;
-		}
-
-		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
-		dev->data->queue_pairs = queue_pairs;
-	}
-	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
-
-	/* Allocate ML models */
-	if (dev->data->models == NULL) {
-		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
-		dev->data->models = rte_zmalloc("cn10k_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
-		if (dev->data->models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to get memory for ml_models, nb_models %u",
-				conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-	} else {
-		/* Re-configure */
-		void **models;
-
-		/* Stop and unload all models */
-		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-			model = dev->data->models[model_id];
-			if (model != NULL) {
-				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
-						plt_err("Could not stop model %u", model_id);
-				}
-				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
-						plt_err("Could not unload model %u", model_id);
-				}
-				dev->data->models[model_id] = NULL;
-			}
-		}
-
-		models = dev->data->models;
-		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
-				     RTE_CACHE_LINE_SIZE);
-		if (models == NULL) {
-			dev->data->nb_models = 0;
-			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
-			ret = -ENOMEM;
-			goto error;
-		}
-		memset(models, 0, sizeof(models[0]) * conf->nb_models);
-		dev->data->models = models;
-	}
-	dev->data->nb_models = conf->nb_models;
-
 	ocm = &cn10k_mldev->ocm;
 	ocm->num_tiles = ML_CN10K_OCM_NUMTILES;
 	ocm->size_per_tile = ML_CN10K_OCM_TILESIZE;
@@ -1038,8 +904,7 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 		rte_zmalloc("ocm_mask", ocm->mask_words * ocm->num_tiles, RTE_CACHE_LINE_SIZE);
 	if (ocm->ocm_mask == NULL) {
 		plt_err("Unable to allocate memory for OCM mask");
-		ret = -ENOMEM;
-		goto error;
+		return -ENOMEM;
 	}
 
 	for (tile_id = 0; tile_id < ocm->num_tiles; tile_id++) {
@@ -1050,10 +915,10 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	rte_spinlock_init(&ocm->lock);
 
 	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(dev);
+	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
 	if (ret != 0) {
 		plt_err("Failed to initialize xstats");
-		goto error;
+		return ret;
 	}
 
 	/* Set JCMDQ enqueue function */
@@ -1067,77 +932,25 @@ cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *c
 	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
 	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
 
-	dev->enqueue_burst = cn10k_ml_enqueue_burst;
-	dev->dequeue_burst = cn10k_ml_dequeue_burst;
-	dev->op_error_get = cn10k_ml_op_error_get;
-
-	cnxk_mldev->nb_models_loaded = 0;
-	cnxk_mldev->nb_models_started = 0;
-	cnxk_mldev->nb_models_stopped = 0;
-	cnxk_mldev->nb_models_unloaded = 0;
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
-
-error:
-	rte_free(dev->data->queue_pairs);
-
-	rte_free(dev->data->models);
-
-	return ret;
 }
 
 int
-cn10k_ml_dev_close(struct rte_ml_dev *dev)
+cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cnxk_ml_qp *qp;
-	uint16_t model_id;
-	uint16_t qp_id;
 
-	if (dev == NULL)
-		return -EINVAL;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Stop and unload all models */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
-					plt_err("Could not stop model %u", model_id);
-			}
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
-					plt_err("Could not unload model %u", model_id);
-			}
-			dev->data->models[model_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->models);
-
-	/* Destroy all queue pairs */
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		if (qp != NULL) {
-			if (cnxk_ml_qp_destroy(dev, qp) != 0)
-				plt_err("Could not destroy queue pair %u", qp_id);
-			dev->data->queue_pairs[qp_id] = NULL;
-		}
-	}
-
-	rte_free(dev->data->queue_pairs);
-
 	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(dev);
+	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
 
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
@@ -1154,20 +967,15 @@ cn10k_ml_dev_close(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, 0, ML_MLR_BASE);
 	plt_ml_dbg("ML_MLR_BASE = 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_MLR_BASE));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
-
-	/* Remove PCI device */
-	return rte_dev_remove(dev->device);
+	return 0;
 }
 
 int
-cn10k_ml_dev_start(struct rte_ml_dev *dev)
+cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1175,19 +983,15 @@ cn10k_ml_dev_start(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
-
 	return 0;
 }
 
 int
-cn10k_ml_dev_stop(struct rte_ml_dev *dev)
+cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	uint64_t reg_val64;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 	reg_val64 = roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG);
@@ -1195,8 +999,6 @@ cn10k_ml_dev_stop(struct rte_ml_dev *dev)
 	roc_ml_reg_write64(&cn10k_mldev->roc, reg_val64, ML_CFG);
 	plt_ml_dbg("ML_CFG => 0x%016lx", roc_ml_reg_read64(&cn10k_mldev->roc, ML_CFG));
 
-	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
-
 	return 0;
 }
 
@@ -1217,7 +1019,7 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	if (dev->data->queue_pairs[queue_pair_id] != NULL)
 		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
 
-	cn10k_ml_dev_info_get(dev, &dev_info);
+	cnxk_ml_dev_info_get(dev, &dev_info);
 	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
 		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 16480b9ad8..d50b5bede7 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -10,6 +10,9 @@
 
 #include <roc_api.h>
 
+struct cnxk_ml_dev;
+struct cnxk_ml_qp;
+
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
 
@@ -286,11 +289,11 @@ struct cn10k_ml_req {
 };
 
 /* Device ops */
-int cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-int cn10k_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf);
-int cn10k_ml_dev_close(struct rte_ml_dev *dev);
-int cn10k_ml_dev_start(struct rte_ml_dev *dev);
-int cn10k_ml_dev_stop(struct rte_ml_dev *dev);
+int cn10k_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
+int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
+int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
 int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
@@ -336,4 +339,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
+/* Temporarily set below functions as non-static */
+int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 51315de622..02605fa28f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -53,6 +53,9 @@ struct cnxk_ml_dev {
 
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
+
+	/* Maximum number of layers */
+	uint64_t max_nb_layers;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 03402681c5..07a4daabc5 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,15 +5,291 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_io.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+int
+cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL || dev_info == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	memset(dev_info, 0, sizeof(struct rte_ml_dev_info));
+	dev_info->driver_name = dev->device->driver->name;
+	dev_info->max_models = ML_CNXK_MAX_MODELS;
+
+	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+}
+
+static int
+cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *conf)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint32_t mz_size;
+	uint16_t qp_id;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	/* Get CNXK device handle */
+	cnxk_mldev = dev->data->dev_private;
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (conf->nb_models > dev_info.max_models) {
+		plt_err("Invalid device config, nb_models > %u\n", dev_info.max_models);
+		return -EINVAL;
+	}
+
+	if (conf->nb_queue_pairs > dev_info.max_queue_pairs) {
+		plt_err("Invalid device config, nb_queue_pairs > %u\n", dev_info.max_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (cnxk_mldev->state == ML_CNXK_DEV_STATE_PROBED) {
+		plt_ml_dbg("Configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+
+		/* Load firmware */
+		ret = cn10k_ml_fw_load(cnxk_mldev);
+		if (ret != 0)
+			return ret;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
+		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
+			   conf->nb_queue_pairs, conf->nb_models);
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_STARTED) {
+		plt_err("Device can't be reconfigured in started state\n");
+		return -ENOTSUP;
+	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CLOSED) {
+		plt_err("Device can't be reconfigured after close\n");
+		return -ENOTSUP;
+	}
+
+	/* Configure queue-pairs */
+	if (dev->data->queue_pairs == NULL) {
+		mz_size = sizeof(dev->data->queue_pairs[0]) * conf->nb_queue_pairs;
+		dev->data->queue_pairs =
+			rte_zmalloc("cnxk_mldev_queue_pairs", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to get memory for queue_pairs, nb_queue_pairs %u",
+				conf->nb_queue_pairs);
+			return -ENOMEM;
+		}
+	} else { /* Re-configure */
+		void **queue_pairs;
+
+		/* Release all queue pairs as ML spec doesn't support queue_pair_destroy. */
+		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+			qp = dev->data->queue_pairs[qp_id];
+			if (qp != NULL) {
+				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				if (ret < 0)
+					return ret;
+			}
+		}
+
+		queue_pairs = dev->data->queue_pairs;
+		queue_pairs =
+			rte_realloc(queue_pairs, sizeof(queue_pairs[0]) * conf->nb_queue_pairs,
+				    RTE_CACHE_LINE_SIZE);
+		if (queue_pairs == NULL) {
+			dev->data->nb_queue_pairs = 0;
+			plt_err("Failed to realloc queue_pairs, nb_queue_pairs = %u",
+				conf->nb_queue_pairs);
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		memset(queue_pairs, 0, sizeof(queue_pairs[0]) * conf->nb_queue_pairs);
+		dev->data->queue_pairs = queue_pairs;
+	}
+	dev->data->nb_queue_pairs = conf->nb_queue_pairs;
+
+	/* Allocate ML models */
+	if (dev->data->models == NULL) {
+		mz_size = sizeof(dev->data->models[0]) * conf->nb_models;
+		dev->data->models = rte_zmalloc("cnxk_mldev_models", mz_size, RTE_CACHE_LINE_SIZE);
+		if (dev->data->models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to get memory for ml_models, nb_models %u",
+				conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+	} else {
+		/* Re-configure */
+		void **models;
+
+		/* Stop and unload all models */
+		for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+			model = dev->data->models[model_id];
+			if (model != NULL) {
+				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+					if (cn10k_ml_model_stop(dev, model_id) != 0)
+						plt_err("Could not stop model %u", model_id);
+				}
+				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+					if (cn10k_ml_model_unload(dev, model_id) != 0)
+						plt_err("Could not unload model %u", model_id);
+				}
+				dev->data->models[model_id] = NULL;
+			}
+		}
+
+		models = dev->data->models;
+		models = rte_realloc(models, sizeof(models[0]) * conf->nb_models,
+				     RTE_CACHE_LINE_SIZE);
+		if (models == NULL) {
+			dev->data->nb_models = 0;
+			plt_err("Failed to realloc ml_models, nb_models = %u", conf->nb_models);
+			ret = -ENOMEM;
+			goto error;
+		}
+		memset(models, 0, sizeof(models[0]) * conf->nb_models);
+		dev->data->models = models;
+	}
+	dev->data->nb_models = conf->nb_models;
+
+	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0) {
+		plt_err("Failed to configure CN10K ML Device");
+		goto error;
+	}
+
+	/* Set device capabilities */
+	cnxk_mldev->max_nb_layers =
+		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+
+	cnxk_mldev->nb_models_loaded = 0;
+	cnxk_mldev->nb_models_started = 0;
+	cnxk_mldev->nb_models_stopped = 0;
+	cnxk_mldev->nb_models_unloaded = 0;
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+
+error:
+	rte_free(dev->data->queue_pairs);
+	rte_free(dev->data->models);
+
+	return ret;
+}
+
+static int
+cnxk_ml_dev_close(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_qp *qp;
+	uint16_t model_id;
+	uint16_t qp_id;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close CN10K ML Device");
+
+	/* Stop and unload all models */
+	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
+		model = dev->data->models[model_id];
+		if (model != NULL) {
+			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
+				if (cn10k_ml_model_stop(dev, model_id) != 0)
+					plt_err("Could not stop model %u", model_id);
+			}
+			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
+				if (cn10k_ml_model_unload(dev, model_id) != 0)
+					plt_err("Could not unload model %u", model_id);
+			}
+			dev->data->models[model_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->models);
+
+	/* Destroy all queue pairs */
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		if (qp != NULL) {
+			if (cnxk_ml_qp_destroy(dev, qp) != 0)
+				plt_err("Could not destroy queue pair %u", qp_id);
+			dev->data->queue_pairs[qp_id] = NULL;
+		}
+	}
+
+	rte_free(dev->data->queue_pairs);
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CLOSED;
+
+	/* Remove PCI device */
+	return rte_dev_remove(dev->device);
+}
+
+static int
+cnxk_ml_dev_start(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_start(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to start CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_stop(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	ret = cn10k_ml_dev_stop(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to stop CN10K ML Device");
+		return ret;
+	}
+
+	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
-	.dev_info_get = cn10k_ml_dev_info_get,
-	.dev_configure = cn10k_ml_dev_configure,
-	.dev_close = cn10k_ml_dev_close,
-	.dev_start = cn10k_ml_dev_start,
-	.dev_stop = cn10k_ml_dev_stop,
+	.dev_info_get = cnxk_ml_dev_info_get,
+	.dev_configure = cnxk_ml_dev_configure,
+	.dev_close = cnxk_ml_dev_close,
+	.dev_start = cnxk_ml_dev_start,
+	.dev_stop = cnxk_ml_dev_stop,
 	.dev_dump = cn10k_ml_dev_dump,
 	.dev_selftest = cn10k_ml_dev_selftest,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..2996928d7d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,7 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+/* Temporarily set cnxk driver functions as non-static */
+int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 08/34] ml/cnxk: update queue-pair handling functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (6 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
                     ` (26 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device queue-pair setup
and release.
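
For illustration only (not part of this patch): the queue-pair setup
path reserves one extra descriptor because one ring slot is always left
unused, except when the request already equals the device maximum. A
standalone sketch with a hypothetical helper:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper mirroring the nb_desc adjustment in qp setup. */
static uint32_t
qp_nb_desc_adjust(uint32_t requested, uint32_t max_desc)
{
	return (requested == max_desc) ? max_desc : requested + 1;
}

int
main(void)
{
	/* With max_desc = 1024: 256 -> 257, 1024 -> 1024. */
	printf("%u %u\n", qp_nb_desc_adjust(256, 1024),
	       qp_nb_desc_adjust(1024, 1024));
	return 0;
}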

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 135 +----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  | 153 ++++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   3 -
 4 files changed, 154 insertions(+), 144 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index f8c51ab394..9691cf03e3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -95,93 +95,12 @@ cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
 	return plt_read64(req->status);
 }
 
-static void
-qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
-{
-	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
-}
-
-int
-cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
-{
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	int ret;
-
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
-	qp_mem = rte_memzone_lookup(name);
-	ret = rte_memzone_free(qp_mem);
-	if (ret)
-		return ret;
-
-	rte_free(qp);
-
-	return 0;
-}
-
-int
-cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
-{
-	struct cnxk_ml_qp *qp;
-	int ret;
-
-	qp = dev->data->queue_pairs[queue_pair_id];
-	if (qp == NULL)
-		return -EINVAL;
-
-	ret = cnxk_ml_qp_destroy(dev, qp);
-	if (ret) {
-		plt_err("Could not destroy queue pair %u", queue_pair_id);
-		return ret;
-	}
-
-	dev->data->queue_pairs[queue_pair_id] = NULL;
-
-	return 0;
-}
-
-static struct cnxk_ml_qp *
-cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+void
+cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
-	const struct rte_memzone *qp_mem;
-	char name[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_qp *qp;
-	uint32_t len;
-	uint8_t *va;
 	uint64_t i;
 
-	/* Allocate queue pair */
-	qp = rte_zmalloc_socket("cn10k_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
-				socket_id);
-	if (qp == NULL) {
-		plt_err("Could not allocate queue pair");
-		return NULL;
-	}
-
-	/* For request queue */
-	len = nb_desc * sizeof(struct cnxk_ml_req);
-	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
-	qp_mem = rte_memzone_reserve_aligned(
-		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
-	if (qp_mem == NULL) {
-		plt_err("Could not reserve memzone: %s", name);
-		goto qp_free;
-	}
-
-	va = qp_mem->addr;
-	memset(va, 0, len);
-
-	/* Initialize Request queue */
-	qp->id = qp_id;
-	qp->queue.reqs = (struct cnxk_ml_req *)va;
-	qp->queue.head = 0;
-	qp->queue.tail = 0;
-	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
-	qp->nb_desc = nb_desc;
-	qp->stats.enqueued_count = 0;
-	qp->stats.dequeued_count = 0;
-	qp->stats.enqueue_err_count = 0;
-	qp->stats.dequeue_err_count = 0;
+	RTE_SET_USED(cnxk_mldev);
 
 	/* Initialize job command */
 	for (i = 0; i < qp->nb_desc; i++) {
@@ -189,13 +108,6 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 		qp->queue.reqs[i].cn10k_req.jcmd.w1.s.jobptr =
 			PLT_U64_CAST(&qp->queue.reqs[i].cn10k_req.jd);
 	}
-
-	return qp;
-
-qp_free:
-	rte_free(qp);
-
-	return NULL;
 }
 
 static void
@@ -1002,47 +914,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-			      const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
-{
-	struct rte_ml_dev_info dev_info;
-	struct cnxk_ml_qp *qp;
-	uint32_t nb_desc;
-
-	if (queue_pair_id >= dev->data->nb_queue_pairs) {
-		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
-			dev->data->nb_queue_pairs);
-		return -EINVAL;
-	}
-
-	if (dev->data->queue_pairs[queue_pair_id] != NULL)
-		cn10k_ml_dev_queue_pair_release(dev, queue_pair_id);
-
-	cnxk_ml_dev_info_get(dev, &dev_info);
-	if ((qp_conf->nb_desc > dev_info.max_desc) || (qp_conf->nb_desc == 0)) {
-		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
-		return -EINVAL;
-	}
-	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
-		   qp_conf->nb_desc);
-
-	/* As the number of usable descriptors is 1 less than the queue size being created, we
-	 * increment the size of queue by 1 than the requested size, except when the requested size
-	 * is equal to the maximum possible size.
-	 */
-	nb_desc =
-		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
-	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
-	if (qp == NULL) {
-		plt_err("Could not create queue pair %u", queue_pair_id);
-		return -ENOMEM;
-	}
-	dev->data->queue_pairs[queue_pair_id] = qp;
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index d50b5bede7..2d0a49d5cd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -296,9 +296,6 @@ int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
 int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
-int cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
-				  const struct rte_ml_dev_qp_conf *qp_conf, int socket_id);
-int cn10k_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
@@ -339,7 +336,7 @@ __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op
 				    struct rte_ml_op_error *error);
 __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
 
-/* Temporarily set below functions as non-static */
-int cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp);
+/* Misc ops */
+void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 07a4daabc5..aa56dd2276 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,7 +10,107 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
-int
+static void
+qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
+{
+	snprintf(name, size, "cnxk_ml_qp_mem_%u:%u", dev_id, qp_id);
+}
+
+static int
+cnxk_ml_qp_destroy(const struct rte_ml_dev *dev, struct cnxk_ml_qp *qp)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp->id);
+	qp_mem = rte_memzone_lookup(name);
+	ret = rte_memzone_free(qp_mem);
+	if (ret)
+		return ret;
+
+	rte_free(qp);
+
+	return 0;
+}
+
+static int
+cnxk_ml_dev_queue_pair_release(struct rte_ml_dev *dev, uint16_t queue_pair_id)
+{
+	struct cnxk_ml_qp *qp;
+	int ret;
+
+	qp = dev->data->queue_pairs[queue_pair_id];
+	if (qp == NULL)
+		return -EINVAL;
+
+	ret = cnxk_ml_qp_destroy(dev, qp);
+	if (ret) {
+		plt_err("Could not destroy queue pair %u", queue_pair_id);
+		return ret;
+	}
+
+	dev->data->queue_pairs[queue_pair_id] = NULL;
+
+	return 0;
+}
+
+static struct cnxk_ml_qp *
+cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc, int socket_id)
+{
+	const struct rte_memzone *qp_mem;
+	char name[RTE_MEMZONE_NAMESIZE];
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_qp *qp;
+	uint32_t len;
+	uint8_t *va;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Allocate queue pair */
+	qp = rte_zmalloc_socket("cnxk_ml_pmd_queue_pair", sizeof(struct cnxk_ml_qp), ROC_ALIGN,
+				socket_id);
+	if (qp == NULL) {
+		plt_err("Could not allocate queue pair");
+		return NULL;
+	}
+
+	/* For request queue */
+	len = nb_desc * sizeof(struct cnxk_ml_req);
+	qp_memzone_name_get(name, RTE_MEMZONE_NAMESIZE, dev->data->dev_id, qp_id);
+	qp_mem = rte_memzone_reserve_aligned(
+		name, len, socket_id, RTE_MEMZONE_SIZE_HINT_ONLY | RTE_MEMZONE_256MB, ROC_ALIGN);
+	if (qp_mem == NULL) {
+		plt_err("Could not reserve memzone: %s", name);
+		goto qp_free;
+	}
+
+	va = qp_mem->addr;
+	memset(va, 0, len);
+
+	/* Initialize Request queue */
+	qp->id = qp_id;
+	qp->queue.reqs = (struct cnxk_ml_req *)va;
+	qp->queue.head = 0;
+	qp->queue.tail = 0;
+	qp->queue.wait_cycles = ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
+	qp->nb_desc = nb_desc;
+	qp->stats.enqueued_count = 0;
+	qp->stats.dequeued_count = 0;
+	qp->stats.enqueue_err_count = 0;
+	qp->stats.dequeue_err_count = 0;
+
+	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+
+	return qp;
+
+qp_free:
+	rte_free(qp);
+
+	return NULL;
+}
+
+static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
 	struct cnxk_ml_dev *cnxk_mldev;
@@ -93,7 +193,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
 			qp = dev->data->queue_pairs[qp_id];
 			if (qp != NULL) {
-				ret = cn10k_ml_dev_queue_pair_release(dev, qp_id);
+				ret = cnxk_ml_dev_queue_pair_release(dev, qp_id);
 				if (ret < 0)
 					return ret;
 			}
@@ -283,6 +383,51 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
+			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_qp *qp;
+	uint32_t nb_desc;
+
+	if (queue_pair_id >= dev->data->nb_queue_pairs) {
+		plt_err("Queue-pair id = %u (>= max queue pairs supported, %u)\n", queue_pair_id,
+			dev->data->nb_queue_pairs);
+		return -EINVAL;
+	}
+
+	if (dev->data->queue_pairs[queue_pair_id] != NULL)
+		cnxk_ml_dev_queue_pair_release(dev, queue_pair_id);
+
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	if (qp_conf->nb_desc == 0) {
+		plt_err("Could not setup queue pair for %u descriptors", qp_conf->nb_desc);
+		return -EINVAL;
+	} else if (qp_conf->nb_desc > dev_info.max_desc) {
+		plt_err("Could not setup queue pair for %u descriptors (> %u)", qp_conf->nb_desc,
+			dev_info.max_desc);
+		return -EINVAL;
+	}
+	plt_ml_dbg("Creating queue-pair, queue_pair_id = %u, nb_desc = %u", queue_pair_id,
+		   qp_conf->nb_desc);
+
+	/* As the number of usable descriptors is 1 less than the queue size being created, we
+	 * increment the size of queue by 1 than the requested size, except when the requested size
+	 * is equal to the maximum possible size.
+	 */
+	nb_desc =
+		(qp_conf->nb_desc == dev_info.max_desc) ? dev_info.max_desc : qp_conf->nb_desc + 1;
+	qp = cnxk_ml_qp_create(dev, queue_pair_id, nb_desc, socket_id);
+	if (qp == NULL) {
+		plt_err("Could not create queue pair %u", queue_pair_id);
+		return -ENOMEM;
+	}
+	dev->data->queue_pairs[queue_pair_id] = qp;
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -294,8 +439,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_selftest = cn10k_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
-	.dev_queue_pair_setup = cn10k_ml_dev_queue_pair_setup,
-	.dev_queue_pair_release = cn10k_ml_dev_queue_pair_release,
+	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
+	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
 	.dev_stats_get = cn10k_ml_dev_stats_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index 2996928d7d..a925c07580 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,7 +62,4 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
-/* Temporarily set cnxk driver functions as non-static */
-int cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info);
-
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 09/34] ml/cnxk: update model load and unload functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (7 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
                     ` (25 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to load and unload ML models.
The wrapper functions invoke the cn10k model load and unload
functions.
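
A minimal sketch (not the patch code) of the load flow the message
describes, with hypothetical stand-in types: the generic layer picks a
free model slot and then hands the buffer to the cn10k loader.

#include <stddef.h>
#include <stdint.h>
#include <errno.h>

struct model_sketch {
	uint16_t id;
	int loaded;
};

/* cn10k-specific parsing and DMA setup would happen here in the driver. */
int
cn10k_load_sketch(struct model_sketch *model, const void *buffer, size_t size)
{
	if (buffer == NULL || size == 0)
		return -EINVAL;

	model->loaded = 1;

	return 0;
}

/* Generic wrapper: reserve a free slot, then delegate to the cn10k loader. */
int
cnxk_load_sketch(struct model_sketch *slots, uint16_t nb_slots,
		 const void *buffer, size_t size, uint16_t *model_id)
{
	uint16_t i;
	int ret;

	for (i = 0; i < nb_slots; i++) {
		if (slots[i].loaded)
			continue;

		ret = cn10k_load_sketch(&slots[i], buffer, size);
		if (ret == 0) {
			slots[i].id = i;
			*model_id = i;
		}

		return ret;
	}

	return -ENOMEM;
}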

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 244 ++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_model.h |  26 ++-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 296 ++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  12 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |  15 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    | 144 ++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.h    |   2 +
 7 files changed, 462 insertions(+), 277 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index d2f1c761be..48d70027ca 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -316,42 +316,31 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
-	size_t model_data_size;
 	uint8_t *dma_addr_load;
-	uint8_t *dma_addr_run;
 	int fpos;
 
 	metadata = &layer->glow.metadata;
 	addr = &layer->glow.addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
 
 	/* Base address */
 	addr->base_dma_addr_load = base_dma_addr;
-	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
 
 	/* Init section */
 	dma_addr_load = addr->base_dma_addr_load;
-	dma_addr_run = addr->base_dma_addr_run;
 	fpos = sizeof(struct cn10k_ml_model_metadata);
 	addr->init_load_addr = dma_addr_load;
-	addr->init_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
 
 	/* Main section */
 	dma_addr_load += metadata->init_model.file_size;
-	dma_addr_run += metadata->init_model.file_size;
 	fpos += metadata->init_model.file_size;
 	addr->main_load_addr = dma_addr_load;
-	addr->main_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
 
 	/* Finish section */
 	dma_addr_load += metadata->main_model.file_size;
-	dma_addr_run += metadata->main_model.file_size;
 	fpos += metadata->main_model.file_size;
 	addr->finish_load_addr = dma_addr_load;
-	addr->finish_run_addr = dma_addr_run;
 	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
 
 	/* Weights and Bias section */
@@ -363,142 +352,148 @@ cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer, uint8_t
 }
 
 void
-cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer)
+cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+			   struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
 	uint8_t i;
 	uint8_t j;
 
-	metadata = &layer->glow.metadata;
-
 	/* Inputs */
-	layer->info.nb_inputs = metadata->model.num_input;
-	layer->info.total_input_sz_d = 0;
-	layer->info.total_input_sz_q = 0;
+	io_info->nb_inputs = metadata->model.num_input;
+	io_info->total_input_sz_d = 0;
+	io_info->total_input_sz_q = 0;
 	for (i = 0; i < metadata->model.num_input; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_strscpy(layer->info.input[i].name,
-				    (char *)metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input1[i].input_type;
-			layer->info.input[i].qtype = metadata->input1[i].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input1[i].shape.w;
-			layer->info.input[i].shape[1] = metadata->input1[i].shape.x;
-			layer->info.input[i].shape[2] = metadata->input1[i].shape.y;
-			layer->info.input[i].shape[3] = metadata->input1[i].shape.z;
-			layer->info.input[i].nb_elements =
+			rte_strscpy(io_info->input[i].name, (char *)metadata->input1[i].input_name,
+				    MRVL_ML_INPUT_NAME_LEN);
+			io_info->input[i].dtype = metadata->input1[i].input_type;
+			io_info->input[i].qtype = metadata->input1[i].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input1[i].shape.w;
+			io_info->input[i].shape[1] = metadata->input1[i].shape.x;
+			io_info->input[i].shape[2] = metadata->input1[i].shape.y;
+			io_info->input[i].shape[3] = metadata->input1[i].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input1[i].shape.w * metadata->input1[i].shape.x *
 				metadata->input1[i].shape.y * metadata->input1[i].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input1[i].model_input_type);
-			layer->info.input[i].scale = metadata->input1[i].qscale;
+			io_info->input[i].scale = metadata->input1[i].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, i, metadata->input1[i].shape.w,
+				"layer_name = %s, input1[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, i, metadata->input1[i].shape.w,
 				metadata->input1[i].shape.x, metadata->input1[i].shape.y,
-				metadata->input1[i].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input1[i].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			rte_strscpy(layer->info.input[i].name,
-				    (char *)metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN);
-			layer->info.input[i].dtype = metadata->input2[j].input_type;
-			layer->info.input[i].qtype = metadata->input2[j].model_input_type;
-			layer->info.input[i].nb_dims = 4;
-			layer->info.input[i].shape[0] = metadata->input2[j].shape.w;
-			layer->info.input[i].shape[1] = metadata->input2[j].shape.x;
-			layer->info.input[i].shape[2] = metadata->input2[j].shape.y;
-			layer->info.input[i].shape[3] = metadata->input2[j].shape.z;
-			layer->info.input[i].nb_elements =
+			rte_strscpy(io_info->input[i].name, (char *)metadata->input2[j].input_name,
+				    MRVL_ML_INPUT_NAME_LEN);
+			io_info->input[i].dtype = metadata->input2[j].input_type;
+			io_info->input[i].qtype = metadata->input2[j].model_input_type;
+			io_info->input[i].nb_dims = 4;
+			io_info->input[i].shape[0] = metadata->input2[j].shape.w;
+			io_info->input[i].shape[1] = metadata->input2[j].shape.x;
+			io_info->input[i].shape[2] = metadata->input2[j].shape.y;
+			io_info->input[i].shape[3] = metadata->input2[j].shape.z;
+			io_info->input[i].nb_elements =
 				metadata->input2[j].shape.w * metadata->input2[j].shape.x *
 				metadata->input2[j].shape.y * metadata->input2[j].shape.z;
-			layer->info.input[i].sz_d =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_d =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].input_type);
-			layer->info.input[i].sz_q =
-				layer->info.input[i].nb_elements *
+			io_info->input[i].sz_q =
+				io_info->input[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->input2[j].model_input_type);
-			layer->info.input[i].scale = metadata->input2[j].qscale;
+			io_info->input[i].scale = metadata->input2[j].qscale;
 
-			layer->info.total_input_sz_d += layer->info.input[i].sz_d;
-			layer->info.total_input_sz_q += layer->info.input[i].sz_q;
+			io_info->total_input_sz_d += io_info->input[i].sz_d;
+			io_info->total_input_sz_q += io_info->input[i].sz_q;
 
 			plt_ml_dbg(
-				"index = %u, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
-				layer->index, j, metadata->input2[j].shape.w,
+				"layer_name = %s, input2[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+				metadata->model.name, j, metadata->input2[j].shape.w,
 				metadata->input2[j].shape.x, metadata->input2[j].shape.y,
-				metadata->input2[j].shape.z, layer->info.input[i].sz_d,
-				layer->info.input[i].sz_q);
+				metadata->input2[j].shape.z, io_info->input[i].sz_d,
+				io_info->input[i].sz_q);
 		}
 	}
 
 	/* Outputs */
-	layer->info.nb_outputs = metadata->model.num_output;
-	layer->info.total_output_sz_q = 0;
-	layer->info.total_output_sz_d = 0;
+	io_info->nb_outputs = metadata->model.num_output;
+	io_info->total_output_sz_q = 0;
+	io_info->total_output_sz_d = 0;
 	for (i = 0; i < metadata->model.num_output; i++) {
 		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			rte_strscpy(layer->info.output[i].name,
+			rte_strscpy(io_info->output[i].name,
 				    (char *)metadata->output1[i].output_name,
 				    MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output1[i].output_type;
-			layer->info.output[i].qtype = metadata->output1[i].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output1[i].size;
-			layer->info.output[i].nb_elements = metadata->output1[i].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].dtype = metadata->output1[i].output_type;
+			io_info->output[i].qtype = metadata->output1[i].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output1[i].size;
+			io_info->output[i].nb_elements = metadata->output1[i].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output1[i].model_output_type);
-			layer->info.output[i].scale = metadata->output1[i].dscale;
+			io_info->output[i].scale = metadata->output1[i].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output1[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   i, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output1[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, i, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		} else {
 			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
 
-			rte_strscpy(layer->info.output[i].name,
+			rte_strscpy(io_info->output[i].name,
 				    (char *)metadata->output2[j].output_name,
 				    MRVL_ML_OUTPUT_NAME_LEN);
-			layer->info.output[i].dtype = metadata->output2[j].output_type;
-			layer->info.output[i].qtype = metadata->output2[j].model_output_type;
-			layer->info.output[i].nb_dims = 1;
-			layer->info.output[i].shape[0] = metadata->output2[j].size;
-			layer->info.output[i].nb_elements = metadata->output2[j].size;
-			layer->info.output[i].sz_d =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].dtype = metadata->output2[j].output_type;
+			io_info->output[i].qtype = metadata->output2[j].model_output_type;
+			io_info->output[i].nb_dims = 1;
+			io_info->output[i].shape[0] = metadata->output2[j].size;
+			io_info->output[i].nb_elements = metadata->output2[j].size;
+			io_info->output[i].sz_d =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].output_type);
-			layer->info.output[i].sz_q =
-				layer->info.output[i].nb_elements *
+			io_info->output[i].sz_q =
+				io_info->output[i].nb_elements *
 				rte_ml_io_type_size_get(metadata->output2[j].model_output_type);
-			layer->info.output[i].scale = metadata->output2[j].dscale;
+			io_info->output[i].scale = metadata->output2[j].dscale;
 
-			layer->info.total_output_sz_q += layer->info.output[i].sz_q;
-			layer->info.total_output_sz_d += layer->info.output[i].sz_d;
+			io_info->total_output_sz_q += io_info->output[i].sz_q;
+			io_info->total_output_sz_d += io_info->output[i].sz_d;
 
-			plt_ml_dbg("index = %u, output2[%u] - sz_d = %u, sz_q = %u", layer->index,
-				   j, layer->info.output[i].sz_d, layer->info.output[i].sz_q);
+			plt_ml_dbg("layer_name = %s, output2[%u] - sz_d = %u, sz_q = %u",
+				   metadata->model.name, j, io_info->output[i].sz_d,
+				   io_info->output[i].sz_q);
 		}
 	}
 }
 
+struct cnxk_ml_io_info *
+cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	return &model->layer[layer_id].info;
+}
+
 int
-cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id, uint8_t *buffer,
-			       uint16_t *wb_pages, uint16_t *scratch_pages)
+cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			       uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_ocm *ocm;
@@ -506,7 +501,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	uint64_t wb_size;
 
 	metadata = (struct cn10k_ml_model_metadata *)buffer;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Assume wb_size is zero for non-relocatable models */
 	if (metadata->model.ocm_relocatable)
@@ -518,7 +513,7 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*wb_pages = wb_size / ocm->page_size + 1;
 	else
 		*wb_pages = wb_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, wb_size = %" PRIu64 ", wb_pages = %u", model_id, wb_size,
+	plt_ml_dbg("index = %u, wb_size = %" PRIu64 ", wb_pages = %u", layer->index, wb_size,
 		   *wb_pages);
 
 	scratch_size = ocm->size_per_tile - metadata->model.ocm_tmp_range_floor;
@@ -526,15 +521,15 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 		*scratch_pages = scratch_size / ocm->page_size + 1;
 	else
 		*scratch_pages = scratch_size / ocm->page_size;
-	plt_ml_dbg("model_id = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", model_id,
+	plt_ml_dbg("index = %u, scratch_size = %" PRIu64 ", scratch_pages = %u", layer->index,
 		   scratch_size, *scratch_pages);
 
 	/* Check if the model can be loaded on OCM */
-	if ((*wb_pages + *scratch_pages) > cn10k_mldev->ocm.num_pages) {
+	if ((*wb_pages + *scratch_pages) > ocm->num_pages) {
 		plt_err("Cannot create the model, OCM relocatable = %u",
 			metadata->model.ocm_relocatable);
 		plt_err("wb_pages (%u) + scratch_pages (%u) > %u", *wb_pages, *scratch_pages,
-			cn10k_mldev->ocm.num_pages);
+			ocm->num_pages);
 		return -ENOMEM;
 	}
 
@@ -542,28 +537,25 @@ cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_
 	 * prevent the library from allocating the remaining space on the tile to other models.
 	 */
 	if (!metadata->model.ocm_relocatable)
-		*scratch_pages = PLT_MAX(PLT_U64_CAST(*scratch_pages),
-					 PLT_U64_CAST(cn10k_mldev->ocm.num_pages));
+		*scratch_pages =
+			PLT_MAX(PLT_U64_CAST(*scratch_pages), PLT_U64_CAST(ocm->num_pages));
 
 	return 0;
 }
 
 void
-cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
+cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			struct cnxk_ml_io_info *io_info, struct cn10k_ml_model_metadata *metadata)
 {
-	struct cn10k_ml_model_metadata *metadata;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct rte_ml_model_info *info;
 	struct rte_ml_io_info *output;
 	struct rte_ml_io_info *input;
-	struct cnxk_ml_layer *layer;
 	uint8_t i;
 
-	cnxk_mldev = dev->data->dev_private;
 	metadata = &model->glow.metadata;
 	info = PLT_PTR_CAST(model->info);
 	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
-	output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
 
 	/* Set model info */
 	memset(info, 0, sizeof(struct rte_ml_model_info));
@@ -572,39 +564,37 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model)
 		 metadata->model.version[1], metadata->model.version[2],
 		 metadata->model.version[3]);
 	info->model_id = model->model_id;
-	info->device_id = dev->data->dev_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
 	info->io_layout = RTE_ML_IO_LAYOUT_PACKED;
 	info->min_batches = model->batch_size;
 	info->max_batches =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_num_batches /
 		model->batch_size;
-	info->nb_inputs = metadata->model.num_input;
+	info->nb_inputs = io_info->nb_inputs;
 	info->input_info = input;
-	info->nb_outputs = metadata->model.num_output;
+	info->nb_outputs = io_info->nb_outputs;
 	info->output_info = output;
 	info->wb_size = metadata->weights_bias.file_size;
 
 	/* Set input info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_inputs; i++) {
-		rte_memcpy(input[i].name, layer->info.input[i].name, MRVL_ML_INPUT_NAME_LEN);
-		input[i].nb_dims = layer->info.input[i].nb_dims;
-		input[i].shape = &layer->info.input[i].shape[0];
-		input[i].type = layer->info.input[i].qtype;
-		input[i].nb_elements = layer->info.input[i].nb_elements;
-		input[i].size = layer->info.input[i].nb_elements *
-				rte_ml_io_type_size_get(layer->info.input[i].qtype);
+		rte_memcpy(input[i].name, io_info->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = io_info->input[i].nb_dims;
+		input[i].shape = &io_info->input[i].shape[0];
+		input[i].type = io_info->input[i].qtype;
+		input[i].nb_elements = io_info->input[i].nb_elements;
+		input[i].size = io_info->input[i].nb_elements *
+				rte_ml_io_type_size_get(io_info->input[i].qtype);
 	}
 
 	/* Set output info */
-	layer = &model->layer[0];
 	for (i = 0; i < info->nb_outputs; i++) {
-		rte_memcpy(output[i].name, layer->info.output[i].name, MRVL_ML_INPUT_NAME_LEN);
-		output[i].nb_dims = layer->info.output[i].nb_dims;
-		output[i].shape = &layer->info.output[i].shape[0];
-		output[i].type = layer->info.output[i].qtype;
-		output[i].nb_elements = layer->info.output[i].nb_elements;
-		output[i].size = layer->info.output[i].nb_elements *
-				 rte_ml_io_type_size_get(layer->info.output[i].qtype);
+		rte_memcpy(output[i].name, io_info->output[i].name, MRVL_ML_INPUT_NAME_LEN);
+		output[i].nb_dims = io_info->output[i].nb_dims;
+		output[i].shape = &io_info->output[i].shape[0];
+		output[i].type = io_info->output[i].qtype;
+		output[i].nb_elements = io_info->output[i].nb_elements;
+		output[i].size = io_info->output[i].nb_elements *
+				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 5c32f48c68..b891c9d627 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -9,9 +9,11 @@
 
 #include <roc_api.h>
 
-#include "cn10k_ml_dev.h"
 #include "cn10k_ml_ocm.h"
 
+#include "cnxk_ml_io.h"
+
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
 struct cnxk_ml_req;
@@ -366,27 +368,15 @@ struct cn10k_ml_layer_addr {
 	/* Base DMA address for load */
 	void *base_dma_addr_load;
 
-	/* Base DMA address for run */
-	void *base_dma_addr_run;
-
 	/* Init section load address */
 	void *init_load_addr;
 
-	/* Init section run address */
-	void *init_run_addr;
-
 	/* Main section load address */
 	void *main_load_addr;
 
-	/* Main section run address */
-	void *main_run_addr;
-
 	/* Finish section load address */
 	void *finish_load_addr;
 
-	/* Finish section run address */
-	void *finish_run_addr;
-
 	/* Weights and Bias base address */
 	void *wb_base_addr;
 
@@ -462,9 +452,13 @@ int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_addr_update(struct cnxk_ml_layer *layer, uint8_t *buffer,
 				uint8_t *base_dma_addr);
-void cn10k_ml_layer_info_update(struct cnxk_ml_layer *layer);
-int cn10k_ml_model_ocm_pages_count(struct cn10k_ml_dev *cn10k_mldev, uint16_t model_id,
+void cn10k_ml_layer_io_info_set(struct cnxk_ml_io_info *io_info,
+				struct cn10k_ml_model_metadata *metadata);
+struct cnxk_ml_io_info *cn10k_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				   uint8_t *buffer, uint16_t *wb_pages, uint16_t *scratch_pages);
-void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cnxk_ml_model *model);
+void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     struct cnxk_ml_io_info *io_info,
+			     struct cn10k_ml_model_metadata *metadata);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 9691cf03e3..ab05896b5e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -15,6 +15,9 @@
 /* ML model macros */
 #define CN10K_ML_MODEL_MEMZONE_NAME "ml_cn10k_model_mz"
 
+/* ML layer macros */
+#define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
+
 /* Debug print width */
 #define STR_LEN	  12
 #define FIELD_LEN 16
@@ -273,7 +276,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.extended_args = PLT_U64_CAST(
 			roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.extended_args));
 		req->cn10k_req.jd.model_start.model_dst_ddr_addr =
-			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_run_addr));
+			PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, addr->init_load_addr));
 		req->cn10k_req.jd.model_start.model_init_offset = 0x0;
 		req->cn10k_req.jd.model_start.model_main_offset = metadata->init_model.file_size;
 		req->cn10k_req.jd.model_start.model_finish_offset =
@@ -1261,85 +1264,171 @@ cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
 }
 
 int
-cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+		    size_t size, uint16_t *index)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
-	size_t model_scratch_size;
-	size_t model_stats_size;
-	size_t model_data_size;
-	size_t model_info_size;
+	size_t layer_object_size = 0;
+	size_t layer_scratch_size;
+	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
+	uint16_t layer_id = 0;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
-	bool found;
 	int qp_id;
 	int ret;
 
-	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	PLT_SET_USED(size);
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	layer = &model->layer[layer_id];
+
+	ret = cn10k_ml_model_metadata_check(buffer, size);
 	if (ret != 0)
 		return ret;
 
-	cnxk_mldev = dev->data->dev_private;
-
-	/* Find model ID */
-	found = false;
-	for (idx = 0; idx < dev->data->nb_models; idx++) {
-		if (dev->data->models[idx] == NULL) {
-			found = true;
+	/* Get index */
+	for (idx = 0; idx < cnxk_mldev->max_nb_layers; idx++) {
+		if (!cnxk_mldev->index_map[idx].active) {
+			layer->index = idx;
 			break;
 		}
 	}
 
-	if (!found) {
-		plt_err("No slots available to load new model");
-		return -ENOMEM;
+	if (idx >= cnxk_mldev->max_nb_layers) {
+		plt_err("No slots available for model layers, model_id = %u, layer_id = %u",
+			model->model_id, layer_id);
+		return -1;
 	}
 
+	layer->model = model;
+
 	/* Get WB and scratch pages, check if model can be loaded. */
-	ret = cn10k_ml_model_ocm_pages_count(&cnxk_mldev->cn10k_mldev, idx, params->addr, &wb_pages,
-					     &scratch_pages);
+	ret = cn10k_ml_model_ocm_pages_count(cnxk_mldev, layer, buffer, &wb_pages, &scratch_pages);
 	if (ret < 0)
 		return ret;
 
-	/* Compute memzone size */
-	metadata = (struct cn10k_ml_model_metadata *)params->addr;
-	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
-			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
-	model_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
+	/* Compute layer memzone size */
+	metadata = (struct cn10k_ml_model_metadata *)buffer;
+	layer_object_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			    metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	layer_object_size = PLT_ALIGN_CEIL(layer_object_size, ML_CN10K_ALIGN_SIZE);
+	layer_scratch_size = PLT_ALIGN_CEIL(metadata->model.ddr_scratch_range_end -
 						    metadata->model.ddr_scratch_range_start + 1,
 					    ML_CN10K_ALIGN_SIZE);
-	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
-	model_info_size = sizeof(struct rte_ml_model_info) +
-			  metadata->model.num_input * sizeof(struct rte_ml_io_info) +
-			  metadata->model.num_output * sizeof(struct rte_ml_io_info);
-	model_info_size = PLT_ALIGN_CEIL(model_info_size, ML_CN10K_ALIGN_SIZE);
-	model_stats_size = (dev->data->nb_queue_pairs + 1) * sizeof(struct cn10k_ml_layer_xstats);
-
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE) +
-		  2 * model_data_size + model_scratch_size + model_info_size +
-		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
-		  model_stats_size;
+	layer_xstats_size = (cnxk_mldev->mldev->data->nb_queue_pairs + 1) *
+			    sizeof(struct cn10k_ml_layer_xstats);
 
-	/* Allocate memzone for model object and model data */
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
+	/* Allocate memzone for model data */
+	mz_size = layer_object_size + layer_scratch_size +
+		  PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE) +
+		  layer_xstats_size;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
 	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
 	if (!mz) {
 		plt_err("plt_memzone_reserve failed : %s", str);
 		return -ENOMEM;
 	}
 
-	model = mz->addr;
-	model->cnxk_mldev = cnxk_mldev;
-	model->model_id = idx;
-	dev->data->models[idx] = model;
+	/* Copy metadata to internal buffer */
+	rte_memcpy(&layer->glow.metadata, buffer, sizeof(struct cn10k_ml_model_metadata));
+	cn10k_ml_model_metadata_update(&layer->glow.metadata);
+
+	/* Set layer name */
+	rte_memcpy(layer->name, layer->glow.metadata.model.name, MRVL_ML_MODEL_NAME_LEN);
+
+	/* Enable support for batch_size of 256 */
+	if (layer->glow.metadata.model.batch_size == 0)
+		layer->batch_size = 256;
+	else
+		layer->batch_size = layer->glow.metadata.model.batch_size;
+
+	/* Set DMA base address */
+	base_dma_addr = mz->addr;
+	cn10k_ml_layer_addr_update(layer, buffer, base_dma_addr);
+
+	/* Set scratch base address */
+	layer->glow.addr.scratch_base_addr = PLT_PTR_ADD(base_dma_addr, layer_object_size);
+
+	/* Update internal I/O data structure */
+	cn10k_ml_layer_io_info_set(&layer->info, &layer->glow.metadata);
+
+	/* Initialize model_mem_map */
+	memset(&layer->glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
+	layer->glow.ocm_map.ocm_reserved = false;
+	layer->glow.ocm_map.tilemask = 0;
+	layer->glow.ocm_map.wb_page_start = -1;
+	layer->glow.ocm_map.wb_pages = wb_pages;
+	layer->glow.ocm_map.scratch_pages = scratch_pages;
+
+	/* Set slow-path request address and state */
+	layer->glow.req = PLT_PTR_ADD(mz->addr, layer_object_size + layer_scratch_size);
+
+	/* Reset burst and sync stats */
+	layer->glow.burst_xstats = PLT_PTR_ADD(
+		layer->glow.req, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
+	for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs + 1; qp_id++) {
+		layer->glow.burst_xstats[qp_id].hw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].hw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_tot = 0;
+		layer->glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
+		layer->glow.burst_xstats[qp_id].fw_latency_max = 0;
+		layer->glow.burst_xstats[qp_id].hw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].fw_reset_count = 0;
+		layer->glow.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
+	layer->glow.sync_xstats =
+		PLT_PTR_ADD(layer->glow.burst_xstats, cnxk_mldev->mldev->data->nb_queue_pairs *
+							      sizeof(struct cn10k_ml_layer_xstats));
+
+	/* Update xstats names */
+	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+
+	layer->state = ML_CNXK_LAYER_STATE_LOADED;
+	cnxk_mldev->index_map[idx].model_id = model->model_id;
+	cnxk_mldev->index_map[idx].layer_id = layer_id;
+	cnxk_mldev->index_map[idx].active = true;
+	*index = idx;
+
+	return 0;
+}
+
+int
+cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	/* Metadata check */
+	ret = cn10k_ml_model_metadata_check(params->addr, params->size);
+	if (ret != 0)
+		return ret;
 
+	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
 
@@ -1358,99 +1447,62 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	 */
 	model->nb_layers = 1;
 
-	/* Copy metadata to internal buffer */
-	rte_memcpy(&model->layer[0].glow.metadata, params->addr,
-		   sizeof(struct cn10k_ml_model_metadata));
-	cn10k_ml_model_metadata_update(&model->layer[0].glow.metadata);
-	model->layer[0].model = model;
-
-	/* Set DMA base address */
-	base_dma_addr = PLT_PTR_ADD(
-		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), ML_CN10K_ALIGN_SIZE));
-	cn10k_ml_layer_addr_update(&model->layer[0], params->addr, base_dma_addr);
-	model->layer[0].glow.addr.scratch_base_addr =
-		PLT_PTR_ADD(base_dma_addr, 2 * model_data_size);
-
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, model_data_size);
-
-	/* Update internal I/O data structure */
-	cn10k_ml_layer_info_update(&model->layer[0]);
-
-	/* Initialize model_mem_map */
-	memset(&model->layer[0].glow.ocm_map, 0, sizeof(struct cn10k_ml_ocm_layer_map));
-	model->layer[0].glow.ocm_map.ocm_reserved = false;
-	model->layer[0].glow.ocm_map.tilemask = 0;
-	model->layer[0].glow.ocm_map.wb_page_start = -1;
-	model->layer[0].glow.ocm_map.wb_pages = wb_pages;
-	model->layer[0].glow.ocm_map.scratch_pages = scratch_pages;
-
-	/* Set model info */
-	model->info = PLT_PTR_ADD(model->layer[0].glow.addr.scratch_base_addr, model_scratch_size);
-	cn10k_ml_model_info_set(dev, model);
-
-	/* Set slow-path request address and state */
-	model->layer[0].glow.req = PLT_PTR_ADD(model->info, model_info_size);
-
-	/* Reset burst and sync stats */
-	model->layer[0].glow.burst_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.req,
-			    PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_req), ML_CN10K_ALIGN_SIZE));
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs + 1; qp_id++) {
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].hw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_tot = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_min = UINT64_MAX;
-		model->layer[0].glow.burst_xstats[qp_id].fw_latency_max = 0;
-		model->layer[0].glow.burst_xstats[qp_id].hw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].fw_reset_count = 0;
-		model->layer[0].glow.burst_xstats[qp_id].dequeued_count = 0;
+	/* Load layer and get the index */
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
+				  &layer->index);
+	if (ret != 0) {
+		plt_err("Model layer load failed: model_id = %u, layer_id = %u", model->model_id,
+			0);
+		return ret;
 	}
 
-	model->layer[0].glow.sync_xstats =
-		PLT_PTR_ADD(model->layer[0].glow.burst_xstats,
-			    dev->data->nb_queue_pairs * sizeof(struct cn10k_ml_layer_xstats));
-
-	plt_spinlock_init(&model->lock);
-	model->state = ML_CNXK_MODEL_STATE_LOADED;
-	dev->data->models[idx] = model;
-	cnxk_mldev->nb_models_loaded++;
-
-	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(dev, idx);
-
-	*model_id = idx;
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
 	return 0;
 }
 
 int
-cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 {
-	char str[RTE_MEMZONE_NAMESIZE];
-	struct cnxk_ml_model *model;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 
-	cnxk_mldev = dev->data->dev_private;
-	model = dev->data->models[model_id];
+	char str[RTE_MEMZONE_NAMESIZE];
+	uint16_t layer_id = 0;
+	int ret;
 
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
-	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
-		plt_err("Cannot unload. Model in use.");
-		return -EBUSY;
-	}
+	layer = &model->layer[layer_id];
 
-	dev->data->models[model_id] = NULL;
-	cnxk_mldev->nb_models_unloaded++;
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
+		 model->model_id, layer_id);
+	ret = plt_memzone_free(plt_memzone_lookup(str));
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, model_id);
-	return plt_memzone_free(plt_memzone_lookup(str));
+	layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+	cnxk_mldev->index_map[layer->index].active = false;
+
+	return ret;
+}
+
+int
+cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	return cn10k_ml_layer_unload(cnxk_mldev, model->model_id, NULL);
 }
 
 int
@@ -1748,7 +1800,6 @@ int
 cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
 {
 	struct cnxk_ml_model *model;
-	size_t size;
 
 	model = dev->data->models[model_id];
 
@@ -1762,19 +1813,10 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
-	size = model->layer[0].glow.metadata.init_model.file_size +
-	       model->layer[0].glow.metadata.main_model.file_size +
-	       model->layer[0].glow.metadata.finish_model.file_size +
-	       model->layer[0].glow.metadata.weights_bias.file_size;
-
 	/* Update model weights & bias */
 	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
 		   model->layer[0].glow.metadata.weights_bias.file_size);
 
-	/* Copy data from load to run. run address to be used by MLIP */
-	rte_memcpy(model->layer[0].glow.addr.base_dma_addr_run,
-		   model->layer[0].glow.addr.base_dma_addr_load, size);
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 2d0a49d5cd..677219dfdf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -12,6 +12,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
+struct cnxk_ml_model;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -311,9 +312,9 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
 
 /* Slow-path ops */
-int cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
-			uint16_t *model_id);
-int cn10k_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
+int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
@@ -339,4 +340,9 @@ __rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
 
+/* Layer ops */
+int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
+			size_t size, uint16_t *index);
+int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 02605fa28f..1590249abd 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -31,6 +31,18 @@ enum cnxk_ml_dev_state {
 	ML_CNXK_DEV_STATE_CLOSED
 };
 
+/* Index to model and layer ID map */
+struct cnxk_ml_index_map {
+	/* Model ID */
+	uint16_t model_id;
+
+	/* Layer ID */
+	uint16_t layer_id;
+
+	/* Layer status */
+	bool active;
+};
+
 /* Device private data */
 struct cnxk_ml_dev {
 	/* RTE device */
@@ -56,6 +68,9 @@ struct cnxk_ml_dev {
 
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
+
+	/* Index map */
+	struct cnxk_ml_index_map *index_map;
 };
 
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index aa56dd2276..1d8b84269d 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -10,6 +10,9 @@
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -137,6 +140,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	uint16_t model_id;
 	uint32_t mz_size;
 	uint16_t qp_id;
+	uint64_t i;
 	int ret;
 
 	if (dev == NULL)
@@ -240,7 +244,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-					if (cn10k_ml_model_unload(dev, model_id) != 0)
+					if (cnxk_ml_model_unload(dev, model_id) != 0)
 						plt_err("Could not unload model %u", model_id);
 				}
 				dev->data->models[model_id] = NULL;
@@ -271,6 +275,23 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	/* Allocate and initialize index_map */
+	if (cnxk_mldev->index_map == NULL) {
+		cnxk_mldev->index_map =
+			rte_zmalloc("cnxk_ml_index_map",
+				    sizeof(struct cnxk_ml_index_map) * cnxk_mldev->max_nb_layers,
+				    RTE_CACHE_LINE_SIZE);
+		if (cnxk_mldev->index_map == NULL) {
+			plt_err("Failed to get memory for index_map, nb_layers %" PRIu64,
+				cnxk_mldev->max_nb_layers);
+			ret = -ENOMEM;
+			goto error;
+		}
+	}
+
+	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
+		cnxk_mldev->index_map[i].active = false;
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -303,6 +324,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
+	if (cnxk_mldev->index_map)
+		rte_free(cnxk_mldev->index_map);
+
 	/* Stop and unload all models */
 	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
 		model = dev->data->models[model_id];
@@ -312,7 +336,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				if (cn10k_ml_model_unload(dev, model_id) != 0)
+				if (cnxk_ml_model_unload(dev, model_id) != 0)
 					plt_err("Could not unload model %u", model_id);
 			}
 			dev->data->models[model_id] = NULL;
@@ -428,6 +452,118 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
+{
+	struct rte_ml_dev_info dev_info;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t model_info_size;
+	uint16_t lcl_model_id;
+	uint64_t mz_size;
+	bool found;
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Find model ID */
+	found = false;
+	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
+		if (dev->data->models[lcl_model_id] == NULL) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		plt_err("No slots available to load new model");
+		return -ENOMEM;
+	}
+
+	/* Compute memzone size */
+	cnxk_ml_dev_info_get(dev, &dev_info);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size);
+	model_info_size = sizeof(struct rte_ml_model_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info) +
+			  ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info);
+	model_info_size = PLT_ALIGN_CEIL(model_info_size, dev_info.align_size);
+	mz_size += model_info_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, lcl_model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, dev_info.align_size);
+	if (!mz) {
+		plt_err("Failed to allocate memory for cnxk_ml_model: %s", str);
+		return -ENOMEM;
+	}
+
+	model = mz->addr;
+	model->cnxk_mldev = cnxk_mldev;
+	model->model_id = lcl_model_id;
+	model->info = PLT_PTR_ADD(
+		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
+	dev->data->models[lcl_model_id] = model;
+
+	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (ret != 0)
+		goto error;
+
+	plt_spinlock_init(&model->lock);
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+	cnxk_mldev->nb_models_loaded++;
+
+	*model_id = lcl_model_id;
+
+	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
+}
+
+int
+cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	int ret;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	if (model->state != ML_CNXK_MODEL_STATE_LOADED) {
+		plt_err("Cannot unload. Model in use.");
+		return -EBUSY;
+	}
+
+	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (ret != 0)
+		return ret;
+
+	dev->data->models[model_id] = NULL;
+	cnxk_mldev->nb_models_unloaded++;
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CNXK_ML_MODEL_MEMZONE_NAME, model_id);
+	return plt_memzone_free(plt_memzone_lookup(str));
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -451,8 +587,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
 
 	/* Model ops */
-	.model_load = cn10k_ml_model_load,
-	.model_unload = cn10k_ml_model_unload,
+	.model_load = cnxk_ml_model_load,
+	.model_unload = cnxk_ml_model_unload,
 	.model_start = cn10k_ml_model_start,
 	.model_stop = cn10k_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index a925c07580..bc14f6e5b9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -62,4 +62,6 @@ struct cnxk_ml_qp {
 
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
+int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 10/34] ml/cnxk: update model start and stop functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (8 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
                     ` (24 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk wrapper functions to start and stop
ML models. The wrapper functions invoke the cn10k
model start and stop functions.
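
For reference, a trimmed sketch of the dispatch shape this patch adds (it mirrors the
cnxk_ml_ops.c hunk further below; locking and model-state checks are omitted here):

static int
cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
{
	struct cnxk_ml_dev *cnxk_mldev;
	struct cnxk_ml_model *model;

	if (dev == NULL)
		return -EINVAL;

	/* Resolve the driver-private device and model objects */
	cnxk_mldev = dev->data->dev_private;
	model = dev->data->models[model_id];
	if (model == NULL) {
		plt_err("Invalid model_id = %u", model_id);
		return -EINVAL;
	}

	/* Hand off to the cn10k layer, which now takes the cnxk structures
	 * instead of the rte_ml_dev pointer.
	 */
	return cn10k_ml_model_start(cnxk_mldev, model);
}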

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c |  28 ++--
 drivers/ml/cnxk/cn10k_ml_ocm.h |  12 +-
 drivers/ml/cnxk/cn10k_ml_ops.c | 282 ++++++++++++++++++++-------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   8 +-
 drivers/ml/cnxk/cnxk_ml_ops.c  |  48 +++++-
 drivers/ml/cnxk/cnxk_ml_ops.h  |   1 +
 6 files changed, 240 insertions(+), 139 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index d71c36eae6..2197e5e0ed 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -215,11 +215,10 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
  * scratch & WB pages and OCM allocation mode.
  */
 int
-cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			   uint16_t scratch_pages, uint64_t *tilemask)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 
 	uint16_t used_scratch_pages_max;
@@ -238,7 +237,6 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 	int max_slot_sz;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
 
@@ -333,12 +331,10 @@ cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t w
 }
 
 void
-cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
+cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id,
 			   uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
 			   uint16_t scratch_pages)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -351,10 +347,8 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 	int tile_id;
 	int page_id;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Get first set bit, tile_start */
@@ -396,12 +390,10 @@ cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t l
 }
 
 void
-cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id)
+cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id)
 {
 	struct cnxk_ml_model *local_model;
 	struct cnxk_ml_layer *local_layer;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
@@ -416,10 +408,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 	uint16_t i;
 	uint16_t j;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 
 	/* Update OCM info for WB memory */
@@ -438,8 +428,8 @@ cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t laye
 
 		/* Get max scratch pages required, excluding the current model */
 		scratch_resize_pages = 0;
-		for (i = 0; i < dev->data->nb_models; i++) {
-			local_model = dev->data->models[i];
+		for (i = 0; i < cnxk_mldev->mldev->data->nb_models; i++) {
+			local_model = cnxk_mldev->mldev->data->models[i];
 			if (local_model == NULL)
 				continue;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 720f8caf76..97b723a56a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -8,6 +8,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+struct cnxk_ml_dev;
+
 /* Number of OCM tiles. */
 #define ML_CN10K_OCM_NUMTILES 0x8
 
@@ -75,12 +77,12 @@ struct cn10k_ml_ocm {
 };
 
 int cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end);
-int cn10k_ml_ocm_tilemask_find(struct rte_ml_dev *dev, uint8_t num_tiles, uint16_t wb_pages,
+int cn10k_ml_ocm_tilemask_find(struct cnxk_ml_dev *cnxk_mldev, uint8_t num_tiles, uint16_t wb_pages,
 			       uint16_t scratch_pages, uint64_t *tilemask);
-void cn10k_ml_ocm_reserve_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id,
-				uint64_t tilemask, int wb_page_start, uint16_t wb_pages,
-				uint16_t scratch_pages);
-void cn10k_ml_ocm_free_pages(struct rte_ml_dev *dev, uint16_t model_id, uint16_t layer_id);
+void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
+				uint16_t wb_pages, uint16_t scratch_pages);
+void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
 void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index ab05896b5e..40f484158a 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -248,26 +248,28 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
 }
 
 static void
-cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_model *model,
+cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
 {
 	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_layer_addr *addr;
+	struct cn10k_ml_dev *cn10k_mldev;
 
-	metadata = &model->glow.metadata;
-	addr = &model->layer[0].glow.addr;
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	metadata = &layer->glow.metadata;
+	addr = &layer->glow.addr;
 
 	memset(&req->cn10k_req.jd, 0, sizeof(struct cn10k_ml_jd));
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(&req->cn10k_req.status);
-	req->cn10k_req.jd.hdr.model_id = model->model_id;
+	req->cn10k_req.jd.hdr.model_id = layer->index;
 	req->cn10k_req.jd.hdr.job_type = job_type;
 	req->cn10k_req.jd.hdr.fp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 
 	if (job_type == ML_CN10K_JOB_TYPE_MODEL_START) {
-		if (!model->glow.metadata.model.ocm_relocatable)
+		if (!layer->glow.metadata.model.ocm_relocatable)
 			req->cn10k_req.jd.hdr.sp_flags = ML_CN10K_SP_FLAGS_OCM_NONRELOCATABLE;
 		else
 			req->cn10k_req.jd.hdr.sp_flags = 0x0;
@@ -291,7 +293,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 		req->cn10k_req.jd.model_start.num_gather_entries = 0;
 		req->cn10k_req.jd.model_start.num_scatter_entries = 0;
 		req->cn10k_req.jd.model_start.tilemask = 0; /* Updated after reserving pages */
-		req->cn10k_req.jd.model_start.batch_size = model->batch_size;
+		req->cn10k_req.jd.model_start.batch_size = layer->batch_size;
 		req->cn10k_req.jd.model_start.ocm_wb_base_address =
 			0; /* Updated after reserving pages */
 		req->cn10k_req.jd.model_start.ocm_wb_range_start =
@@ -323,9 +325,13 @@ cn10k_ml_prep_sp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml
 }
 
 static __rte_always_inline void
-cn10k_ml_prep_fp_job_descriptor(struct cn10k_ml_dev *cn10k_mldev, struct cnxk_ml_req *req,
+cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
 				struct rte_ml_op *op)
 {
+	struct cn10k_ml_dev *cn10k_mldev;
+
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
 	req->cn10k_req.jd.hdr.model_id = op->model_id;
@@ -714,10 +720,8 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint
 }
 
 static int
-cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_model_info *info;
-	struct cnxk_ml_model *model;
 	struct rte_ml_buff_seg seg[2];
 	struct rte_ml_buff_seg *inp;
 	struct rte_ml_buff_seg *out;
@@ -730,22 +734,20 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	int ret = 0;
 	uint32_t i;
 
-	model = dev->data->models[model_id];
-	info = (struct rte_ml_model_info *)model->info;
 	inp = &seg[0];
 	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < info->nb_inputs; i++)
-		isize += info->input_info[i].size;
+	for (i = 0; i < layer->info.nb_inputs; i++)
+		isize += layer->info.input[i].sz_q;
 
-	for (i = 0; i < info->nb_outputs; i++)
-		osize += info->output_info[i].size;
+	for (i = 0; i < layer->info.nb_outputs; i++)
+		osize += layer->info.output[i].sz_q;
 
-	isize = model->batch_size * isize;
-	osize = model->batch_size * osize;
+	isize = layer->batch_size * isize;
+	osize = layer->batch_size * osize;
 
-	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id);
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
 	if (mz == NULL)
 		return -ENOMEM;
@@ -761,15 +763,15 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id)
 	seg[1].length = osize;
 	seg[1].next = NULL;
 
-	op.model_id = model_id;
-	op.nb_batches = model->batch_size;
+	op.model_id = layer->index;
+	op.nb_batches = layer->batch_size;
 	op.mempool = NULL;
 
 	op.input = &inp;
 	op.output = &out;
 
-	memset(model->layer[0].glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(dev, &op);
+	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
+	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -1506,14 +1508,16 @@ cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 }
 
 int
-cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -1524,85 +1528,89 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_START);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_START);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	plt_write64(ML_CNXK_POLL_JOB_START, &req->cn10k_req.status);
 	plt_wmb();
 
-	num_tiles = model->layer[0].glow.metadata.model.tile_end -
-		    model->layer[0].glow.metadata.model.tile_start + 1;
+	num_tiles = layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1;
 
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				plt_ml_dbg("Model already started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+				plt_ml_dbg("Layer already started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the model_id = %u",
+					model->model_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (!model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (!layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
 			wb_page_start = cn10k_ml_ocm_tilemask_find(
-				dev, num_tiles, model->layer[0].glow.ocm_map.wb_pages,
-				model->layer[0].glow.ocm_map.scratch_pages, &tilemask);
+				cnxk_mldev, num_tiles, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages, &tilemask);
 
 			if (wb_page_start == -1) {
 				plt_err("Free pages not available on OCM tiles");
-				plt_err("Failed to start model = 0x%016lx, name = %s",
-					PLT_U64_CAST(model),
-					model->layer[0].glow.metadata.model.name);
-
+				plt_err("Failed to start layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&ocm->lock);
 				return -ENOMEM;
 			}
 
-			model->layer[0].glow.ocm_map.tilemask = tilemask;
-			model->layer[0].glow.ocm_map.wb_page_start = wb_page_start;
+			layer->glow.ocm_map.tilemask = tilemask;
+			layer->glow.ocm_map.wb_page_start = wb_page_start;
 
-			cn10k_ml_ocm_reserve_pages(dev, model->model_id, 0,
-						   model->layer[0].glow.ocm_map.tilemask,
-						   model->layer[0].glow.ocm_map.wb_page_start,
-						   model->layer[0].glow.ocm_map.wb_pages,
-						   model->layer[0].glow.ocm_map.scratch_pages);
-			model->layer[0].glow.ocm_map.ocm_reserved = true;
+			cn10k_ml_ocm_reserve_pages(
+				cnxk_mldev, model->model_id, layer_id, layer->glow.ocm_map.tilemask,
+				layer->glow.ocm_map.wb_page_start, layer->glow.ocm_map.wb_pages,
+				layer->glow.ocm_map.scratch_pages);
+			layer->glow.ocm_map.ocm_reserved = true;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
 
 	/* Update JD */
-	cn10k_ml_ocm_tilecount(model->layer[0].glow.ocm_map.tilemask, &tile_start, &tile_end);
+	cn10k_ml_ocm_tilecount(layer->glow.ocm_map.tilemask, &tile_start, &tile_end);
 	req->cn10k_req.jd.model_start.tilemask = GENMASK_ULL(tile_end, tile_start);
 	req->cn10k_req.jd.model_start.ocm_wb_base_address =
-		model->layer[0].glow.ocm_map.wb_page_start * ocm->page_size;
+		layer->glow.ocm_map.wb_page_start * ocm->page_size;
 
 	job_enqueued = false;
 	job_dequeued = false;
@@ -1636,66 +1644,94 @@ cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (ret == 0) {
-				model->state = ML_CNXK_MODEL_STATE_STARTED;
-				cnxk_mldev->nb_models_started++;
-			} else {
-				model->state = ML_CNXK_MODEL_STATE_UNKNOWN;
-			}
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_STARTED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
 
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN) {
-		while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	if (layer->state == ML_CNXK_LAYER_STATE_UNKNOWN) {
+		while (layer->glow.ocm_map.ocm_reserved) {
 			if (plt_spinlock_trylock(&ocm->lock) != 0) {
-				cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-				model->layer[0].glow.ocm_map.ocm_reserved = false;
-				model->layer[0].glow.ocm_map.tilemask = 0x0;
+				cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+				layer->glow.ocm_map.ocm_reserved = false;
+				layer->glow.ocm_map.tilemask = 0x0;
 				plt_spinlock_unlock(&ocm->lock);
 			}
 		}
 	}
 
-	if (ret < 0) { /* Call unload to update model and FW state, ignore error */
-		rte_ml_model_stop(dev->data->dev_id, model_id);
+	if (ret < 0) {
+		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data && roc_model_is_cn10ka())
-			ret = cn10k_ml_cache_model_data(dev, model_id);
+		if (cn10k_mldev->cache_model_data)
+			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
 	return ret;
 }
 
 int
-cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model start failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_started++;
+	model->state = ML_CNXK_MODEL_STATE_STARTED;
+
+	return 0;
+}
+
+int
+cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
+	uint16_t layer_id = 0;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
+	PLT_SET_USED(layer_name);
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
 
+	model = cnxk_mldev->mldev->data->models[model_id];
 	if (model == NULL) {
 		plt_err("Invalid model_id = %u", model_id);
 		return -EINVAL;
 	}
 
+	layer = &model->layer[layer_id];
+	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
+	ocm = &cn10k_mldev->ocm;
+
 	/* Prepare JD */
-	req = model->layer[0].glow.req;
-	cn10k_ml_prep_sp_job_descriptor(cn10k_mldev, model, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
+	req = layer->glow.req;
+	cn10k_ml_prep_sp_job_descriptor(cnxk_mldev, layer, req, ML_CN10K_JOB_TYPE_MODEL_STOP);
 	req->cn10k_req.result.error_code = 0x0;
 	req->cn10k_req.result.user_ptr = NULL;
 
@@ -1705,31 +1741,31 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
-				plt_ml_dbg("Model not started, model = 0x%016lx",
-					   PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_LOADED) {
+				plt_ml_dbg("Layer not started, model_id = %u, layer_id = %u",
+					   model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return 1;
 			}
 
-			if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE) {
-				plt_err("A slow-path job is active for the model = 0x%016lx",
-					PLT_U64_CAST(model));
+			if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE) {
+				plt_err("A slow-path job is active for the layer, model_id = %u, layer_id = %u",
+					model->model_id, layer_id);
 				plt_spinlock_unlock(&model->lock);
 				return -EBUSY;
 			}
 
-			model->state = ML_CNXK_MODEL_STATE_JOB_ACTIVE;
+			layer->state = ML_CNXK_LAYER_STATE_JOB_ACTIVE;
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
 	}
 
-	while (model->layer[0].glow.ocm_map.ocm_reserved) {
+	while (layer->glow.ocm_map.ocm_reserved) {
 		if (plt_spinlock_trylock(&ocm->lock) != 0) {
-			cn10k_ml_ocm_free_pages(dev, model->model_id, 0);
-			model->layer[0].glow.ocm_map.ocm_reserved = false;
-			model->layer[0].glow.ocm_map.tilemask = 0x0;
+			cn10k_ml_ocm_free_pages(cnxk_mldev, model->model_id, layer_id);
+			layer->glow.ocm_map.ocm_reserved = false;
+			layer->glow.ocm_map.tilemask = 0x0;
 			plt_spinlock_unlock(&ocm->lock);
 		}
 	}
@@ -1766,8 +1802,11 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	locked = false;
 	while (!locked) {
 		if (plt_spinlock_trylock(&model->lock) != 0) {
-			cnxk_mldev->nb_models_stopped++;
-			model->state = ML_CNXK_MODEL_STATE_LOADED;
+			if (ret == 0)
+				layer->state = ML_CNXK_LAYER_STATE_LOADED;
+			else
+				layer->state = ML_CNXK_LAYER_STATE_UNKNOWN;
+
 			plt_spinlock_unlock(&model->lock);
 			locked = true;
 		}
@@ -1776,6 +1815,25 @@ cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return ret;
 }
 
+int
+cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+	int ret;
+
+	layer = &model->layer[0];
+	ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+	if (ret != 0) {
+		plt_err("CN10K Model stop failed, model_id = %u, error = %d", model->model_id, ret);
+		return ret;
+	}
+
+	cnxk_mldev->nb_models_stopped++;
+	model->state = ML_CNXK_MODEL_STATE_LOADED;
+
+	return 0;
+}
+
 int
 cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			struct rte_ml_model_info *model_info)
@@ -2003,30 +2061,35 @@ queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
 }
 
 static __rte_always_inline void
-cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *req)
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
 	uint64_t fw_latency;
+	uint16_t model_id;
+	uint16_t layer_id;
 
 	result = &req->cn10k_req.result;
 	op = req->op;
 
 	if (likely(result->error_code == 0)) {
-		model = dev->data->models[op->model_id];
+		model_id = cnxk_mldev->index_map[op->model_id].model_id;
+		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+		model = cnxk_mldev->mldev->data->models[model_id];
+		layer = &model->layer[layer_id];
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeued_count++;
-			xstats = &model->layer[0].glow.burst_xstats[qp_id];
+			xstats = &layer->glow.burst_xstats[qp_id];
 		} else {
-			xstats = model->layer[0].glow.sync_xstats;
+			xstats = layer->glow.sync_xstats;
 		}
 
 		if (unlikely(xstats->dequeued_count == xstats->hw_reset_count)) {
@@ -2054,14 +2117,13 @@ cn10k_ml_result_update(struct rte_ml_dev *dev, int qp_id, struct cnxk_ml_req *re
 		op->status = RTE_ML_OP_STATUS_SUCCESS;
 	} else {
 		if (likely(qp_id >= 0)) {
-			qp = dev->data->queue_pairs[qp_id];
+			qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
 			qp->stats.dequeue_err_count++;
 		}
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
 		if (error_code->s.etype == ML_ETYPE_DRIVER) {
-			cnxk_mldev = dev->data->dev_private;
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
@@ -2116,7 +2178,7 @@ cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 	req = &queue->reqs[head];
 
 	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2183,7 +2245,7 @@ cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op
 		}
 	}
 
-	cn10k_ml_result_update(dev, qp_id, req);
+	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
 	ops[count] = req->op;
 
 	queue_index_advance(&tail, qp->nb_desc);
@@ -2232,23 +2294,27 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	uint16_t model_id;
+	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[op->model_id];
-	req = model->layer[0].glow.req;
+	model_id = cnxk_mldev->index_map[op->model_id].model_id;
+	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	req = layer->glow.req;
 
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cn10k_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
@@ -2284,7 +2350,7 @@ cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op)
 	if (timeout)
 		ret = -ETIME;
 	else
-		cn10k_ml_result_update(dev, -1, req);
+		cn10k_ml_result_update(cnxk_mldev, -1, req);
 
 error_enqueue:
 	return ret;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 677219dfdf..a222a43d55 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -315,8 +315,8 @@ int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mod
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id);
-int cn10k_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
 			    struct rte_ml_model_info *model_info);
 int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
@@ -335,7 +335,7 @@ __rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id
 					  struct rte_ml_op **ops, uint16_t nb_ops);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct rte_ml_dev *dev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
@@ -344,5 +344,7 @@ void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *q
 int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uint8_t *buffer,
 			size_t size, uint16_t *index);
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1d8b84269d..b61ed45876 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -240,7 +240,7 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			model = dev->data->models[model_id];
 			if (model != NULL) {
 				if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-					if (cn10k_ml_model_stop(dev, model_id) != 0)
+					if (cnxk_ml_model_stop(dev, model_id) != 0)
 						plt_err("Could not stop model %u", model_id);
 				}
 				if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -332,7 +332,7 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 		model = dev->data->models[model_id];
 		if (model != NULL) {
 			if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-				if (cn10k_ml_model_stop(dev, model_id) != 0)
+				if (cnxk_ml_model_stop(dev, model_id) != 0)
 					plt_err("Could not stop model %u", model_id);
 			}
 			if (model->state == ML_CNXK_MODEL_STATE_LOADED) {
@@ -564,6 +564,46 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	return plt_memzone_free(plt_memzone_lookup(str));
 }
 
+static int
+cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_start(cnxk_mldev, model);
+}
+
+int
+cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_stop(cnxk_mldev, model);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -589,8 +629,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
 	.model_unload = cnxk_ml_model_unload,
-	.model_start = cn10k_ml_model_start,
-	.model_stop = cn10k_ml_model_stop,
+	.model_start = cnxk_ml_model_start,
+	.model_stop = cnxk_ml_model_stop,
 	.model_info_get = cn10k_ml_model_info_get,
 	.model_params_update = cn10k_ml_model_params_update,
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index bc14f6e5b9..d27ca0d0cb 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -63,5 +63,6 @@ struct cnxk_ml_qp {
 extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
+int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 11/34] ml/cnxk: update model utility functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (9 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
                     ` (23 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to update model params and
fetch model info.
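
The new wrappers follow the thin-delegation pattern used across this
series: validate the rte_ml_dev arguments, resolve the model from the
device data, and hand off to the cn10k layer. A standalone sketch of
that pattern is below; every name in it is hypothetical and it is not
driver or rte_ml_dev API code.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_MODELS 4

/* hypothetical per-model state owned by the generic layer */
struct toy_model {
    int in_use;
    char params[64];
};

static struct toy_model models[MAX_MODELS];

/* hypothetical hardware-specific implementation, analogous to a cn10k_* call */
static int
hw_params_update(struct toy_model *model, const void *buffer, size_t len)
{
    if (len > sizeof(model->params))
        return -ENOSPC;
    memcpy(model->params, buffer, len);
    return 0;
}

/* generic wrapper: validate arguments, resolve the model, then delegate */
static int
generic_params_update(uint16_t model_id, const void *buffer, size_t len)
{
    if (buffer == NULL || model_id >= MAX_MODELS)
        return -EINVAL;

    if (!models[model_id].in_use)
        return -EINVAL;

    return hw_params_update(&models[model_id], buffer, len);
}

int
main(void)
{
    static const char weights[] = "new-weights";

    models[0].in_use = 1;
    printf("params_update: %d\n", generic_params_update(0, weights, sizeof(weights)));
    return 0;
}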

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 38 ++++++---------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  5 ++--
 drivers/ml/cnxk/cnxk_ml_ops.c  | 48 ++++++++++++++++++++++++++++++++--
 3 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 40f484158a..3ff82829f0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1835,45 +1835,23 @@ cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 }
 
 int
-cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			struct rte_ml_model_info *model_info)
+cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			     void *buffer)
 {
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	rte_memcpy(model_info, model->info, sizeof(struct rte_ml_model_info));
-	model_info->input_info = ((struct rte_ml_model_info *)model->info)->input_info;
-	model_info->output_info = ((struct rte_ml_model_info *)model->info)->output_info;
-
-	return 0;
-}
-
-int
-cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
-{
-	struct cnxk_ml_model *model;
-
-	model = dev->data->models[model_id];
+	struct cnxk_ml_layer *layer;
 
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
+	RTE_SET_USED(cnxk_mldev);
 
 	if (model->state == ML_CNXK_MODEL_STATE_UNKNOWN)
 		return -1;
 	else if (model->state != ML_CNXK_MODEL_STATE_LOADED)
 		return -EBUSY;
 
+	layer = &model->layer[0];
+
 	/* Update model weights & bias */
-	rte_memcpy(model->layer[0].glow.addr.wb_load_addr, buffer,
-		   model->layer[0].glow.metadata.weights_bias.file_size);
+	rte_memcpy(layer->glow.addr.wb_load_addr, buffer,
+		   layer->glow.metadata.weights_bias.file_size);
 
 	return 0;
 }
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index a222a43d55..ef12069f0d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -317,9 +317,8 @@ int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int cn10k_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-int cn10k_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
-			    struct rte_ml_model_info *model_info);
-int cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer);
+int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				 void *buffer);
 
 /* I/O ops */
 int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index b61ed45876..9ce37fcfd1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -604,6 +604,50 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 	return cn10k_ml_model_stop(cnxk_mldev, model);
 }
 
+static int
+cnxk_ml_model_info_get(struct rte_ml_dev *dev, uint16_t model_id,
+		       struct rte_ml_model_info *model_info)
+{
+	struct rte_ml_model_info *info;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (model_info == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = (struct rte_ml_model_info *)model->info;
+	rte_memcpy(model_info, info, sizeof(struct rte_ml_model_info));
+	model_info->input_info = info->input_info;
+	model_info->output_info = info->output_info;
+
+	return 0;
+}
+
+static int
+cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	if ((dev == NULL) || (buffer == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -631,8 +675,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_unload = cnxk_ml_model_unload,
 	.model_start = cnxk_ml_model_start,
 	.model_stop = cnxk_ml_model_stop,
-	.model_info_get = cn10k_ml_model_info_get,
-	.model_params_update = cn10k_ml_model_params_update,
+	.model_info_get = cnxk_ml_model_info_get,
+	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
 	.io_quantize = cn10k_ml_io_quantize,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 12/34] ml/cnxk: update data quantization functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (10 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
                     ` (22 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to quantize input data and
dequantize output data.
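
At its core the quantize path is a per-element scale-and-convert; the
standalone sketch below shows the float32 to int8 case. The rounding
and saturation policy shown here is an assumption made for
illustration, not the exact behaviour of the mldev_utils helpers.

/* build: cc -o quant quant.c -lm */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* illustrative float32 -> int8 quantization with a scale factor */
static void
quantize_f32_to_i8(float scale, uint32_t nb_elements, const float *in, int8_t *out)
{
    uint32_t i;

    for (i = 0; i < nb_elements; i++) {
        float v = roundf(in[i] * scale);

        /* saturate to the int8 range */
        if (v > 127.0f)
            v = 127.0f;
        else if (v < -128.0f)
            v = -128.0f;
        out[i] = (int8_t)v;
    }
}

int
main(void)
{
    const float in[4] = {0.0f, 0.5f, -1.0f, 5.0f};
    int8_t out[4];
    uint32_t i;

    quantize_f32_to_i8(32.0f, 4, in, out);
    for (i = 0; i < 4; i++)
        printf("%f -> %d\n", in[i], out[i]);
    return 0;
}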

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 164 ---------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |   7 --
 drivers/ml/cnxk/cnxk_ml_io.c   |  95 +++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_io.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_ops.c  |  78 +++++++++++++++-
 drivers/ml/cnxk/meson.build    |   1 +
 6 files changed, 175 insertions(+), 173 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_io.c

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 3ff82829f0..c68e6c620c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1856,170 +1856,6 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-int
-cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
-		     struct rte_ml_buff_seg **qbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_input_type;
-	uint8_t *lcl_dbuffer;
-	uint8_t *lcl_qbuffer;
-	uint8_t input_type;
-	float qscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			input_type = model->layer[0].glow.metadata.input1[i].input_type;
-			model_input_type = model->layer[0].glow.metadata.input1[i].model_input_type;
-			qscale = model->layer[0].glow.metadata.input1[i].qscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			input_type = model->layer[0].glow.metadata.input2[j].input_type;
-			model_input_type = model->layer[0].glow.metadata.input2[j].model_input_type;
-			qscale = model->layer[0].glow.metadata.input2[j].qscale;
-		}
-
-		if (input_type == model_input_type) {
-			rte_memcpy(lcl_qbuffer, lcl_dbuffer, model->layer[0].info.input[i].sz_d);
-		} else {
-			switch (model->layer[0].glow.metadata.input1[i].model_input_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_float32_to_int8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_float32_to_uint8(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_float32_to_int16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_float32_to_uint16(
-					qscale, model->layer[0].info.input[i].nb_elements,
-					lcl_dbuffer, lcl_qbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float32_to_float16(
-					model->layer[0].info.input[i].nb_elements, lcl_dbuffer,
-					lcl_qbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_input_type[%u] : %u", i,
-					model->layer[0].glow.metadata.input1[i].model_input_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_dbuffer += model->layer[0].info.input[i].sz_d;
-		lcl_qbuffer += model->layer[0].info.input[i].sz_q;
-	}
-
-	return 0;
-}
-
-int
-cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
-		       struct rte_ml_buff_seg **dbuffer)
-{
-	struct cnxk_ml_model *model;
-	uint8_t model_output_type;
-	uint8_t *lcl_qbuffer;
-	uint8_t *lcl_dbuffer;
-	uint8_t output_type;
-	float dscale;
-	uint32_t i;
-	uint32_t j;
-	int ret;
-
-	model = dev->data->models[model_id];
-
-	if (model == NULL) {
-		plt_err("Invalid model_id = %u", model_id);
-		return -EINVAL;
-	}
-
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
-
-	for (i = 0; i < model->layer[0].glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			output_type = model->layer[0].glow.metadata.output1[i].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output1[i].model_output_type;
-			dscale = model->layer[0].glow.metadata.output1[i].dscale;
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			output_type = model->layer[0].glow.metadata.output2[j].output_type;
-			model_output_type =
-				model->layer[0].glow.metadata.output2[j].model_output_type;
-			dscale = model->layer[0].glow.metadata.output2[j].dscale;
-		}
-
-		if (output_type == model_output_type) {
-			rte_memcpy(lcl_dbuffer, lcl_qbuffer, model->layer[0].info.output[i].sz_q);
-		} else {
-			switch (model->layer[0].glow.metadata.output1[i].model_output_type) {
-			case RTE_ML_IO_TYPE_INT8:
-				ret = rte_ml_io_int8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT8:
-				ret = rte_ml_io_uint8_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_INT16:
-				ret = rte_ml_io_int16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_UINT16:
-				ret = rte_ml_io_uint16_to_float32(
-					dscale, model->layer[0].info.output[i].nb_elements,
-					lcl_qbuffer, lcl_dbuffer);
-				break;
-			case RTE_ML_IO_TYPE_FP16:
-				ret = rte_ml_io_float16_to_float32(
-					model->layer[0].info.output[i].nb_elements, lcl_qbuffer,
-					lcl_dbuffer);
-				break;
-			default:
-				plt_err("Unsupported model_output_type[%u] : %u", i,
-					model->layer[0].glow.metadata.output1[i].model_output_type);
-				ret = -ENOTSUP;
-			}
-			if (ret < 0)
-				return ret;
-		}
-
-		lcl_qbuffer += model->layer[0].info.output[i].sz_q;
-		lcl_dbuffer += model->layer[0].info.output[i].sz_d;
-	}
-
-	return 0;
-}
-
 static __rte_always_inline void
 queue_index_advance(uint64_t *index, uint64_t nb_desc)
 {
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index ef12069f0d..780e2a9f9c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -320,13 +320,6 @@ int cn10k_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				 void *buffer);
 
-/* I/O ops */
-int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id,
-			 struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer);
-
-int cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id,
-			   struct rte_ml_buff_seg **qbuffer, struct rte_ml_buff_seg **dbuffer);
-
 /* Fast-path ops */
 __rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					  struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
new file mode 100644
index 0000000000..c78009ab0c
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include <mldev_utils.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_io.h"
+
+inline int
+cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float qscale;
+	int ret = 0;
+
+	dtype = input->dtype;
+	qtype = input->qtype;
+	qscale = input->scale;
+	nb_elements = input->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(qbuffer, dbuffer, input->sz_d);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_float32_to_int8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_float32_to_uint8(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_float32_to_int16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype : %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
+
+inline int
+cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer)
+{
+	enum rte_ml_io_type qtype;
+	enum rte_ml_io_type dtype;
+	uint32_t nb_elements;
+	float dscale;
+	int ret = 0;
+
+	dtype = output->dtype;
+	qtype = output->qtype;
+	dscale = output->scale;
+	nb_elements = output->nb_elements;
+
+	if (dtype == qtype) {
+		rte_memcpy(dbuffer, qbuffer, output->sz_q);
+	} else {
+		switch (qtype) {
+		case RTE_ML_IO_TYPE_INT8:
+			ret = rte_ml_io_int8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT8:
+			ret = rte_ml_io_uint8_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_INT16:
+			ret = rte_ml_io_int16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_UINT16:
+			ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+			break;
+		case RTE_ML_IO_TYPE_FP16:
+			ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
+			break;
+		default:
+			plt_err("Unsupported qtype: %u", qtype);
+			ret = -ENOTSUP;
+		}
+	}
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index 1fa965a232..d500d77b9a 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -76,4 +76,7 @@ struct cnxk_ml_io_info {
 	uint32_t total_output_sz_d;
 };
 
+int cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *qbuffer);
+int cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_t *dbuffer);
+
 #endif /* _CNXK_ML_IO_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 9ce37fcfd1..63842025fc 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -5,6 +5,8 @@
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_io.h"
 #include "cnxk_ml_model.h"
@@ -648,6 +650,78 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 	return cn10k_ml_model_params_update(cnxk_mldev, model, buffer);
 }
 
+static int
+cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer,
+		    struct rte_ml_buff_seg **qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (dbuffer == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[0].info;
+
+	lcl_dbuffer = dbuffer[0]->addr;
+	lcl_qbuffer = qbuffer[0]->addr;
+	for (i = 0; i < info->nb_inputs; i++) {
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_dbuffer += info->input[i].sz_d;
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+static int
+cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer,
+		      struct rte_ml_buff_seg **dbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_model *model;
+	uint8_t *lcl_qbuffer;
+	uint8_t *lcl_dbuffer;
+	uint32_t i;
+	int ret;
+
+	if ((dev == NULL) || (qbuffer == NULL) || (dbuffer == NULL))
+		return -EINVAL;
+
+	model = dev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	info = &model->layer[model->nb_layers - 1].info;
+
+	lcl_qbuffer = qbuffer[0]->addr;
+	lcl_dbuffer = dbuffer[0]->addr;
+	for (i = 0; i < info->nb_outputs; i++) {
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+		lcl_dbuffer += info->output[i].sz_d;
+	}
+
+	return 0;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
@@ -679,6 +753,6 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.model_params_update = cnxk_ml_model_params_update,
 
 	/* I/O ops */
-	.io_quantize = cn10k_ml_io_quantize,
-	.io_dequantize = cn10k_ml_io_dequantize,
+	.io_quantize = cnxk_ml_io_quantize,
+	.io_dequantize = cnxk_ml_io_dequantize,
 };
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index d652543912..79154c8698 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -13,6 +13,7 @@ sources = files(
         'cn10k_ml_model.c',
         'cn10k_ml_ocm.c',
         'cnxk_ml_dev.c',
+        'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
 )
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 13/34] ml/cnxk: update device debug functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (11 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
                     ` (21 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrappers for the device dump and selftest debug
functions.
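
The dump keeps the driver's fixed-width "%*s" field formatting; the
standalone sketch below shows only that formatting style. The structure
and values used here are hypothetical and are not the driver's types.

#include <stdio.h>

#define FIELD_LEN 16
#define LINE_LEN  72

static void
print_line(FILE *fp, int len)
{
    int i;

    for (i = 0; i < len; i++)
        fprintf(fp, "-");
    fprintf(fp, "\n");
}

/* hypothetical model descriptor used only for this illustration */
struct toy_model {
    const char *name;
    unsigned int id;
    unsigned int nb_layers;
};

static void
toy_model_dump(const struct toy_model *model, FILE *fp)
{
    print_line(fp, LINE_LEN);
    fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->id, model->name);
    print_line(fp, LINE_LEN);
    fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->id);
    fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
    fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
    print_line(fp, LINE_LEN);
}

int
main(void)
{
    struct toy_model model = { .name = "resnet", .id = 0, .nb_layers = 1 };

    toy_model_dump(&model, stdout);
    return 0;
}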

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 118 +++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h |   1 +
 drivers/ml/cnxk/cn10k_ml_ocm.c   |   8 +-
 drivers/ml/cnxk/cn10k_ml_ocm.h   |   2 +-
 drivers/ml/cnxk/cn10k_ml_ops.c   | 176 ++-----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |   4 +-
 drivers/ml/cnxk/cnxk_ml_model.c  |  33 ++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    |  39 ++++++-
 drivers/ml/cnxk/cnxk_ml_utils.c  |  15 +++
 drivers/ml/cnxk/cnxk_ml_utils.h  |  17 +++
 drivers/ml/cnxk/meson.build      |   1 +
 12 files changed, 235 insertions(+), 181 deletions(-)
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.c
 create mode 100644 drivers/ml/cnxk/cnxk_ml_utils.h

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index 48d70027ca..af9d5a666f 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -11,6 +11,7 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_utils.h"
 
 static enum rte_ml_io_type
 cn10k_ml_io_type_map(uint8_t type)
@@ -598,3 +599,120 @@ cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				 rte_ml_io_type_size_get(io_info->output[i].qtype);
 	}
 }
+
+void
+cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	struct cn10k_ml_ocm *ocm;
+	char str[STR_LEN];
+	uint8_t i;
+	uint8_t j;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "index", layer->index);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
+		layer->glow.metadata.model.version[0], layer->glow.metadata.model.version[1],
+		layer->glow.metadata.model.version[2], layer->glow.metadata.model.version[3]);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	/* Print OCM status */
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
+		layer->glow.metadata.model.ocm_wb_range_end -
+			layer->glow.metadata.model.ocm_wb_range_start + 1);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", layer->glow.ocm_map.wb_pages);
+	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
+		ocm->size_per_tile - layer->glow.metadata.model.ocm_tmp_range_floor);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages", layer->glow.ocm_map.scratch_pages);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
+		layer->glow.metadata.model.tile_end - layer->glow.metadata.model.tile_start + 1);
+
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED) {
+		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
+			ML_CN10K_OCM_NUMTILES / 4, layer->glow.ocm_map.tilemask);
+		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
+			layer->glow.ocm_map.wb_page_start * ocm->page_size);
+	}
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->glow.metadata.model.num_input);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->glow.metadata.model.num_output);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "input", "input_name", "input_type",
+		"model_input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_input; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input1[i].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input1[i].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.input2[j].input_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.input2[j].model_input_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s  %18s\n", "output", "output_name", "output_type",
+		"model_output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->glow.metadata.model.num_output; i++) {
+		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output1[i].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output1[i].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		} else {
+			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
+			fprintf(fp, "%8u  ", i);
+			fprintf(fp, "%*s  ", 16, layer->glow.metadata.output2[j].output_name);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].output_type, str,
+					      STR_LEN);
+			fprintf(fp, "%*s  ", 12, str);
+			rte_ml_io_type_to_str(layer->glow.metadata.output2[j].model_output_type,
+					      str, STR_LEN);
+			fprintf(fp, "%*s  ", 18, str);
+			fprintf(fp, "\n");
+		}
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index b891c9d627..45f2ed5fcf 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -460,5 +460,6 @@ int cn10k_ml_model_ocm_pages_count(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_m
 void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 2197e5e0ed..dc315cce10 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -481,19 +481,15 @@ cn10k_ml_ocm_pagemask_to_str(struct cn10k_ml_ocm_tile_info *tile_info, uint16_t
 }
 
 void
-cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint8_t tile_id;
 	uint8_t word_id;
 	int wb_pages;
 	char *str;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
 
 	/* Nibbles + prefix '0x' */
 	str = rte_zmalloc("ocm_mask_str", ocm->num_pages / 4 + 2, RTE_CACHE_LINE_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.h b/drivers/ml/cnxk/cn10k_ml_ocm.h
index 97b723a56a..bf8944f8ee 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.h
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.h
@@ -83,6 +83,6 @@ void cn10k_ml_ocm_reserve_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_i
 				uint16_t layer_id, uint64_t tilemask, int wb_page_start,
 				uint16_t wb_pages, uint16_t scratch_pages);
 void cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint16_t layer_id);
-void cn10k_ml_ocm_print(struct rte_ml_dev *dev, FILE *fp);
+void cn10k_ml_ocm_print(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 
 #endif /* _CN10K_ML_OCM_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index c68e6c620c..a56d002d4c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -18,11 +18,6 @@
 /* ML layer macros */
 #define CN10K_ML_LAYER_MEMZONE_NAME "ml_cn10k_layer_mz"
 
-/* Debug print width */
-#define STR_LEN	  12
-#define FIELD_LEN 16
-#define LINE_LEN  90
-
 /* ML Job descriptor flags */
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
@@ -70,16 +65,6 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static void
-print_line(FILE *fp, int len)
-{
-	int i;
-
-	for (i = 0; i < len; i++)
-		fprintf(fp, "-");
-	fprintf(fp, "\n");
-}
-
 static inline void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
@@ -113,140 +98,6 @@ cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 	}
 }
 
-static void
-cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	struct cn10k_ml_ocm *ocm;
-	char str[STR_LEN];
-	uint8_t i;
-	uint8_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	ocm = &cn10k_mldev->ocm;
-	model = dev->data->models[model_id];
-
-	/* Print debug info */
-	print_line(fp, LINE_LEN);
-	fprintf(fp, " Model Information (%s)\n", model->glow.metadata.model.name);
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->glow.metadata.model.name);
-	fprintf(fp, "%*s : %u.%u.%u.%u\n", FIELD_LEN, "version",
-		model->glow.metadata.model.version[0], model->glow.metadata.model.version[1],
-		model->glow.metadata.model.version[2], model->glow.metadata.model.version[3]);
-	if (strlen(model->name) != 0)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "debug_name", model->name);
-	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->glow.metadata.model.batch_size);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_layers", model->glow.metadata.model.num_layers);
-
-	/* Print model state */
-	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
-	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
-		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
-
-	/* Print OCM status */
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "wb_size",
-		model->glow.metadata.model.ocm_wb_range_end -
-			model->glow.metadata.model.ocm_wb_range_start + 1);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "wb_pages", model->layer[0].glow.ocm_map.wb_pages);
-	fprintf(fp, "%*s : %" PRIu64 " bytes\n", FIELD_LEN, "scratch_size",
-		ocm->size_per_tile - model->glow.metadata.model.ocm_tmp_range_floor);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "scratch_pages",
-		model->layer[0].glow.ocm_map.scratch_pages);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_tiles",
-		model->glow.metadata.model.tile_end - model->glow.metadata.model.tile_start + 1);
-
-	if (model->state == ML_CNXK_MODEL_STATE_STARTED) {
-		fprintf(fp, "%*s : 0x%0*" PRIx64 "\n", FIELD_LEN, "tilemask",
-			ML_CN10K_OCM_NUMTILES / 4, model->layer[0].glow.ocm_map.tilemask);
-		fprintf(fp, "%*s : 0x%" PRIx64 "\n", FIELD_LEN, "ocm_wb_start",
-			model->layer[0].glow.ocm_map.wb_page_start * cn10k_mldev->ocm.page_size);
-	}
-
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", model->glow.metadata.model.num_input);
-	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", model->glow.metadata.model.num_output);
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "input", "input_name", "input_type",
-		"model_input_type", "quantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_input; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input1[i].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input1[i].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input1[i].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.input2[j].input_name);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.input2[j].model_input_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.input2[j].quantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "%8s  %16s  %12s  %18s  %12s\n", "output", "output_name", "output_type",
-		"model_output_type", "dequantize");
-	print_line(fp, LINE_LEN);
-	for (i = 0; i < model->glow.metadata.model.num_output; i++) {
-		if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) {
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output1[i].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output1[i].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output1[i].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		} else {
-			j = i - MRVL_ML_NUM_INPUT_OUTPUT_1;
-			fprintf(fp, "%8u  ", i);
-			fprintf(fp, "%*s  ", 16, model->glow.metadata.output2[j].output_name);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].output_type, str,
-					      STR_LEN);
-			fprintf(fp, "%*s  ", 12, str);
-			rte_ml_io_type_to_str(model->glow.metadata.output2[j].model_output_type,
-					      str, STR_LEN);
-			fprintf(fp, "%*s  ", 18, str);
-			fprintf(fp, "%*s", 12,
-				(model->glow.metadata.output2[j].dequantize == 1 ? "Yes" : "No"));
-			fprintf(fp, "\n");
-		}
-	}
-	fprintf(fp, "\n");
-	print_line(fp, LINE_LEN);
-	fprintf(fp, "\n");
-}
-
 static void
 cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				struct cnxk_ml_req *req, enum cn10k_ml_job_type job_type)
@@ -1120,38 +971,25 @@ cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mo
 }
 
 int
-cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
 	struct cn10k_ml_fw *fw;
 
 	uint32_t head_loc;
 	uint32_t tail_loc;
-	uint16_t model_id;
 	uint32_t bufsize;
 	char *head_ptr;
 	int core_id;
 
-	if (roc_env_is_asim())
-		return 0;
-
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	fw = &cn10k_mldev->fw;
 
-	/* Dump model info */
-	for (model_id = 0; model_id < dev->data->nb_models; model_id++) {
-		model = dev->data->models[model_id];
-		if (model != NULL) {
-			cn10k_ml_model_print(dev, model_id, fp);
-			fprintf(fp, "\n");
-		}
-	}
-
 	/* Dump OCM state */
-	cn10k_ml_ocm_print(dev, fp);
+	cn10k_ml_ocm_print(cnxk_mldev, fp);
+
+	if (roc_env_is_asim())
+		return 0;
 
 	/* Dump debug buffer */
 	for (core_id = 0; core_id <= 1; core_id++) {
@@ -1207,17 +1045,15 @@ cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 }
 
 int
-cn10k_ml_dev_selftest(struct rte_ml_dev *dev)
+cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	const struct plt_memzone *mz;
 	struct cnxk_ml_req *req;
 	uint64_t timeout_cycle;
 	bool timeout;
 	int ret;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	mz = plt_memzone_reserve_aligned("dev_selftest", sizeof(struct cnxk_ml_req), 0,
 					 ML_CN10K_ALIGN_SIZE);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 780e2a9f9c..5fda98ae88 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -295,8 +295,8 @@ int cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_start(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
-int cn10k_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp);
-int cn10k_ml_dev_selftest(struct rte_ml_dev *dev);
+int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
+int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
 int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
 void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 3d735ced3e..b069d4e3a5 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -5,3 +5,36 @@
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
+{
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Model Information (Model ID: %u, Name: %s)\n", model->model_id, model->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
+
+	/* Print model state */
+	if (model->state == ML_CNXK_MODEL_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (model->state == ML_CNXK_MODEL_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (model->state == ML_CNXK_MODEL_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+		layer = &model->layer[layer_id];
+		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+	}
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index a2994dbb71..66d979dd3c 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -108,4 +108,6 @@ struct cnxk_ml_model {
 	plt_spinlock_t lock;
 };
 
+void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
+
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 63842025fc..66b88ddae1 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -409,6 +409,41 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 	return 0;
 }
 
+static int
+cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t model_id;
+
+	if ((dev == NULL) || (fp == NULL))
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	/* Dump model info */
+	for (model_id = 0; model_id < cnxk_mldev->mldev->data->nb_models; model_id++) {
+		model = cnxk_mldev->mldev->data->models[model_id];
+		if (model != NULL)
+			cnxk_ml_model_dump(cnxk_mldev, model, fp);
+	}
+
+	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+}
+
+static int
+cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	return cn10k_ml_dev_selftest(cnxk_mldev);
+}
+
 static int
 cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 			     const struct rte_ml_dev_qp_conf *qp_conf, int socket_id)
@@ -729,8 +764,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_close = cnxk_ml_dev_close,
 	.dev_start = cnxk_ml_dev_start,
 	.dev_stop = cnxk_ml_dev_stop,
-	.dev_dump = cn10k_ml_dev_dump,
-	.dev_selftest = cn10k_ml_dev_selftest,
+	.dev_dump = cnxk_ml_dev_dump,
+	.dev_selftest = cnxk_ml_dev_selftest,
 
 	/* Queue-pair handling ops */
 	.dev_queue_pair_setup = cnxk_ml_dev_queue_pair_setup,
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.c b/drivers/ml/cnxk/cnxk_ml_utils.c
new file mode 100644
index 0000000000..ca3670a9e8
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.c
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include "cnxk_ml_utils.h"
+
+void
+cnxk_ml_print_line(FILE *fp, int len)
+{
+	int i;
+
+	for (i = 0; i < len; i++)
+		fprintf(fp, "-");
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/cnxk_ml_utils.h b/drivers/ml/cnxk/cnxk_ml_utils.h
new file mode 100644
index 0000000000..ed2ab21346
--- /dev/null
+++ b/drivers/ml/cnxk/cnxk_ml_utils.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _CNXK_ML_UTILS_H_
+#define _CNXK_ML_UTILS_H_
+
+#include <rte_mldev.h>
+
+/* Debug print width */
+#define STR_LEN	  12
+#define FIELD_LEN 16
+#define LINE_LEN  72
+
+void cnxk_ml_print_line(FILE *fp, int len);
+
+#endif /* _CNXK_ML_UTILS_H_ */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 79154c8698..5d27a87d91 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'cnxk_ml_io.c',
         'cnxk_ml_model.c',
         'cnxk_ml_ops.c',
+        'cnxk_ml_utils.c',
 )
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 14/34] ml/cnxk: update device stats functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (12 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
                     ` (20 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device stats.
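
Device-level stats are an aggregation of the per-queue-pair counters,
as in the minimal standalone sketch below; the names and types here are
hypothetical, not the driver's.

#include <stdint.h>
#include <stdio.h>

#define NB_QPS 2

/* hypothetical per-queue-pair counters */
struct toy_qp_stats {
    uint64_t enqueued_count;
    uint64_t dequeued_count;
};

static struct toy_qp_stats qp_stats[NB_QPS];

/* sum the per-queue-pair counters into device-level stats */
static void
dev_stats_get(struct toy_qp_stats *stats)
{
    int qp_id;

    stats->enqueued_count = 0;
    stats->dequeued_count = 0;
    for (qp_id = 0; qp_id < NB_QPS; qp_id++) {
        stats->enqueued_count += qp_stats[qp_id].enqueued_count;
        stats->dequeued_count += qp_stats[qp_id].dequeued_count;
    }
}

int
main(void)
{
    struct toy_qp_stats total;

    qp_stats[0].enqueued_count = 10;
    qp_stats[1].enqueued_count = 5;
    qp_stats[0].dequeued_count = 10;
    qp_stats[1].dequeued_count = 5;
    dev_stats_get(&total);
    printf("enq=%lu deq=%lu\n", (unsigned long)total.enqueued_count,
           (unsigned long)total.dequeued_count);
    return 0;
}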

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 32 ------------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h |  2 --
 drivers/ml/cnxk/cnxk_ml_ops.c  | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a56d002d4c..8cbf700f6e 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -770,38 +770,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		stats->enqueued_count += qp->stats.enqueued_count;
-		stats->dequeued_count += qp->stats.dequeued_count;
-		stats->enqueue_err_count += qp->stats.enqueue_err_count;
-		stats->dequeue_err_count += qp->stats.dequeue_err_count;
-	}
-
-	return 0;
-}
-
-void
-cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev)
-{
-	struct cnxk_ml_qp *qp;
-	int qp_id;
-
-	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
-		qp = dev->data->queue_pairs[qp_id];
-		qp->stats.enqueued_count = 0;
-		qp->stats.dequeued_count = 0;
-		qp->stats.enqueue_err_count = 0;
-		qp->stats.dequeue_err_count = 0;
-	}
-}
-
 int
 cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 5fda98ae88..47e7cb12af 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -298,8 +298,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats);
-void cn10k_ml_dev_stats_reset(struct rte_ml_dev *dev);
 int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
 				  uint32_t size);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 66b88ddae1..c75317d6da 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -489,6 +489,38 @@ cnxk_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 	return 0;
 }
 
+static int
+cnxk_ml_dev_stats_get(struct rte_ml_dev *dev, struct rte_ml_dev_stats *stats)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		stats->enqueued_count += qp->stats.enqueued_count;
+		stats->dequeued_count += qp->stats.dequeued_count;
+		stats->enqueue_err_count += qp->stats.enqueue_err_count;
+		stats->dequeue_err_count += qp->stats.dequeue_err_count;
+	}
+
+	return 0;
+}
+
+static void
+cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
+{
+	struct cnxk_ml_qp *qp;
+	int qp_id;
+
+	for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {
+		qp = dev->data->queue_pairs[qp_id];
+		qp->stats.enqueued_count = 0;
+		qp->stats.dequeued_count = 0;
+		qp->stats.enqueue_err_count = 0;
+		qp->stats.dequeue_err_count = 0;
+	}
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -772,8 +804,8 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	.dev_queue_pair_release = cnxk_ml_dev_queue_pair_release,
 
 	/* Stats ops */
-	.dev_stats_get = cn10k_ml_dev_stats_get,
-	.dev_stats_reset = cn10k_ml_dev_stats_reset,
+	.dev_stats_get = cnxk_ml_dev_stats_get,
+	.dev_stats_reset = cnxk_ml_dev_stats_reset,
 	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
 	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
 	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 15/34] ml/cnxk: update device and model xstats functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (13 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
                     ` (19 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added cnxk wrapper functions to handle ML device and model
extended stats. Resources for the xstats are now handled in
the cnxk layer. Introduced an internal xstats group.
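
Layer xstat names are rebuilt from the layer name plus a unit suffix
chosen by clock availability (nanoseconds when the core clock frequency
is known, cycles otherwise). The standalone sketch below illustrates
that naming scheme only; the names and values in it are hypothetical.

#include <stdio.h>
#include <string.h>

#define XSTAT_NAME_LEN 64

/* pick a unit suffix and build "<layer>-<stat>-<unit>" */
static void
build_xstat_name(char *name, size_t len, const char *layer, const char *stat,
                 unsigned int sclk_freq)
{
    const char *suffix = (sclk_freq != 0) ? "ns" : "cycles";

    snprintf(name, len, "%s-%s-%s", layer, stat, suffix);
}

int
main(void)
{
    char name[XSTAT_NAME_LEN];

    build_xstat_name(name, sizeof(name), "conv1", "avg-hw-latency", 0);
    printf("%s\n", name);
    build_xstat_name(name, sizeof(name), "conv1", "avg-hw-latency", 1000);
    printf("%s\n", name);
    return 0;
}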

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h   |   4 -
 drivers/ml/cnxk/cn10k_ml_ops.c   | 531 +++----------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h   |  16 +-
 drivers/ml/cnxk/cnxk_ml_dev.h    |   5 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 481 +++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_xstats.h |  21 +-
 6 files changed, 551 insertions(+), 507 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index be989e0a20..bde9d08901 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -10,7 +10,6 @@
 #include "cn10k_ml_ocm.h"
 
 #include "cnxk_ml_io.h"
-#include "cnxk_ml_xstats.h"
 
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
@@ -133,9 +132,6 @@ struct cn10k_ml_dev {
 	/* OCM info */
 	struct cn10k_ml_ocm ocm;
 
-	/* Extended stats data */
-	struct cnxk_ml_xstats xstats;
-
 	/* Enable / disable model data caching */
 	int cache_model_data;
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8cbf700f6e..776ad60401 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -198,107 +198,21 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
 }
 
-static int
-cn10k_ml_xstats_init(struct rte_ml_dev *dev)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint16_t model;
-	uint16_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
-	nb_stats = RTE_DIM(device_xstats) + ML_CNXK_MAX_MODELS * RTE_DIM(layer_xstats);
-	if (cn10k_mldev->xstats.entries == NULL)
-		cn10k_mldev->xstats.entries = rte_zmalloc(
-			"cn10k_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
-			PLT_CACHE_LINE_SIZE);
-
-	if (cn10k_mldev->xstats.entries == NULL)
-		return -ENOMEM;
-
-	/* Initialize device xstats */
-	stat_id = 0;
-	for (i = 0; i < RTE_DIM(device_xstats); i++) {
-		cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s",
-			 device_xstats[i].name);
-
-		cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
-		cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
-		cn10k_mldev->xstats.entries[stat_id].obj_idx = 0;
-		cn10k_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
-		stat_id++;
-	}
-	cn10k_mldev->xstats.count_mode_device = stat_id;
-
-	/* Initialize model xstats */
-	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
-		cn10k_mldev->xstats.offset_for_model[model] = stat_id;
-
-		for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-			cn10k_mldev->xstats.entries[stat_id].map.id = stat_id;
-			cn10k_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
-			cn10k_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
-			cn10k_mldev->xstats.entries[stat_id].obj_idx = model;
-			cn10k_mldev->xstats.entries[stat_id].reset_allowed =
-				layer_xstats[i].reset_allowed;
-
-			/* Name of xstat is updated during model load */
-			snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-				 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name),
-				 "Model-%u-%s", model, layer_xstats[i].name);
-
-			stat_id++;
-		}
-
-		cn10k_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
-	}
-
-	cn10k_mldev->xstats.count_mode_model = stat_id - cn10k_mldev->xstats.count_mode_device;
-	cn10k_mldev->xstats.count = stat_id;
-
-	return 0;
-}
-
 static void
-cn10k_ml_xstats_uninit(struct rte_ml_dev *dev)
+cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+				  uint16_t layer_id)
 {
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	rte_free(cn10k_mldev->xstats.entries);
-	cn10k_mldev->xstats.entries = NULL;
-
-	cn10k_mldev->xstats.count = 0;
-}
-
-static void
-cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
 	uint16_t rclk_freq;
 	uint16_t sclk_freq;
 	uint16_t stat_id;
 	char suffix[8];
 	uint16_t i;
 
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model = dev->data->models[model_id];
-	stat_id = RTE_DIM(device_xstats) + model_id * RTE_DIM(layer_xstats);
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+	stat_id = cnxk_mldev->xstats.offset_for_layer[model_id][layer_id];
 
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq == 0)
@@ -306,270 +220,94 @@ cn10k_ml_xstats_model_name_update(struct rte_ml_dev *dev, uint16_t model_id)
 	else
 		strcpy(suffix, "ns");
 
-	/* Update xstat name based on model name and sclk availability */
+	/* Update xstat name based on layer name and sclk availability */
 	for (i = 0; i < RTE_DIM(layer_xstats); i++) {
-		snprintf(cn10k_mldev->xstats.entries[stat_id].map.name,
-			 sizeof(cn10k_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
-			 model->layer[0].glow.metadata.model.name, layer_xstats[i].name, suffix);
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+			 layer->glow.metadata.model.name, layer_xstats[i].name, suffix);
 		stat_id++;
 	}
 }
 
-static uint64_t
-cn10k_ml_dev_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx __rte_unused,
-		       enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_dev *cnxk_mldev;
-
-	cnxk_mldev = dev->data->dev_private;
-
-	switch (type) {
-	case nb_models_loaded:
-		return cnxk_mldev->nb_models_loaded;
-	case nb_models_unloaded:
-		return cnxk_mldev->nb_models_unloaded;
-	case nb_models_started:
-		return cnxk_mldev->nb_models_started;
-	case nb_models_stopped:
-		return cnxk_mldev->nb_models_stopped;
-	default:
-		return -1;
-	}
-
-	return 0;
-}
-
-#define ML_AVG_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value += model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot;       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += layer->glow.burst_xstats[qp_id].str##_latency_tot;                \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value += layer->glow.sync_xstats->str##_latency_tot;                               \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count != 0)                                                                    \
 			value = value / count;                                                     \
 	} while (0)
 
-#define ML_MIN_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = UINT64_MAX;                                                                \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MIN(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_min);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value, layer->glow.burst_xstats[qp_id].str##_latency_min); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MIN(value, layer->glow.sync_xstats->str##_latency_min);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-#define ML_MAX_FOREACH_QP(dev, model, qp_id, str, value, count)                                    \
+#define ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			value = PLT_MAX(                                                           \
-				value,                                                             \
-				model->layer[0].glow.burst_xstats[qp_id].str##_latency_max);       \
-			count += model->layer[0].glow.burst_xstats[qp_id].dequeued_count -         \
-				 model->layer[0].glow.burst_xstats[qp_id].str##_reset_count;       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value, layer->glow.burst_xstats[qp_id].str##_latency_max); \
+			count += layer->glow.burst_xstats[qp_id].dequeued_count -                  \
+				 layer->glow.burst_xstats[qp_id].str##_reset_count;                \
 		}                                                                                  \
+		value = PLT_MAX(value, layer->glow.sync_xstats->str##_latency_max);                \
+		count += layer->glow.sync_xstats->dequeued_count -                                 \
+			 layer->glow.sync_xstats->str##_reset_count;                               \
 		if (count == 0)                                                                    \
 			value = 0;                                                                 \
 	} while (0)
 
-static uint64_t
-cn10k_ml_model_xstat_get(struct rte_ml_dev *dev, uint16_t obj_idx, enum cnxk_ml_xstats_type type)
+uint64_t
+cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+			 enum cnxk_ml_xstats_type type)
 {
-	struct cnxk_ml_model *model;
-	uint16_t rclk_freq; /* MHz */
-	uint16_t sclk_freq; /* MHz */
 	uint64_t count = 0;
-	uint64_t value;
+	uint64_t value = 0;
 	uint32_t qp_id;
 
-	model = dev->data->models[obj_idx];
-	if (model == NULL)
-		return 0;
-
 	switch (type) {
 	case avg_hw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case min_hw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case max_hw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, hw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, hw, value, count);
 		break;
 	case avg_fw_latency:
-		ML_AVG_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case min_fw_latency:
-		ML_MIN_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MIN_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	case max_fw_latency:
-		ML_MAX_FOREACH_QP(dev, model, qp_id, fw, value, count);
+		ML_MAX_FOREACH_QP(cnxk_mldev, layer, qp_id, fw, value, count);
 		break;
 	default:
 		value = 0;
 	}
 
-	roc_clk_freq_get(&rclk_freq, &sclk_freq);
-	if (sclk_freq != 0) /* return in ns */
-		value = (value * 1000ULL) / sclk_freq;
-
 	return value;
 }
 
-static int
-cn10k_ml_device_xstats_reset(struct rte_ml_dev *dev, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint16_t nb_stats;
-	uint16_t stat_id;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	if (stat_ids == NULL)
-		nb_stats = cn10k_mldev->xstats.count_mode_device;
-	else
-		nb_stats = nb_ids;
-
-	for (i = 0; i < nb_stats; i++) {
-		if (stat_ids == NULL)
-			stat_id = i;
-		else
-			stat_id = stat_ids[i];
-
-		if (stat_id >= cn10k_mldev->xstats.count_mode_device)
-			return -EINVAL;
-
-		xs = &cn10k_mldev->xstats.entries[stat_id];
-		if (!xs->reset_allowed)
-			continue;
-
-		xs->reset_value = cn10k_ml_dev_xstat_get(dev, xs->obj_idx, xs->type);
-	}
-
-	return 0;
-}
-
-#define ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++) {                      \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_tot = 0;            \
-			model->layer[0].glow.burst_xstats[qp_id].str##_reset_count =               \
-				model->layer[0].glow.burst_xstats[qp_id].dequeued_count;           \
-		}                                                                                  \
-	} while (0)
-
-#define ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;   \
-	} while (0)
-
-#define ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, str)                                            \
-	do {                                                                                       \
-		for (qp_id = 0; qp_id < dev->data->nb_queue_pairs; qp_id++)                        \
-			model->layer[0].glow.burst_xstats[qp_id].str##_latency_max = 0;            \
-	} while (0)
-
-static void
-cn10k_ml_reset_model_stat(struct rte_ml_dev *dev, uint16_t model_id, enum cnxk_ml_xstats_type type)
-{
-	struct cnxk_ml_model *model;
-	uint32_t qp_id;
-
-	model = dev->data->models[model_id];
-
-	switch (type) {
-	case avg_hw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case min_hw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case max_hw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, hw);
-		break;
-	case avg_fw_latency:
-		ML_AVG_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case min_fw_latency:
-		ML_MIN_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	case max_fw_latency:
-		ML_MAX_RESET_FOREACH_QP(dev, model, qp_id, fw);
-		break;
-	default:
-		return;
-	}
-}
-
-static int
-cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint16_t stat_ids[],
-			    uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_model *model;
-	int32_t lcl_model_id = 0;
-	uint16_t start_id;
-	uint16_t end_id;
-	int32_t i;
-	int32_t j;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
-		if (model_id == -1) {
-			model = dev->data->models[i];
-			if (model == NULL) /* Skip inactive models */
-				continue;
-		} else {
-			if (model_id != i)
-				continue;
-
-			model = dev->data->models[model_id];
-			if (model == NULL) {
-				plt_err("Invalid model_id = %d\n", model_id);
-				return -EINVAL;
-			}
-		}
-
-		start_id = cn10k_mldev->xstats.offset_for_model[i];
-		end_id = cn10k_mldev->xstats.offset_for_model[i] +
-			 cn10k_mldev->xstats.count_per_model[i] - 1;
-
-		if (stat_ids == NULL) {
-			for (j = start_id; j <= end_id; j++) {
-				xs = &cn10k_mldev->xstats.entries[j];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		} else {
-			for (j = 0; j < nb_ids; j++) {
-				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
-					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
-						stat_ids[j], lcl_model_id);
-					return -EINVAL;
-				}
-				xs = &cn10k_mldev->xstats.entries[stat_ids[j]];
-				cn10k_ml_reset_model_stat(dev, i, xs->type);
-			}
-		}
-	}
-
-	return 0;
-}
-
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
@@ -654,7 +392,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	struct cn10k_ml_dev *cn10k_mldev;
 	struct cn10k_ml_ocm *ocm;
 	uint16_t tile_id;
-	int ret;
 
 	RTE_SET_USED(conf);
 
@@ -682,13 +419,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 
 	rte_spinlock_init(&ocm->lock);
 
-	/* Initialize xstats */
-	ret = cn10k_ml_xstats_init(cnxk_mldev->mldev);
-	if (ret != 0) {
-		plt_err("Failed to initialize xstats");
-		return ret;
-	}
-
 	/* Set JCMDQ enqueue function */
 	if (cn10k_mldev->hw_queue_lock == 1)
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_sl;
@@ -717,9 +447,6 @@ cn10k_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	/* Release ocm_mask memory */
 	rte_free(cn10k_mldev->ocm.ocm_mask);
 
-	/* Un-initialize xstats */
-	cn10k_ml_xstats_uninit(cnxk_mldev->mldev);
-
 	/* Unload firmware */
 	cn10k_ml_fw_unload(cnxk_mldev);
 
@@ -770,174 +497,6 @@ cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
-int
-cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-			      uint32_t size)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	uint32_t idx = 0;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-
-	xstats_mode_count = 0;
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			break;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	if (xstats_mode_count > size || xstats_map == NULL)
-		return xstats_mode_count;
-
-	for (i = 0; i < cn10k_mldev->xstats.count && idx < size; i++) {
-		if (cn10k_mldev->xstats.entries[i].mode != mode)
-			continue;
-
-		if (mode != RTE_ML_DEV_XSTATS_DEVICE &&
-		    model_id != cn10k_mldev->xstats.entries[i].obj_idx)
-			continue;
-
-		rte_strscpy(xstats_map[idx].name, cn10k_mldev->xstats.entries[i].map.name,
-			    RTE_ML_STR_MAX);
-		xstats_map[idx].id = cn10k_mldev->xstats.entries[i].map.id;
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				uint64_t *value)
-{
-	struct cnxk_ml_xstats_entry *xs;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	cnxk_ml_xstats_fn fn;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	for (i = 0; i < cn10k_mldev->xstats.count; i++) {
-		xs = &cn10k_mldev->xstats.entries[i];
-		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
-			if (stat_id != NULL)
-				*stat_id = xs->map.id;
-
-			switch (xs->fn_id) {
-			case CNXK_ML_XSTATS_FN_DEVICE:
-				fn = cn10k_ml_dev_xstat_get;
-				break;
-			case CNXK_ML_XSTATS_FN_MODEL:
-				fn = cn10k_ml_model_xstat_get;
-				break;
-			default:
-				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-				return -EINVAL;
-			}
-
-			*value = fn(dev, xs->obj_idx, xs->type) - xs->reset_value;
-
-			return 0;
-		}
-	}
-
-	if (stat_id != NULL)
-		*stat_id = (uint16_t)-1;
-
-	return -EINVAL;
-}
-
-int
-cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
-			const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
-{
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_xstats_entry *xs;
-	struct cnxk_ml_dev *cnxk_mldev;
-	uint32_t xstats_mode_count;
-	cnxk_ml_xstats_fn fn;
-	uint64_t val;
-	uint32_t idx;
-	uint32_t i;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	xstats_mode_count = 0;
-
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		xstats_mode_count = cn10k_mldev->xstats.count_mode_device;
-		break;
-	case RTE_ML_DEV_XSTATS_MODEL:
-		if (model_id >= ML_CNXK_MAX_MODELS)
-			return -EINVAL;
-		xstats_mode_count = cn10k_mldev->xstats.count_per_model[model_id];
-		break;
-	default:
-		return -EINVAL;
-	};
-
-	idx = 0;
-	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
-		xs = &cn10k_mldev->xstats.entries[stat_ids[i]];
-		if (stat_ids[i] > cn10k_mldev->xstats.count || xs->mode != mode)
-			continue;
-
-		if (mode == RTE_ML_DEV_XSTATS_MODEL && model_id != xs->obj_idx) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
-		}
-
-		switch (xs->fn_id) {
-		case CNXK_ML_XSTATS_FN_DEVICE:
-			fn = cn10k_ml_dev_xstat_get;
-			break;
-		case CNXK_ML_XSTATS_FN_MODEL:
-			fn = cn10k_ml_model_xstat_get;
-			break;
-		default:
-			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
-			return -EINVAL;
-		}
-
-		val = fn(dev, xs->obj_idx, xs->type);
-		if (values)
-			values[idx] = val;
-
-		idx++;
-	}
-
-	return idx;
-}
-
-int
-cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			  int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids)
-{
-	switch (mode) {
-	case RTE_ML_DEV_XSTATS_DEVICE:
-		return cn10k_ml_device_xstats_reset(dev, stat_ids, nb_ids);
-	case RTE_ML_DEV_XSTATS_MODEL:
-		return cn10k_ml_model_xstats_reset(dev, model_id, stat_ids, nb_ids);
-	};
-
-	return 0;
-}
-
 int
 cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
 {
@@ -1211,7 +770,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 							      sizeof(struct cn10k_ml_layer_xstats));
 
 	/* Update xstats names */
-	cn10k_ml_xstats_model_name_update(cnxk_mldev->mldev, idx);
+	cn10k_ml_xstats_layer_name_update(cnxk_mldev, model_id, layer_id);
 
 	layer->state = ML_CNXK_LAYER_STATE_LOADED;
 	cnxk_mldev->index_map[idx].model_id = model->model_id;
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 47e7cb12af..4d76164dba 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -13,6 +13,7 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -298,17 +299,6 @@ int cn10k_ml_dev_stop(struct cnxk_ml_dev *cnxk_mldev);
 int cn10k_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int cn10k_ml_dev_selftest(struct cnxk_ml_dev *cnxk_mldev);
 
-int cn10k_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-				  int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
-				  uint32_t size);
-int cn10k_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
-				    uint64_t *value);
-int cn10k_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			    int32_t model_id, const uint16_t stat_ids[], uint64_t values[],
-			    uint16_t nb_ids);
-int cn10k_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
-			      int32_t model_id, const uint16_t stat_ids[], uint16_t nb_ids);
-
 /* Slow-path ops */
 int cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
@@ -337,4 +327,8 @@ int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_nam
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
+/* xstats ops */
+uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _CN10K_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 1590249abd..3ce9338f1f 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,8 @@
 
 #include "cn10k_ml_dev.h"
 
+#include "cnxk_ml_xstats.h"
+
 /* ML command timeout in seconds */
 #define ML_CNXK_CMD_TIMEOUT 5
 
@@ -51,6 +53,9 @@ struct cnxk_ml_dev {
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
+	/* Extended stats data */
+	struct cnxk_ml_xstats xstats;
+
 	/* Number of models loaded */
 	uint16_t nb_models_loaded;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c75317d6da..4f4a41219e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -115,6 +115,285 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	return NULL;
 }
 
+static int
+cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
+{
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint16_t model;
+	uint16_t layer;
+	uint16_t i;
+
+	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
+	nb_stats = RTE_DIM(device_xstats) +
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+	if (cnxk_mldev->xstats.entries == NULL)
+		cnxk_mldev->xstats.entries = rte_zmalloc(
+			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
+			PLT_CACHE_LINE_SIZE);
+
+	if (cnxk_mldev->xstats.entries == NULL)
+		return -ENOMEM;
+
+	/* Initialize device xstats */
+	stat_id = 0;
+	for (i = 0; i < RTE_DIM(device_xstats); i++) {
+		cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+		snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+			 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s",
+			 device_xstats[i].name);
+
+		cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].type = device_xstats[i].type;
+		cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_DEVICE;
+		cnxk_mldev->xstats.entries[stat_id].obj_idx = 0;
+		cnxk_mldev->xstats.entries[stat_id].reset_allowed = device_xstats[i].reset_allowed;
+		stat_id++;
+	}
+	cnxk_mldev->xstats.count_mode_device = stat_id;
+
+	/* Initialize model xstats */
+	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
+		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
+
+		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
+			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
+
+			for (i = 0; i < RTE_DIM(layer_xstats); i++) {
+				cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+				cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].group =
+					CNXK_ML_XSTATS_GROUP_LAYER;
+				cnxk_mldev->xstats.entries[stat_id].type = layer_xstats[i].type;
+				cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+				cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+				cnxk_mldev->xstats.entries[stat_id].layer_id = layer;
+				cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+					layer_xstats[i].reset_allowed;
+
+				/* Name of xstat is updated during model load */
+				snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+					 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+					 "Layer-%u-%u-%s", model, layer, layer_xstats[i].name);
+
+				stat_id++;
+			}
+
+			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
+		}
+
+		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+	}
+
+	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
+	cnxk_mldev->xstats.count = stat_id;
+
+	return 0;
+}
+
+static void
+cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
+{
+	rte_free(cnxk_mldev->xstats.entries);
+	cnxk_mldev->xstats.entries = NULL;
+
+	cnxk_mldev->xstats.count = 0;
+}
+
+static uint64_t
+cnxk_ml_dev_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx __rte_unused,
+		      int32_t layer_id __rte_unused, enum cnxk_ml_xstats_type type)
+{
+	switch (type) {
+	case nb_models_loaded:
+		return cnxk_mldev->nb_models_loaded;
+	case nb_models_unloaded:
+		return cnxk_mldev->nb_models_unloaded;
+	case nb_models_started:
+		return cnxk_mldev->nb_models_started;
+	case nb_models_stopped:
+		return cnxk_mldev->nb_models_stopped;
+	default:
+		return -1;
+	}
+
+	return 0;
+}
+
+static uint64_t
+cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_t layer_id,
+			enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t rclk_freq; /* MHz */
+	uint16_t sclk_freq; /* MHz */
+	uint64_t value = 0;
+
+	model = cnxk_mldev->mldev->data->models[obj_idx];
+	if (model == NULL)
+		return 0;
+
+	if (layer_id >= 0)
+		layer = &model->layer[layer_id];
+	else
+		return 0;
+
+	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq != 0) /* return in ns */
+		value = (value * 1000ULL) / sclk_freq;
+
+	return value;
+}
+
+static int
+cnxk_ml_device_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, const uint16_t stat_ids[],
+			    uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	uint16_t nb_stats;
+	uint16_t stat_id;
+	uint32_t i;
+
+	if (stat_ids == NULL)
+		nb_stats = cnxk_mldev->xstats.count_mode_device;
+	else
+		nb_stats = nb_ids;
+
+	for (i = 0; i < nb_stats; i++) {
+		if (stat_ids == NULL)
+			stat_id = i;
+		else
+			stat_id = stat_ids[i];
+
+		if (stat_id >= cnxk_mldev->xstats.count_mode_device)
+			return -EINVAL;
+
+		xs = &cnxk_mldev->xstats.entries[stat_id];
+		if (!xs->reset_allowed)
+			continue;
+
+		xs->reset_value =
+			cnxk_ml_dev_xstat_get(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+	}
+
+	return 0;
+}
+
+#define ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			layer->glow.burst_xstats[qp_id].str##_latency_tot = 0;                     \
+			layer->glow.burst_xstats[qp_id].str##_reset_count =                        \
+				layer->glow.burst_xstats[qp_id].dequeued_count;                    \
+		}                                                                                  \
+	} while (0)
+
+#define ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_min = UINT64_MAX;            \
+	} while (0)
+
+#define ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, str)                                     \
+	do {                                                                                       \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++)          \
+			layer->glow.burst_xstats[qp_id].str##_latency_max = 0;                     \
+	} while (0)
+
+static void
+cnxk_ml_reset_model_stat(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id,
+			 enum cnxk_ml_xstats_type type)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+	uint16_t layer_id = 0;
+	uint32_t qp_id;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	layer = &model->layer[layer_id];
+
+	switch (type) {
+	case avg_hw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case min_hw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case max_hw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, hw);
+		break;
+	case avg_fw_latency:
+		ML_AVG_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case min_fw_latency:
+		ML_MIN_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	case max_fw_latency:
+		ML_MAX_RESET_FOREACH_QP(cnxk_mldev, layer, qp_id, fw);
+		break;
+	default:
+		return;
+	}
+}
+
+static int
+cnxk_ml_model_xstats_reset(struct cnxk_ml_dev *cnxk_mldev, int32_t model_id,
+			   const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_model *model;
+	int32_t lcl_model_id = 0;
+	uint16_t layer_id = 0;
+	uint16_t start_id;
+	uint16_t end_id;
+	int32_t i;
+	int32_t j;
+
+	for (i = 0; i < ML_CNXK_MAX_MODELS; i++) {
+		if (model_id == -1) {
+			model = cnxk_mldev->mldev->data->models[i];
+			if (model == NULL) /* skip inactive models */
+				continue;
+		} else {
+			if (model_id != i)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if (model == NULL) {
+				plt_err("Invalid model_id = %d\n", model_id);
+				return -EINVAL;
+			}
+		}
+
+		start_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id];
+		end_id = cnxk_mldev->xstats.offset_for_layer[i][layer_id] +
+			 cnxk_mldev->xstats.count_per_layer[i][layer_id] - 1;
+
+		if (stat_ids == NULL) {
+			for (j = start_id; j <= end_id; j++) {
+				xs = &cnxk_mldev->xstats.entries[j];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		} else {
+			for (j = 0; j < nb_ids; j++) {
+				if (stat_ids[j] < start_id || stat_ids[j] > end_id) {
+					plt_err("Invalid stat_ids[%d] = %d for model_id = %d\n", j,
+						stat_ids[j], lcl_model_id);
+					return -EINVAL;
+				}
+				xs = &cnxk_mldev->xstats.entries[stat_ids[j]];
+				cnxk_ml_reset_model_stat(cnxk_mldev, i, xs->type);
+			}
+		}
+	}
+
+	return 0;
+}
+
 static int
 cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 {
@@ -294,6 +573,13 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	for (i = 0; i < cnxk_mldev->max_nb_layers; i++)
 		cnxk_mldev->index_map[i].active = false;
 
+	/* Initialize xstats */
+	ret = cnxk_ml_xstats_init(cnxk_mldev);
+	if (ret != 0) {
+		plt_err("Failed to initialize xstats");
+		goto error;
+	}
+
 	cnxk_mldev->nb_models_loaded = 0;
 	cnxk_mldev->nb_models_started = 0;
 	cnxk_mldev->nb_models_stopped = 0;
@@ -323,6 +609,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	/* Un-initialize xstats */
+	cnxk_ml_xstats_uninit(cnxk_mldev);
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
@@ -521,6 +810,190 @@ cnxk_ml_dev_stats_reset(struct rte_ml_dev *dev)
 	}
 }
 
+static int
+cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
+			     int32_t model_id, struct rte_ml_dev_xstats_map *xstats_map,
+			     uint32_t size)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	uint32_t idx = 0;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			break;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	if (xstats_mode_count > size || xstats_map == NULL)
+		return xstats_mode_count;
+
+	for (i = 0; i < cnxk_mldev->xstats.count && idx < size; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
+			continue;
+
+		rte_strscpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
+		xstats_map[idx].id = xs->map.id;
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_by_name_get(struct rte_ml_dev *dev, const char *name, uint16_t *stat_id,
+			       uint64_t *value)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	cnxk_ml_xstats_fn fn;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	for (i = 0; i < cnxk_mldev->xstats.count; i++) {
+		xs = &cnxk_mldev->xstats.entries[i];
+		if (strncmp(xs->map.name, name, RTE_ML_STR_MAX) == 0) {
+			if (stat_id != NULL)
+				*stat_id = xs->map.id;
+
+			switch (xs->fn_id) {
+			case CNXK_ML_XSTATS_FN_DEVICE:
+				fn = cnxk_ml_dev_xstat_get;
+				break;
+			case CNXK_ML_XSTATS_FN_MODEL:
+				fn = cnxk_ml_model_xstat_get;
+				break;
+			default:
+				plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+				return -EINVAL;
+			}
+
+			*value = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type) -
+				 xs->reset_value;
+
+			return 0;
+		}
+	}
+
+	if (stat_id != NULL)
+		*stat_id = (uint16_t)-1;
+
+	return -EINVAL;
+}
+
+static int
+cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+		       const uint16_t stat_ids[], uint64_t values[], uint16_t nb_ids)
+{
+	struct cnxk_ml_xstats_entry *xs;
+	struct cnxk_ml_dev *cnxk_mldev;
+	uint32_t xstats_mode_count;
+	uint16_t layer_id = 0;
+	cnxk_ml_xstats_fn fn;
+	uint64_t val;
+	uint32_t idx;
+	uint32_t i;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+	xstats_mode_count = 0;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		xstats_mode_count = cnxk_mldev->xstats.count_mode_device;
+		break;
+	case RTE_ML_DEV_XSTATS_MODEL:
+		if (model_id >= ML_CNXK_MAX_MODELS)
+			return -EINVAL;
+		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		break;
+	default:
+		return -EINVAL;
+	};
+
+	idx = 0;
+	for (i = 0; i < nb_ids && idx < xstats_mode_count; i++) {
+		xs = &cnxk_mldev->xstats.entries[stat_ids[i]];
+		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
+			continue;
+
+		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
+		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
+			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
+				model_id);
+			return -EINVAL;
+		}
+
+		switch (xs->fn_id) {
+		case CNXK_ML_XSTATS_FN_DEVICE:
+			fn = cnxk_ml_dev_xstat_get;
+			break;
+		case CNXK_ML_XSTATS_FN_MODEL:
+			fn = cnxk_ml_model_xstat_get;
+			break;
+		default:
+			plt_err("Unexpected xstat fn_id = %d", xs->fn_id);
+			return -EINVAL;
+		}
+
+		val = fn(cnxk_mldev, xs->obj_idx, xs->layer_id, xs->type);
+		if (values)
+			values[idx] = val;
+
+		idx++;
+	}
+
+	return idx;
+}
+
+static int
+cnxk_ml_dev_xstats_reset(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode, int32_t model_id,
+			 const uint16_t stat_ids[], uint16_t nb_ids)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+
+	if (dev == NULL)
+		return -EINVAL;
+
+	cnxk_mldev = dev->data->dev_private;
+
+	switch (mode) {
+	case RTE_ML_DEV_XSTATS_DEVICE:
+		return cnxk_ml_device_xstats_reset(cnxk_mldev, stat_ids, nb_ids);
+	case RTE_ML_DEV_XSTATS_MODEL:
+		return cnxk_ml_model_xstats_reset(cnxk_mldev, model_id, stat_ids, nb_ids);
+	};
+
+	return 0;
+}
+
 static int
 cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
@@ -806,10 +1279,10 @@ struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Stats ops */
 	.dev_stats_get = cnxk_ml_dev_stats_get,
 	.dev_stats_reset = cnxk_ml_dev_stats_reset,
-	.dev_xstats_names_get = cn10k_ml_dev_xstats_names_get,
-	.dev_xstats_by_name_get = cn10k_ml_dev_xstats_by_name_get,
-	.dev_xstats_get = cn10k_ml_dev_xstats_get,
-	.dev_xstats_reset = cn10k_ml_dev_xstats_reset,
+	.dev_xstats_names_get = cnxk_ml_dev_xstats_names_get,
+	.dev_xstats_by_name_get = cnxk_ml_dev_xstats_by_name_get,
+	.dev_xstats_get = cnxk_ml_dev_xstats_get,
+	.dev_xstats_reset = cnxk_ml_dev_xstats_reset,
 
 	/* Model ops */
 	.model_load = cnxk_ml_model_load,
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 0d405679ca..5e02bb876c 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -7,6 +7,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
+
 /* Extended stats types enum */
 enum cnxk_ml_xstats_type {
 	/* Number of models loaded */
@@ -58,9 +60,21 @@ enum cnxk_ml_xstats_fn_type {
 	CNXK_ML_XSTATS_FN_MODEL,
 };
 
+/* Extended stats group */
+enum cnxk_ml_xstats_group {
+	/* Device stats */
+	CNXK_ML_XSTATS_GROUP_DEVICE,
+
+	/* Model stats */
+	CNXK_ML_XSTATS_GROUP_MODEL,
+
+	/* Layer stats */
+	CNXK_ML_XSTATS_GROUP_LAYER,
+};
+
 /* Function pointer to get xstats for a type */
-typedef uint64_t (*cnxk_ml_xstats_fn)(struct rte_ml_dev *cnxk_mldev, uint16_t obj_idx,
-				      enum cnxk_ml_xstats_type stat);
+typedef uint64_t (*cnxk_ml_xstats_fn)(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx,
+				      int32_t layer_id, enum cnxk_ml_xstats_type stat);
 
 /* Extended stats entry structure */
 struct cnxk_ml_xstats_entry {
@@ -70,6 +84,9 @@ struct cnxk_ml_xstats_entry {
 	/* xstats mode, device or model */
 	enum rte_ml_dev_xstats_mode mode;
 
+	/* xstats group */
+	enum cnxk_ml_xstats_group group;
+
 	/* Type of xstats */
 	enum cnxk_ml_xstats_type type;
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 16/34] ml/cnxk: update fast path functions
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (14 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
                     ` (18 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented cnxk layer fast-path functions and added support for
model-specific fast-path functions. The cnxk layer functions invoke
the model-specific fast-path functions.

Added support for model-specific poll handling functions and updated
the internal inference sync function. Dropped the use of rte_ml_op as
an argument and updated the function arguments so that the functions
can be used as callbacks by the TVM HW runtime.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
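
A minimal, standalone sketch of the dispatch pattern described above:
the generic cnxk enqueue path calls through per-model function pointers
that each backend (cn10k Glow today, TVM later) fills in at model load
time. All names ending in "_demo" are illustrative only and are not
part of the driver; this is not the driver's actual code.

/* Sketch of per-model fast-path dispatch, assuming illustrative types */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct demo_op { uint16_t model_id; };

struct demo_model {
	/* Backend-specific single-op enqueue, in the spirit of enqueue_single_t */
	bool (*enqueue_single)(struct demo_model *m, struct demo_op *op);
};

static bool
glow_enqueue_demo(struct demo_model *m, struct demo_op *op)
{
	(void)m;
	printf("glow backend enqueues op for model %u\n", (unsigned)op->model_id);
	return true;
}

/* Generic layer: no backend knowledge, just dispatch through the model */
static uint16_t
generic_enqueue_burst_demo(struct demo_model *models, struct demo_op **ops, uint16_t nb_ops)
{
	uint16_t count = 0;

	while (count < nb_ops) {
		struct demo_model *m = &models[ops[count]->model_id];

		if (!m->enqueue_single(m, ops[count])) /* e.g. command queue full */
			break;
		count++;
	}
	return count;
}

int
main(void)
{
	struct demo_model models[1] = { { .enqueue_single = glow_enqueue_demo } };
	struct demo_op op = { .model_id = 0 };
	struct demo_op *ops[1] = { &op };

	return generic_enqueue_burst_demo(models, ops, 1) == 1 ? 0 : 1;
}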
---
 drivers/ml/cnxk/cn10k_ml_dev.h  |   5 -
 drivers/ml/cnxk/cn10k_ml_ops.c  | 241 ++++++++------------------------
 drivers/ml/cnxk/cn10k_ml_ops.h  |  13 +-
 drivers/ml/cnxk/cnxk_ml_model.h |  14 ++
 drivers/ml/cnxk/cnxk_ml_ops.c   | 128 +++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.h   |   7 +
 6 files changed, 216 insertions(+), 192 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index bde9d08901..94a94d996f 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -143,11 +143,6 @@ struct cn10k_ml_dev {
 
 	/* JCMD enqueue function handler */
 	bool (*ml_jcmdq_enqueue)(struct roc_ml *roc_ml, struct ml_job_cmd_s *job_cmd);
-
-	/* Poll handling function pointers */
-	void (*set_poll_addr)(struct cnxk_ml_req *req);
-	void (*set_poll_ptr)(struct cnxk_ml_req *req);
-	uint64_t (*get_poll_ptr)(struct cnxk_ml_req *req);
 };
 
 uint64_t cn10k_ml_fw_flags_get(struct cn10k_ml_fw *fw);
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 776ad60401..8116c8dedb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -65,24 +65,12 @@ static const struct cn10k_ml_stype_db_driver {
 	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
-static inline void
+__rte_hot void
 cn10k_ml_set_poll_addr(struct cnxk_ml_req *req)
 {
 	req->status = &req->cn10k_req.status;
 }
 
-static inline void
-cn10k_ml_set_poll_ptr(struct cnxk_ml_req *req)
-{
-	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
-}
-
-static inline uint64_t
-cn10k_ml_get_poll_ptr(struct cnxk_ml_req *req)
-{
-	return plt_read64(req->status);
-}
-
 void
 cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp)
 {
@@ -177,7 +165,7 @@ cn10k_ml_prep_sp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_l
 
 static __rte_always_inline void
 cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_req *req,
-				struct rte_ml_op *op)
+				uint16_t index, void *input, void *output, uint16_t nb_batches)
 {
 	struct cn10k_ml_dev *cn10k_mldev;
 
@@ -185,17 +173,17 @@ cn10k_ml_prep_fp_job_descriptor(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_r
 
 	req->cn10k_req.jd.hdr.jce.w0.u64 = 0;
 	req->cn10k_req.jd.hdr.jce.w1.u64 = PLT_U64_CAST(req->status);
-	req->cn10k_req.jd.hdr.model_id = op->model_id;
+	req->cn10k_req.jd.hdr.model_id = index;
 	req->cn10k_req.jd.hdr.job_type = ML_CN10K_JOB_TYPE_MODEL_RUN;
 	req->cn10k_req.jd.hdr.fp_flags = ML_FLAGS_POLL_COMPL;
 	req->cn10k_req.jd.hdr.sp_flags = 0x0;
 	req->cn10k_req.jd.hdr.result =
 		roc_ml_addr_ap2mlip(&cn10k_mldev->roc, &req->cn10k_req.result);
 	req->cn10k_req.jd.model_run.input_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->input[0]->addr));
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, input));
 	req->cn10k_req.jd.model_run.output_ddr_addr =
-		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, op->output[0]->addr));
-	req->cn10k_req.jd.model_run.num_batches = op->nb_batches;
+		PLT_U64_CAST(roc_ml_addr_ap2mlip(&cn10k_mldev->roc, output));
+	req->cn10k_req.jd.model_run.num_batches = nb_batches;
 }
 
 static void
@@ -311,30 +299,15 @@ cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *l
 static int
 cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer)
 {
-	struct rte_ml_buff_seg seg[2];
-	struct rte_ml_buff_seg *inp;
-	struct rte_ml_buff_seg *out;
-	struct rte_ml_op op;
-
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	uint64_t isize = 0;
 	uint64_t osize = 0;
 	int ret = 0;
-	uint32_t i;
-
-	inp = &seg[0];
-	out = &seg[1];
 
 	/* Create input and output buffers. */
-	for (i = 0; i < layer->info.nb_inputs; i++)
-		isize += layer->info.input[i].sz_q;
-
-	for (i = 0; i < layer->info.nb_outputs; i++)
-		osize += layer->info.output[i].sz_q;
-
-	isize = layer->batch_size * isize;
-	osize = layer->batch_size * osize;
+	isize = layer->info.total_input_sz_q;
+	osize = layer->info.total_output_sz_q;
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", layer->index);
 	mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE);
@@ -342,25 +315,9 @@ cn10k_ml_cache_model_data(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *
 		return -ENOMEM;
 	memset(mz->addr, 0, isize + osize);
 
-	seg[0].addr = mz->addr;
-	seg[0].iova_addr = mz->iova;
-	seg[0].length = isize;
-	seg[0].next = NULL;
-
-	seg[1].addr = PLT_PTR_ADD(mz->addr, isize);
-	seg[1].iova_addr = mz->iova + isize;
-	seg[1].length = osize;
-	seg[1].next = NULL;
-
-	op.model_id = layer->index;
-	op.nb_batches = layer->batch_size;
-	op.mempool = NULL;
-
-	op.input = &inp;
-	op.output = &out;
-
 	memset(layer->glow.req, 0, sizeof(struct cnxk_ml_req));
-	ret = cn10k_ml_inference_sync(cnxk_mldev, &op);
+	ret = cn10k_ml_inference_sync(cnxk_mldev, layer->index, mz->addr,
+				      PLT_PTR_ADD(mz->addr, isize), 1);
 	plt_memzone_free(mz);
 
 	return ret;
@@ -425,13 +382,8 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	/* Set polling function pointers */
-	cn10k_mldev->set_poll_addr = cn10k_ml_set_poll_addr;
-	cn10k_mldev->set_poll_ptr = cn10k_ml_set_poll_ptr;
-	cn10k_mldev->get_poll_ptr = cn10k_ml_get_poll_ptr;
-
-	cnxk_mldev->mldev->enqueue_burst = cn10k_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cn10k_ml_dequeue_burst;
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
 	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	return 0;
@@ -824,6 +776,12 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	cn10k_ml_model_info_set(cnxk_mldev, model, &model->layer[0].info, &model->glow.metadata);
 
+	/* Set fast-path functions */
+	model->enqueue_single = cn10k_ml_enqueue_single;
+	model->result_update = cn10k_ml_result_update;
+	model->set_error_code = cn10k_ml_set_error_code;
+	model->set_poll_addr = cn10k_ml_set_poll_addr;
+
 	return 0;
 }
 
@@ -1219,26 +1177,8 @@ cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 	return 0;
 }
 
-static __rte_always_inline void
-queue_index_advance(uint64_t *index, uint64_t nb_desc)
-{
-	*index = (*index + 1) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return (nb_desc + head - tail) % nb_desc;
-}
-
-static __rte_always_inline uint64_t
-queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
-{
-	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
-}
-
-static __rte_always_inline void
-cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml_req *req)
+__rte_hot void
+cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_layer_xstats *xstats;
@@ -1246,6 +1186,7 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	struct cn10k_ml_result *result;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
+	struct cnxk_ml_req *req;
 	struct cnxk_ml_qp *qp;
 	struct rte_ml_op *op;
 	uint64_t hw_latency;
@@ -1253,9 +1194,9 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	uint16_t model_id;
 	uint16_t layer_id;
 
+	req = (struct cnxk_ml_req *)request;
 	result = &req->cn10k_req.result;
 	op = req->op;
-
 	if (likely(result->error_code == 0)) {
 		model_id = cnxk_mldev->index_map[op->model_id].model_id;
 		layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
@@ -1322,119 +1263,48 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, struct cnxk_ml
 	op->user_ptr = result->user_ptr;
 }
 
-__rte_hot uint16_t
-cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
+__rte_hot void
+cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	union cn10k_ml_error_code *error_code;
+
+	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
+	error_code->s.etype = etype;
+	error_code->s.stype = stype;
+}
+
+__rte_hot bool
+cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	struct cnxk_ml_queue *queue;
 	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-	struct rte_ml_op *op;
-
-	uint16_t count;
-	uint64_t head;
-	bool enqueued;
 
-	cnxk_mldev = dev->data->dev_private;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
 	queue = &qp->queue;
-
-	head = queue->head;
-	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		return 0;
-
-enqueue_req:
-	op = ops[count];
 	req = &queue->reqs[head];
 
-	cn10k_mldev->set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+	model->set_poll_addr(req);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, model->layer[layer_id].index,
+					op->input[0]->addr, op->output[0]->addr, op->nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
-	cn10k_mldev->set_poll_ptr(req);
-	enqueued = cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd);
-	if (unlikely(!enqueued))
-		goto jcmdq_full;
+	cnxk_ml_set_poll_ptr(req);
+	if (unlikely(!cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)))
+		return false;
 
 	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
 	req->op = op;
 
-	queue_index_advance(&head, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto enqueue_req;
-
-jcmdq_full:
-	queue->head = head;
-	qp->stats.enqueued_count += count;
-
-	return count;
-}
-
-__rte_hot uint16_t
-cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
-		       uint16_t nb_ops)
-{
-	union cn10k_ml_error_code *error_code;
-	struct cn10k_ml_dev *cn10k_mldev;
-	struct cnxk_ml_dev *cnxk_mldev;
-	struct cnxk_ml_queue *queue;
-	struct cnxk_ml_req *req;
-	struct cnxk_ml_qp *qp;
-
-	uint64_t status;
-	uint16_t count;
-	uint64_t tail;
-
-	cnxk_mldev = dev->data->dev_private;
-	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	qp = dev->data->queue_pairs[qp_id];
-	queue = &qp->queue;
-
-	tail = queue->tail;
-	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
-	count = 0;
-
-	if (unlikely(nb_ops == 0))
-		goto empty_or_active;
-
-dequeue_req:
-	req = &queue->reqs[tail];
-	status = cn10k_mldev->get_poll_ptr(req);
-	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
-		if (plt_tsc_cycles() < req->timeout) {
-			goto empty_or_active;
-		} else { /* Timeout, set indication of driver error */
-			error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-			error_code->s.etype = ML_ETYPE_DRIVER;
-		}
-	}
-
-	cn10k_ml_result_update(cnxk_mldev, qp_id, req);
-	ops[count] = req->op;
-
-	queue_index_advance(&tail, qp->nb_desc);
-	count++;
-
-	if (count < nb_ops)
-		goto dequeue_req;
-
-empty_or_active:
-	queue->tail = tail;
-
-	return count;
+	return true;
 }
 
 __rte_hot int
@@ -1471,41 +1341,48 @@ cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_m
 }
 
 __rte_hot int
-cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
+cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+			uint16_t nb_batches)
 {
 	union cn10k_ml_error_code *error_code;
 	struct cn10k_ml_dev *cn10k_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
 	struct cnxk_ml_model *model;
 	struct cnxk_ml_layer *layer;
 	struct cnxk_ml_req *req;
+	struct rte_ml_op op;
 	uint16_t model_id;
 	uint16_t layer_id;
 	bool timeout;
 	int ret = 0;
 
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
-	model_id = cnxk_mldev->index_map[op->model_id].model_id;
-	layer_id = cnxk_mldev->index_map[op->model_id].layer_id;
+	model_id = cnxk_mldev->index_map[index].model_id;
+	layer_id = cnxk_mldev->index_map[index].layer_id;
 	model = cnxk_mldev->mldev->data->models[model_id];
 	layer = &model->layer[layer_id];
 	req = layer->glow.req;
 
+	op.model_id = index;
+	op.impl_opaque = 0;
+
 	cn10k_ml_set_poll_addr(req);
-	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, op);
+	cn10k_ml_prep_fp_job_descriptor(cnxk_mldev, req, index, input, output, nb_batches);
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
 	error_code->s.etype = ML_ETYPE_UNKNOWN;
-	req->cn10k_req.result.user_ptr = op->user_ptr;
+	req->cn10k_req.result.user_ptr = NULL;
 
-	cn10k_mldev->set_poll_ptr(req);
+	cnxk_ml_set_poll_ptr(req);
 	req->cn10k_req.jcmd.w1.s.jobptr = PLT_U64_CAST(&req->cn10k_req.jd);
 
 	timeout = true;
 	req->timeout = plt_tsc_cycles() + ML_CNXK_CMD_TIMEOUT * plt_tsc_hz();
 	do {
 		if (cn10k_mldev->ml_jcmdq_enqueue(&cn10k_mldev->roc, &req->cn10k_req.jcmd)) {
-			req->op = op;
+			req->op = &op;
 			timeout = false;
 			break;
 		}
@@ -1518,7 +1395,7 @@ cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op)
 
 	timeout = true;
 	do {
-		if (cn10k_mldev->get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
+		if (cnxk_ml_get_poll_ptr(req) == ML_CNXK_POLL_JOB_FINISH) {
 			timeout = false;
 			break;
 		}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 4d76164dba..3d18303ed3 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -14,6 +14,7 @@ struct cnxk_ml_dev;
 struct cnxk_ml_qp;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_req;
 
 /* Firmware version string length */
 #define MLDEV_FIRMWARE_VERSION_LENGTH 32
@@ -309,13 +310,15 @@ int cn10k_ml_model_params_update(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_
 				 void *buffer);
 
 /* Fast-path ops */
-__rte_hot uint16_t cn10k_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
-__rte_hot uint16_t cn10k_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
-					  struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot bool cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
 __rte_hot int cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op,
 				    struct rte_ml_op_error *error);
-__rte_hot int cn10k_ml_inference_sync(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op);
+__rte_hot int cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
+				      uint16_t nb_batches);
+__rte_hot void cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void cn10k_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+__rte_hot void cn10k_ml_set_poll_addr(struct cnxk_ml_req *req);
 
 /* Misc ops */
 void cn10k_ml_qp_initialize(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_qp *qp);
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index 66d979dd3c..f618e5aa5f 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -15,6 +15,8 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
 
 /* Model state */
 enum cnxk_ml_model_state {
@@ -70,6 +72,12 @@ struct cnxk_ml_layer {
 	struct cn10k_ml_layer_data glow;
 };
 
+typedef bool (*enqueue_single_t)(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				 uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+typedef void (*result_update_t)(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+typedef void (*set_error_code_t)(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+typedef void (*set_poll_addr_t)(struct cnxk_ml_req *req);
+
 /* Model Object */
 struct cnxk_ml_model {
 	/* Device reference */
@@ -106,6 +114,12 @@ struct cnxk_ml_model {
 
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
+
+	/* Fast-path functions */
+	enqueue_single_t enqueue_single;
+	result_update_t result_update;
+	set_error_code_t set_error_code;
+	set_poll_addr_t set_poll_addr;
 };
 
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 4f4a41219e..909e9143bf 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -15,6 +15,18 @@
 /* ML model macros */
 #define CNXK_ML_MODEL_MEMZONE_NAME "ml_cnxk_model_mz"
 
+__rte_hot void
+cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req)
+{
+	plt_write64(ML_CNXK_POLL_JOB_START, req->status);
+}
+
+__rte_hot uint64_t
+cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req)
+{
+	return plt_read64(req->status);
+}
+
 static void
 qp_memzone_name_get(char *name, int size, int dev_id, int qp_id)
 {
@@ -1262,6 +1274,122 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	return 0;
 }
 
+static __rte_always_inline void
+queue_index_advance(uint64_t *index, uint64_t nb_desc)
+{
+	*index = (*index + 1) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_pending_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return (nb_desc + head - tail) % nb_desc;
+}
+
+static __rte_always_inline uint64_t
+queue_free_count(uint64_t head, uint64_t tail, uint64_t nb_desc)
+{
+	return nb_desc - queue_pending_count(head, tail, nb_desc) - 1;
+}
+
+__rte_hot uint16_t
+cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	uint16_t layer_id = 0;
+	uint16_t count;
+	uint64_t head;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	head = queue->head;
+	nb_ops = PLT_MIN(nb_ops, queue_free_count(head, queue->tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		return 0;
+
+enqueue_req:
+	op = ops[count];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	if (unlikely(!model->enqueue_single(cnxk_mldev, op, layer_id, qp, head)))
+		goto jcmdq_full;
+
+	queue_index_advance(&head, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto enqueue_req;
+
+jcmdq_full:
+	queue->head = head;
+	qp->stats.enqueued_count += count;
+
+	return count;
+}
+
+__rte_hot uint16_t
+cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op **ops,
+		      uint16_t nb_ops)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	struct cnxk_ml_qp *qp;
+
+	uint64_t status;
+	uint16_t count;
+	uint64_t tail;
+
+	cnxk_mldev = dev->data->dev_private;
+	qp = dev->data->queue_pairs[qp_id];
+	queue = &qp->queue;
+
+	tail = queue->tail;
+	nb_ops = PLT_MIN(nb_ops, queue_pending_count(queue->head, tail, qp->nb_desc));
+	count = 0;
+
+	if (unlikely(nb_ops == 0))
+		goto empty_or_active;
+
+dequeue_req:
+
+	req = &queue->reqs[tail];
+	model = cnxk_mldev->mldev->data->models[req->op->model_id];
+
+	status = cnxk_ml_get_poll_ptr(req);
+	if (unlikely(status != ML_CNXK_POLL_JOB_FINISH)) {
+		if (plt_tsc_cycles() < req->timeout)
+			goto empty_or_active;
+		else /* Timeout, set indication of driver error */
+			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+	}
+
+	model->result_update(cnxk_mldev, qp->id, req);
+
+	ops[count] = req->op;
+	queue_index_advance(&tail, qp->nb_desc);
+	count++;
+
+	if (count < nb_ops)
+		goto dequeue_req;
+
+empty_or_active:
+	queue->tail = tail;
+
+	return count;
+}
+
 struct rte_ml_dev_ops cnxk_ml_ops = {
 	/* Device control ops */
 	.dev_info_get = cnxk_ml_dev_info_get,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d27ca0d0cb..d0c126f34b 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -65,4 +65,11 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
 
+__rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot uint16_t cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
+					 struct rte_ml_op **ops, uint16_t nb_ops);
+__rte_hot void cnxk_ml_set_poll_ptr(struct cnxk_ml_req *req);
+__rte_hot uint64_t cnxk_ml_get_poll_ptr(struct cnxk_ml_req *req);
+
 #endif /* _CNXK_ML_OPS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 17/34] ml/cnxk: move error handling to cnxk layer
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (15 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
                     ` (17 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Move error type structures to the cnxk layer. The cn10k layer
now handles only the firmware and hardware error sub-types.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_dev.h | 41 ++++++---------
 drivers/ml/cnxk/cn10k_ml_ops.c | 93 +++++++++++++---------------------
 drivers/ml/cnxk/cnxk_ml_dev.c  |  8 +++
 drivers/ml/cnxk/cnxk_ml_dev.h  | 18 +++++++
 drivers/ml/cnxk/cnxk_ml_ops.c  |  2 +-
 5 files changed, 78 insertions(+), 84 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 94a94d996f..2e7eb6c9ef 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -52,38 +52,27 @@ struct cnxk_ml_dev;
 struct cnxk_ml_req;
 struct cnxk_ml_qp;
 
-/* Error types enumeration */
-enum cn10k_ml_error_etype {
-	/* 0x0 */ ML_ETYPE_NO_ERROR = 0, /* No error */
-	/* 0x1 */ ML_ETYPE_FW_NONFATAL,	 /* Firmware non-fatal error */
-	/* 0x2 */ ML_ETYPE_HW_NONFATAL,	 /* Hardware non-fatal error */
-	/* 0x3 */ ML_ETYPE_HW_FATAL,	 /* Hardware fatal error */
-	/* 0x4 */ ML_ETYPE_HW_WARNING,	 /* Hardware warning */
-	/* 0x5 */ ML_ETYPE_DRIVER,	 /* Driver specific error */
-	/* 0x6 */ ML_ETYPE_UNKNOWN,	 /* Unknown error */
-};
-
 /* Firmware non-fatal error sub-type */
 enum cn10k_ml_error_stype_fw_nf {
-	/* 0x0 */ ML_FW_ERR_NOERR = 0,		 /* No error */
-	/* 0x1 */ ML_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
-	/* 0x2 */ ML_FW_ERR_LOAD_LUT_OVERFLOW,	 /* Lookup table overflow at load */
-	/* 0x3 */ ML_FW_ERR_ID_IN_USE,		 /* Model ID already in use */
-	/* 0x4 */ ML_FW_ERR_INVALID_TILEMASK,	 /* Invalid OCM tilemask */
-	/* 0x5 */ ML_FW_ERR_RUN_LUT_OVERFLOW,	 /* Lookup table overflow at run */
-	/* 0x6 */ ML_FW_ERR_RUN_ID_NOT_FOUND,	 /* Model ID not found during run */
-	/* 0x7 */ ML_FW_ERR_COMMAND_NOTSUP,	 /* Unsupported command */
-	/* 0x8 */ ML_FW_ERR_DDR_ADDR_RANGE,	 /* DDR address out of range */
-	/* 0x9 */ ML_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
-	/* 0xA */ ML_FW_ERR_INSSYNC_TIMEOUT,	 /* INS sync timeout */
+	/* 0x0 */ ML_CN10K_FW_ERR_NOERR = 0,	       /* No error */
+	/* 0x1 */ ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, /* Model ID not found during load */
+	/* 0x2 */ ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW,   /* Lookup table overflow at load */
+	/* 0x3 */ ML_CN10K_FW_ERR_ID_IN_USE,	       /* Model ID already in use */
+	/* 0x4 */ ML_CN10K_FW_ERR_INVALID_TILEMASK,    /* Invalid OCM tilemask */
+	/* 0x5 */ ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW,    /* Lookup table overflow at run */
+	/* 0x6 */ ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND,    /* Model ID not found during run */
+	/* 0x7 */ ML_CN10K_FW_ERR_COMMAND_NOTSUP,      /* Unsupported command */
+	/* 0x8 */ ML_CN10K_FW_ERR_DDR_ADDR_RANGE,      /* DDR address out of range */
+	/* 0x9 */ ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, /* Invalid number of batches */
+	/* 0xA */ ML_CN10K_FW_ERR_INSSYNC_TIMEOUT,     /* INS sync timeout */
 };
 
 /* Driver error sub-type */
 enum cn10k_ml_error_stype_driver {
-	/* 0x0 */ ML_DRIVER_ERR_NOERR = 0, /* No error */
-	/* 0x1 */ ML_DRIVER_ERR_UNKNOWN,   /* Unable to determine error sub-type */
-	/* 0x2 */ ML_DRIVER_ERR_EXCEPTION, /* Firmware exception */
-	/* 0x3 */ ML_DRIVER_ERR_FW_ERROR,  /* Unknown firmware error */
+	/* 0x0 */ ML_CN10K_DRIVER_ERR_NOERR = 0, /* No error */
+	/* 0x1 */ ML_CN10K_DRIVER_ERR_UNKNOWN,	 /* Unable to determine error sub-type */
+	/* 0x2 */ ML_CN10K_DRIVER_ERR_EXCEPTION, /* Firmware exception */
+	/* 0x3 */ ML_CN10K_DRIVER_ERR_FW_ERROR,	 /* Unknown firmware error */
 };
 
 /* Error structure */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 8116c8dedb..65eaaf030d 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -22,47 +22,27 @@
 #define ML_FLAGS_POLL_COMPL BIT(0)
 #define ML_FLAGS_SSO_COMPL  BIT(1)
 
-/* Error message length */
-#define ERRMSG_LEN 32
-
-/* Error type database */
-static const struct cn10k_ml_etype_db {
-	enum cn10k_ml_error_etype etype;
-	char name[ERRMSG_LEN];
-} ml_etype_db[] = {
-	{ML_ETYPE_NO_ERROR, "NO_ERROR"},	{ML_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
-	{ML_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_ETYPE_HW_FATAL, "HW_FATAL"},
-	{ML_ETYPE_HW_WARNING, "HW_WARNING"},	{ML_ETYPE_DRIVER, "DRIVER_ERROR"},
-	{ML_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
-};
-
 /* Hardware non-fatal error subtype database */
-static const struct cn10k_ml_stype_db_hw_nf {
-	enum cn10k_ml_error_stype_fw_nf stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_hw_nf[] = {
-	{ML_FW_ERR_NOERR, "NO ERROR"},
-	{ML_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
-	{ML_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
-	{ML_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
-	{ML_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
-	{ML_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
-	{ML_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
-	{ML_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
-	{ML_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
-	{ML_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
-	{ML_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
+static struct cnxk_ml_error_db ml_stype_db_hw_nf[] = {
+	{ML_CN10K_FW_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_FW_ERR_UNLOAD_ID_NOT_FOUND, "UNLOAD MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_LOAD_LUT_OVERFLOW, "LOAD LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_ID_IN_USE, "MODEL ID IN USE"},
+	{ML_CN10K_FW_ERR_INVALID_TILEMASK, "INVALID TILEMASK"},
+	{ML_CN10K_FW_ERR_RUN_LUT_OVERFLOW, "RUN LUT OVERFLOW"},
+	{ML_CN10K_FW_ERR_RUN_ID_NOT_FOUND, "RUN MODEL ID NOT FOUND"},
+	{ML_CN10K_FW_ERR_COMMAND_NOTSUP, "COMMAND NOT SUPPORTED"},
+	{ML_CN10K_FW_ERR_DDR_ADDR_RANGE, "DDR ADDRESS OUT OF RANGE"},
+	{ML_CN10K_FW_ERR_NUM_BATCHES_INVALID, "INVALID BATCHES"},
+	{ML_CN10K_FW_ERR_INSSYNC_TIMEOUT, "INSSYNC TIMEOUT"},
 };
 
 /* Driver error subtype database */
-static const struct cn10k_ml_stype_db_driver {
-	enum cn10k_ml_error_stype_driver stype;
-	char msg[ERRMSG_LEN];
-} ml_stype_db_driver[] = {
-	{ML_DRIVER_ERR_NOERR, "NO ERROR"},
-	{ML_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
-	{ML_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
-	{ML_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
+static struct cnxk_ml_error_db ml_stype_db_driver[] = {
+	{ML_CN10K_DRIVER_ERR_NOERR, "NO ERROR"},
+	{ML_CN10K_DRIVER_ERR_UNKNOWN, "UNKNOWN ERROR"},
+	{ML_CN10K_DRIVER_ERR_EXCEPTION, "FW EXCEPTION"},
+	{ML_CN10K_DRIVER_ERR_FW_ERROR, "UNKNOWN FIRMWARE ERROR"},
 };
 
 __rte_hot void
@@ -1241,19 +1221,19 @@ cn10k_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
 
 		/* Handle driver error */
 		error_code = (union cn10k_ml_error_code *)&result->error_code;
-		if (error_code->s.etype == ML_ETYPE_DRIVER) {
+		if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
 			cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 
 			/* Check for exception */
 			if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C0) !=
 			     0) ||
 			    (roc_ml_reg_read64(&cn10k_mldev->roc, ML_SCRATCH_EXCEPTION_SP_C1) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_EXCEPTION;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_EXCEPTION;
 			else if ((roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_LO) != 0) ||
 				 (roc_ml_reg_read64(&cn10k_mldev->roc, ML_CORE_INT_HI) != 0))
-				error_code->s.stype = ML_DRIVER_ERR_FW_ERROR;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_FW_ERROR;
 			else
-				error_code->s.stype = ML_DRIVER_ERR_UNKNOWN;
+				error_code->s.stype = ML_CN10K_DRIVER_ERR_UNKNOWN;
 		}
 
 		op->impl_opaque = result->error_code;
@@ -1294,7 +1274,7 @@ cn10k_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, ui
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = op->user_ptr;
 
 	cnxk_ml_set_poll_ptr(req);
@@ -1311,30 +1291,29 @@ __rte_hot int
 cn10k_ml_op_error_get(struct rte_ml_dev *dev, struct rte_ml_op *op, struct rte_ml_op_error *error)
 {
 	union cn10k_ml_error_code *error_code;
-	char msg[RTE_ML_STR_MAX];
 
 	PLT_SET_USED(dev);
 
 	error_code = (union cn10k_ml_error_code *)&op->impl_opaque;
 
-	/* Copy error message */
-	plt_strlcpy(msg, ml_etype_db[error_code->s.etype].name, sizeof(msg));
-
 	/* Copy sub error message */
-	if (error_code->s.etype == ML_ETYPE_HW_NONFATAL) {
-		strcat(msg, " : ");
+	if (error_code->s.etype == ML_CNXK_ETYPE_HW_NONFATAL) {
 		if (error_code->s.stype < PLT_DIM(ml_stype_db_hw_nf))
-			strcat(msg, ml_stype_db_hw_nf[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+				 ml_etype_db[error_code->s.etype].str,
+				 ml_stype_db_hw_nf[error_code->s.stype].str);
 		else
-			strcat(msg, "UNKNOWN ERROR");
-	}
-
-	if (error_code->s.etype == ML_ETYPE_DRIVER) {
-		strcat(msg, " : ");
-		strcat(msg, ml_stype_db_driver[error_code->s.stype].msg);
+			snprintf(error->message, RTE_ML_STR_MAX, "%s : UNKNOWN ERROR",
+				 ml_etype_db[error_code->s.etype].str);
+	} else if (error_code->s.etype == ML_CNXK_ETYPE_DRIVER) {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s : %s",
+			 ml_etype_db[error_code->s.etype].str,
+			 ml_stype_db_driver[error_code->s.stype].str);
+	} else {
+		snprintf(error->message, RTE_ML_STR_MAX, "%s",
+			 ml_etype_db[error_code->s.etype].str);
 	}
 
-	plt_strlcpy(error->message, msg, sizeof(error->message));
 	error->errcode = error_code->u64;
 
 	return 0;
@@ -1372,7 +1351,7 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 
 	memset(&req->cn10k_req.result, 0, sizeof(struct cn10k_ml_result));
 	error_code = (union cn10k_ml_error_code *)&req->cn10k_req.result.error_code;
-	error_code->s.etype = ML_ETYPE_UNKNOWN;
+	error_code->s.etype = ML_CNXK_ETYPE_UNKNOWN;
 	req->cn10k_req.result.user_ptr = NULL;
 
 	cnxk_ml_set_poll_ptr(req);
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 2a5c17c973..63d1c9e417 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -9,3 +9,11 @@
 
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
+
+/* Error type database */
+struct cnxk_ml_error_db ml_etype_db[] = {
+	{ML_CNXK_ETYPE_NO_ERROR, "NO_ERROR"},	     {ML_CNXK_ETYPE_FW_NONFATAL, "FW_NON_FATAL"},
+	{ML_CNXK_ETYPE_HW_NONFATAL, "HW_NON_FATAL"}, {ML_CNXK_ETYPE_HW_FATAL, "HW_FATAL"},
+	{ML_CNXK_ETYPE_HW_WARNING, "HW_WARNING"},    {ML_CNXK_ETYPE_DRIVER, "DRIVER_ERROR"},
+	{ML_CNXK_ETYPE_UNKNOWN, "UNKNOWN_ERROR"},
+};
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 3ce9338f1f..382fca64be 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -18,6 +18,22 @@
 #define ML_CNXK_POLL_JOB_START	0
 #define ML_CNXK_POLL_JOB_FINISH 1
 
+/* Error types enumeration */
+enum cnxk_ml_error_etype {
+	/* 0x0 */ ML_CNXK_ETYPE_NO_ERROR = 0, /* No error */
+	/* 0x1 */ ML_CNXK_ETYPE_FW_NONFATAL,  /* Firmware non-fatal error */
+	/* 0x2 */ ML_CNXK_ETYPE_HW_NONFATAL,  /* Hardware non-fatal error */
+	/* 0x3 */ ML_CNXK_ETYPE_HW_FATAL,     /* Hardware fatal error */
+	/* 0x4 */ ML_CNXK_ETYPE_HW_WARNING,   /* Hardware warning */
+	/* 0x5 */ ML_CNXK_ETYPE_DRIVER,	      /* Driver specific error */
+	/* 0x6 */ ML_CNXK_ETYPE_UNKNOWN,      /* Unknown error */
+};
+
+struct cnxk_ml_error_db {
+	uint64_t code;
+	char str[RTE_ML_STR_MAX];
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -78,4 +94,6 @@ struct cnxk_ml_dev {
 	struct cnxk_ml_index_map *index_map;
 };
 
+extern struct cnxk_ml_error_db ml_etype_db[];
+
 #endif /* _CNXK_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 909e9143bf..3d21a31374 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1372,7 +1372,7 @@ cnxk_ml_dequeue_burst(struct rte_ml_dev *dev, uint16_t qp_id, struct rte_ml_op *
 		if (plt_tsc_cycles() < req->timeout)
 			goto empty_or_active;
 		else /* Timeout, set indication of driver error */
-			model->set_error_code(req, ML_ETYPE_DRIVER, 0);
+			model->set_error_code(req, ML_CNXK_ETYPE_DRIVER, 0);
 	}
 
 	model->result_update(cnxk_mldev, qp->id, req);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 18/34] ml/cnxk: support config and close of tvmdp library
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (16 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
                     ` (16 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Ruifeng Wang, Bruce Richardson, Srikanth Yalavarthi
  Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to configure and close the TVMDP library based
on ML device configuration options.

Updated the meson build to add Jansson, the TVM runtime and the
TVMDP library as build dependencies.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 config/arm/arm64_cn10k_linux_gcc |   1 +
 config/arm/arm64_cn9k_linux_gcc  |   1 +
 doc/guides/mldevs/cnxk.rst       | 169 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |   7 ++
 drivers/ml/cnxk/cnxk_ml_ops.h    |   6 ++
 drivers/ml/cnxk/meson.build      |  58 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  41 ++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  19 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  26 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  15 +++
 10 files changed, 343 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_ops.h
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_stubs.h

diff --git a/config/arm/arm64_cn10k_linux_gcc b/config/arm/arm64_cn10k_linux_gcc
index 05d2d64cf2..fa904af5d0 100644
--- a/config/arm/arm64_cn10k_linux_gcc
+++ b/config/arm/arm64_cn10k_linux_gcc
@@ -5,6 +5,7 @@ ar = 'aarch64-linux-gnu-gcc-ar'
 strip = 'aarch64-linux-gnu-strip'
 pkgconfig = 'aarch64-linux-gnu-pkg-config'
 pcap-config = ''
+cmake = 'cmake'
 
 [host_machine]
 system = 'linux'
diff --git a/config/arm/arm64_cn9k_linux_gcc b/config/arm/arm64_cn9k_linux_gcc
index 7416454de0..646ce4b5d3 100644
--- a/config/arm/arm64_cn9k_linux_gcc
+++ b/config/arm/arm64_cn9k_linux_gcc
@@ -5,6 +5,7 @@ ar = 'aarch64-linux-gnu-gcc-ar'
 strip = 'aarch64-linux-gnu-strip'
 pkgconfig = 'aarch64-linux-gnu-pkg-config'
 pcap-config = ''
+cmake = 'cmake'
 
 [host_machine]
 system = 'linux'
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 1834b1f905..a4d8903896 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -46,6 +46,175 @@ or cross-compiled on an x86 platform.
 
 Refer to :doc:`../platform/cnxk` for instructions to build your DPDK application.
 
+Compilation Prerequisites
+-------------------------
+
+Support for models compiled with the Apache TVM framework is optional and
+requires additional external libraries. The following dependencies are not
+part of DPDK and must be installed separately:
+
+- **Jansson**
+
+  This library enables support to parse and read JSON files.
+
+- **DLPack**
+
+  This library provides the headers for an open in-memory tensor structure.
+
+.. note::
+
+    DPDK CNXK ML driver requires DLPack version 0.7
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dlpack.git
+    cd dlpack
+    git checkout v0.7 -b v0.7
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DBUILD_MOCK=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dlpack.git
+    cd dlpack
+    git checkout v0.7 -b v0.7
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DBUILD_MOCK=OFF
+    make -C build
+    make -C build install
+
+- **DMLC**
+
+  This is a common bricks library for building scalable and portable distributed
+  machine learning.
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dmlc-core.git
+    cd dmlc-core
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_FLAGS="-fpermissive" \
+      -DCMAKE_CXX_FLAGS="-fpermissive" \
+      -DUSE_OPENMP=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/dmlc/dmlc-core.git
+    cd dmlc-core
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DCMAKE_C_FLAGS="-fpermissive" \
+      -DCMAKE_CXX_FLAGS="-fpermissive" \
+      -DUSE_OPENMP=OFF
+    make -C build
+    make -C build install
+
+- **TVM**
+
+  Apache TVM provides the runtime libraries used to execute models on CPU
+  cores or hardware accelerators.
+
+.. note::
+
+    DPDK CNXK ML driver requires TVM version 0.10.0
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.11.0 -b v0.11.0
+    git submodule update --init
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DBUILD_STATIC_RUNTIME=OFF
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/apache/tvm.git
+    cd tvm
+    git checkout v0.11.0 -b v0.11.0
+    git submodule update --init
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DMACHINE_NAME=aarch64-linux-gnu \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DBUILD_STATIC_RUNTIME=OFF
+    make -C build
+    make -C build install
+
+- **TVMDP**
+
+  Marvell's `TVM Dataplane Library <https://github.com/MarvellEmbeddedProcessors/tvmdp>`_
+  works as an interface between the TVM runtime and DPDK drivers. The TVMDP
+  library provides a simplified C interface to TVM's C++ based runtime.
+
+.. note::
+
+    TVMDP library is dependent on TVM, dlpack, jansson and dmlc-core libraries.
+
+.. code-block:: console
+
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DBUILD_SHARED_LIBS=ON
+    make -C build
+    make -C build install
+
+*Cross-compiling for AArch64*
+
+.. code-block:: console
+
+    git clone https://github.com/MarvellEmbeddedProcessors/tvmdp.git
+    cd tvmdp
+    git checkout main
+    cmake -S ./ -B build \
+      -DCMAKE_INSTALL_PREFIX=<install_prefix> \
+      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
+      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
+      -DCMAKE_FIND_ROOT_PATH=<install_prefix> \
+      -DBUILD_SHARED_LIBS=ON
+    make -C build
+    make -C build install
+
+- **libarchive**
+
+  The Apache TVM framework generates compiled models as tar archives. This
+  library enables support to decompress and read archive files in tar,
+  xz and other formats.
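+
+  The library is commonly available through distribution packages; for
+  example, on Debian or Ubuntu based build hosts (package names may differ
+  on other distributions):
+
+.. code-block:: console
+
+    sudo apt-get install libarchive-dev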
+
+.. note::
+
+    In order for meson to find the dependencies during the configure stage,
+    it is required to add the cmake paths <install_prefix>/lib/cmake/dlpack,
+    <install_prefix>/lib/cmake/dmlc and <install_prefix>/lib/cmake/tvm to
+    CMAKE_PREFIX_PATH and <install_prefix>/lib/pkgconfig to PKG_CONFIG_PATH.
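+
+    For example, assuming all of the above dependencies are installed under
+    a common ``<install_prefix>``, the environment can be set as below
+    before configuring DPDK with meson:
+
+    .. code-block:: console
+
+        export CMAKE_PREFIX_PATH=<install_prefix>/lib/cmake/dlpack:<install_prefix>/lib/cmake/dmlc:<install_prefix>/lib/cmake/tvm:$CMAKE_PREFIX_PATH
+        export PKG_CONFIG_PATH=<install_prefix>/lib/pkgconfig:$PKG_CONFIG_PATH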
 
 Initialization
 --------------
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 3d21a31374..33d13d5514 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -564,6 +564,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 	}
 
+	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
+	if (ret != 0)
+		goto error;
+
 	/* Set device capabilities */
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
@@ -624,6 +628,9 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	/* Un-initialize xstats */
 	cnxk_ml_xstats_uninit(cnxk_mldev);
 
+	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
+		plt_err("Failed to close MVTVM ML Device");
+
 	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close CN10K ML Device");
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index d0c126f34b..b22a2b0d95 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -12,6 +12,12 @@
 
 #include "cn10k_ml_ops.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_ops.h"
+#else
+#include "mvtvm_ml_stubs.h"
+#endif
+
 /* Request structure */
 struct cnxk_ml_req {
 	/* Device specific request */
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 5d27a87d91..1ef2b3c335 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -7,6 +7,37 @@ if not is_linux or not dpdk_conf.get('RTE_ARCH_64')
     subdir_done()
 endif
 
+enable_mvtvm = true
+
+if not jansson_dep.found()
+        message('drivers/ml/cnxk: jansson not found')
+        enable_mvtvm = false
+endif
+
+dlpack_dep = dependency('dlpack', method: 'cmake', required: false, cmake_args: 'CONFIG')
+if not dlpack_dep.found()
+        message('drivers/ml/cnxk: dlpack not found')
+        enable_mvtvm = false
+endif
+
+dmlc_dep = dependency('dmlc', method: 'cmake', required: false, cmake_args: 'CONFIG')
+if not dmlc_dep.found()
+        message('drivers/ml/cnxk: dmlc not found')
+        enable_mvtvm = false
+endif
+
+tvm_dep = dependency('tvm', method: 'cmake', required: false, cmake_args: 'CONFIG', modules : ['tvm::tvm_runtime'])
+if not tvm_dep.found()
+        message('drivers/ml/cnxk: tvm_runtime not found')
+        enable_mvtvm = false
+endif
+
+tvmdp_dep = dependency('tvmdp', method: 'pkg-config', required: false)
+if not tvmdp_dep.found()
+        message('drivers/ml/cnxk: tvmdp not found')
+        enable_mvtvm = false
+endif
+
 sources = files(
         'cn10k_ml_dev.c',
         'cn10k_ml_ops.c',
@@ -21,6 +52,33 @@ sources = files(
 
 deps += ['mldev', 'common_cnxk', 'kvargs', 'hash']
 
+if enable_mvtvm
+
+dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
+
+sources += files(
+        'mvtvm_ml_ops.c',
+)
+
+ext_deps += jansson_dep
+ext_deps += dlpack_dep
+ext_deps += dmlc_dep
+ext_deps += tvm_dep
+ext_deps += tvmdp_dep
+ext_deps += cc.find_library('stdc++', required: true)
+
+deps += ['bus_vdev']
+
+message('drivers/ml/cnxk: Enabled TVM model support')
+else
+message('drivers/ml/cnxk: Disabled TVM model support')
+
+sources += files(
+        'mvtvm_ml_stubs.c',
+)
+
+endif
+
 require_iova_in_mbuf = false
 
 if get_option('buildtype').contains('debug')
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
new file mode 100644
index 0000000000..88c6d5a864
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include "cnxk_ml_dev.h"
+#include "cnxk_ml_ops.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	int ret;
+
+	RTE_SET_USED(conf);
+
+	/* Configure TVMDP library */
+	ret = tvmdp_configure(cnxk_mldev->mldev->data->nb_models, rte_get_tsc_cycles);
+	if (ret != 0)
+		plt_err("TVMDP configuration failed, error = %d\n", ret);
+
+	return ret;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Close TVMDP library configuration */
+	ret = tvmdp_close();
+	if (ret != 0)
+		plt_err("TVMDP close failed, error = %d\n", ret);
+
+	return ret;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
new file mode 100644
index 0000000000..305b4681ed
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_OPS_H_
+#define _MVTVM_ML_OPS_H_
+
+#include <dlpack/dlpack.h>
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
new file mode 100644
index 0000000000..a31cd39cfa
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_mldev.h>
+
+#include "mvtvm_ml_stubs.h"
+
+#include "cnxk_ml_dev.h"
+
+int
+mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(conf);
+
+	return 0;
+}
+
+int
+mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
+{
+	RTE_SET_USED(cnxk_mldev);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
new file mode 100644
index 0000000000..11c56e5144
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_STUBS_H_
+#define _MVTVM_ML_STUBS_H_
+
+#include <rte_mldev.h>
+
+struct cnxk_ml_dev;
+
+int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
+int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+
+#endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 19/34] ml/cnxk: add structures to support TVM model type
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (17 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
                     ` (15 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Introduced model type, sub-type and layer type. Added
internal structures for TVM model objects.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ocm.c   |  3 ++
 drivers/ml/cnxk/cn10k_ml_ops.c   |  6 ++-
 drivers/ml/cnxk/cnxk_ml_model.h  | 66 +++++++++++++++++++++++++++++++-
 drivers/ml/cnxk/cnxk_ml_ops.c    | 52 ++++++++++++++++++++-----
 drivers/ml/cnxk/mvtvm_ml_model.h | 46 ++++++++++++++++++++++
 5 files changed, 160 insertions(+), 13 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.h

diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index dc315cce10..749ddeb344 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -435,6 +435,9 @@ cn10k_ml_ocm_free_pages(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id, uint1
 
 			for (j = 0; j < local_model->nb_layers; j++) {
 				local_layer = &local_model->layer[j];
+				if (local_layer->type != ML_CNXK_LAYER_TYPE_MRVL)
+					continue;
+
 				if (local_layer != layer &&
 				    local_layer->glow.ocm_map.ocm_reserved) {
 					if (IS_BIT_SET(local_layer->glow.ocm_map.tilemask, tile_id))
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 65eaaf030d..a471e98fbf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -725,6 +725,9 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	if (ret != 0)
 		return ret;
 
+	/* Set model sub type */
+	model->subtype = ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL;
+
 	/* Copy metadata to internal buffer */
 	rte_memcpy(&model->glow.metadata, params->addr, sizeof(struct cn10k_ml_model_metadata));
 	cn10k_ml_model_metadata_update(&model->glow.metadata);
@@ -746,6 +749,7 @@ cn10k_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	/* Load layer and get the index */
 	layer = &model->layer[0];
+	layer->type = ML_CNXK_LAYER_TYPE_MRVL;
 	ret = cn10k_ml_layer_load(cnxk_mldev, model->model_id, NULL, params->addr, params->size,
 				  &layer->index);
 	if (ret != 0) {
@@ -969,7 +973,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	if (ret < 0) {
 		cn10k_ml_layer_stop(device, model_id, layer_name);
 	} else {
-		if (cn10k_mldev->cache_model_data)
+		if (cn10k_mldev->cache_model_data && model->type == ML_CNXK_MODEL_TYPE_GLOW)
 			ret = cn10k_ml_cache_model_data(cnxk_mldev, layer);
 	}
 
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f618e5aa5f..f100eca203 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -11,6 +11,10 @@
 
 #include "cn10k_ml_model.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_model.h"
+#endif
+
 #include "cnxk_ml_io.h"
 
 struct cnxk_ml_dev;
@@ -18,6 +22,48 @@ struct cnxk_ml_model;
 struct cnxk_ml_qp;
 struct cnxk_ml_req;
 
+/* Model type */
+enum cnxk_ml_model_type {
+	/* Unknown model type */
+	ML_CNXK_MODEL_TYPE_UNKNOWN,
+
+	/* Invalid model type */
+	ML_CNXK_MODEL_TYPE_INVALID,
+
+	/* Glow compiled model, for MLIP target */
+	ML_CNXK_MODEL_TYPE_GLOW,
+
+	/* TVM compiled model, for ARM64 / ARM64 + MLIP target */
+	ML_CNXK_MODEL_TYPE_TVM,
+};
+
+/* Model subtype */
+enum cnxk_ml_model_subtype {
+	/* Marvell Glow model */
+	ML_CNXK_MODEL_SUBTYPE_GLOW_MRVL,
+
+	/* TVM model with single MRVL region */
+	ML_CNXK_MODEL_SUBTYPE_TVM_MRVL,
+
+	/* TVM model with LLVM regions only */
+	ML_CNXK_MODEL_SUBTYPE_TVM_LLVM,
+
+	/* TVM hybrid model, with both MRVL and LLVM regions or (> 1) MRVL regions*/
+	ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID,
+};
+
+/* Layer type */
+enum cnxk_ml_layer_type {
+	/* Unknown layer type */
+	ML_CNXK_LAYER_TYPE_UNKNOWN = 0,
+
+	/* MRVL layer, for MLIP target*/
+	ML_CNXK_LAYER_TYPE_MRVL,
+
+	/* LLVM layer, for ARM64 target*/
+	ML_CNXK_LAYER_TYPE_LLVM,
+};
+
 /* Model state */
 enum cnxk_ml_model_state {
 	/* Unknown state */
@@ -53,6 +99,9 @@ struct cnxk_ml_layer {
 	/* Name*/
 	char name[RTE_ML_STR_MAX];
 
+	/* Type */
+	enum cnxk_ml_layer_type type;
+
 	/* Model handle */
 	struct cnxk_ml_model *model;
 
@@ -83,14 +132,27 @@ struct cnxk_ml_model {
 	/* Device reference */
 	struct cnxk_ml_dev *cnxk_mldev;
 
+	/* Type */
+	enum cnxk_ml_model_type type;
+
+	/* Model subtype */
+	enum cnxk_ml_model_subtype subtype;
+
 	/* ID */
 	uint16_t model_id;
 
 	/* Name */
 	char name[RTE_ML_STR_MAX];
 
-	/* Model specific data - glow */
-	struct cn10k_ml_model_data glow;
+	union {
+		/* Model specific data - glow */
+		struct cn10k_ml_model_data glow;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* Model type specific data - mvtvm */
+		struct mvtvm_ml_model_data mvtvm;
+#endif
+	};
 
 	/* Batch size */
 	uint32_t batch_size;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 33d13d5514..96f87128f9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1217,6 +1217,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_dbuffer;
 	uint8_t *lcl_qbuffer;
+	uint64_t d_offset;
+	uint64_t q_offset;
 	uint32_t i;
 	int ret;
 
@@ -1229,17 +1231,31 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 		return -EINVAL;
 	}
 
-	info = &model->layer[0].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, 0);
 
-	lcl_dbuffer = dbuffer[0]->addr;
-	lcl_qbuffer = qbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	d_offset = 0;
+	q_offset = 0;
 	for (i = 0; i < info->nb_inputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_dbuffer = dbuffer[i]->addr;
+			lcl_qbuffer = qbuffer[i]->addr;
+		} else {
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+		}
+
 		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_dbuffer += info->input[i].sz_d;
-		lcl_qbuffer += info->input[i].sz_q;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			d_offset += info->input[i].sz_d;
+			q_offset += info->input[i].sz_q;
+		}
 	}
 
 	return 0;
@@ -1253,6 +1269,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 	struct cnxk_ml_model *model;
 	uint8_t *lcl_qbuffer;
 	uint8_t *lcl_dbuffer;
+	uint64_t q_offset;
+	uint64_t d_offset;
 	uint32_t i;
 	int ret;
 
@@ -1265,17 +1283,31 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 		return -EINVAL;
 	}
 
-	info = &model->layer[model->nb_layers - 1].info;
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
 
-	lcl_qbuffer = qbuffer[0]->addr;
-	lcl_dbuffer = dbuffer[0]->addr;
+	if (info == NULL)
+		return -EINVAL;
+
+	q_offset = 0;
+	d_offset = 0;
 	for (i = 0; i < info->nb_outputs; i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_TVM) {
+			lcl_qbuffer = qbuffer[i]->addr;
+			lcl_dbuffer = dbuffer[i]->addr;
+		} else {
+			lcl_qbuffer = RTE_PTR_ADD(qbuffer[0]->addr, q_offset);
+			lcl_dbuffer = RTE_PTR_ADD(dbuffer[0]->addr, d_offset);
+		}
+
 		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
 		if (ret < 0)
 			return ret;
 
-		lcl_qbuffer += info->output[i].sz_q;
-		lcl_dbuffer += info->output[i].sz_d;
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			q_offset += info->output[i].sz_q;
+			d_offset += info->output[i].sz_d;
+		}
 	}
 
 	return 0;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
new file mode 100644
index 0000000000..1f6b435be0
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_MODEL_H_
+#define _MVTVM_ML_MODEL_H_
+
+#include <tvmdp.h>
+
+#include <rte_mldev.h>
+
+#include "cnxk_ml_io.h"
+
+/* Maximum number of objects per model */
+#define ML_MVTVM_MODEL_OBJECT_MAX 3
+
+/* Objects list */
+extern char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX];
+
+/* Model object structure */
+struct mvtvm_ml_model_object {
+	/* Name */
+	char name[RTE_ML_STR_MAX];
+
+	/* Temporary buffer */
+	uint8_t *buffer;
+
+	/* Buffer size */
+	int64_t size;
+};
+
+struct mvtvm_ml_model_data {
+	/* Model metadata */
+	struct tvmdp_model_metadata metadata;
+
+	/* Model objects */
+	struct tvmdp_model_object object;
+
+	/* TVM runtime callbacks */
+	struct tvmrt_glow_callback cb;
+
+	/* Model I/O info */
+	struct cnxk_ml_io_info info;
+};
+
+#endif /* _MVTVM_ML_MODEL_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 20/34] ml/cnxk: add support for identify model type
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (18 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
                     ` (14 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to parse the model buffer to identify the
model type and sub-type. Enable basic checks for Glow
model buffers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  | 49 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cnxk_ml_model.h  |  3 ++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  8 +++++
 drivers/ml/cnxk/meson.build      |  6 ++++
 drivers/ml/cnxk/mvtvm_ml_model.c | 55 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 8 files changed, 133 insertions(+)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_model.c

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index b069d4e3a5..02f80410ec 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -2,11 +2,60 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <rte_hash_crc.h>
 #include <rte_mldev.h>
 
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_utils.h"
 
+enum cnxk_ml_model_type
+cnxk_ml_model_get_type(struct rte_ml_model_params *params)
+{
+	struct cn10k_ml_model_metadata_header *metadata_header;
+	enum cnxk_ml_model_type type;
+	uint32_t payload_crc32c;
+	uint32_t header_crc32c;
+
+	type = mvtvm_ml_model_type_get(params);
+	if (type == ML_CNXK_MODEL_TYPE_TVM)
+		return ML_CNXK_MODEL_TYPE_TVM;
+	else if (type == ML_CNXK_MODEL_TYPE_INVALID)
+		return ML_CNXK_MODEL_TYPE_INVALID;
+
+	/* Check model magic string */
+	metadata_header = (struct cn10k_ml_model_metadata_header *)params->addr;
+	if (strncmp((char *)metadata_header->magic, MRVL_ML_MODEL_MAGIC_STRING, 4) != 0) {
+		plt_err("Invalid Glow model, magic = %s", metadata_header->magic);
+		return ML_CNXK_MODEL_TYPE_INVALID;
+	}
+
+	/* Header CRC check */
+	if (metadata_header->header_crc32c != 0) {
+		header_crc32c = rte_hash_crc(
+			params->addr,
+			sizeof(struct cn10k_ml_model_metadata_header) - sizeof(uint32_t), 0);
+
+		if (header_crc32c != metadata_header->header_crc32c) {
+			plt_err("Invalid Glow model, Header CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	/* Payload CRC check */
+	if (metadata_header->payload_crc32c != 0) {
+		payload_crc32c = rte_hash_crc(
+			PLT_PTR_ADD(params->addr, sizeof(struct cn10k_ml_model_metadata_header)),
+			params->size - sizeof(struct cn10k_ml_model_metadata_header), 0);
+
+		if (payload_crc32c != metadata_header->payload_crc32c) {
+			plt_err("Invalid Glow model, Payload CRC mismatch");
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_GLOW;
+}
+
 void
 cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp)
 {
diff --git a/drivers/ml/cnxk/cnxk_ml_model.h b/drivers/ml/cnxk/cnxk_ml_model.h
index f100eca203..a2fced46a2 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.h
+++ b/drivers/ml/cnxk/cnxk_ml_model.h
@@ -13,6 +13,8 @@
 
 #ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
 #include "mvtvm_ml_model.h"
+#else
+#include "mvtvm_ml_stubs.h"
 #endif
 
 #include "cnxk_ml_io.h"
@@ -184,6 +186,7 @@ struct cnxk_ml_model {
 	set_poll_addr_t set_poll_addr;
 };
 
+enum cnxk_ml_model_type cnxk_ml_model_get_type(struct rte_ml_model_params *params);
 void cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model, FILE *fp);
 
 #endif /* _CNXK_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 96f87128f9..ebc78e36e9 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1018,6 +1018,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 {
 	struct rte_ml_dev_info dev_info;
 	struct cnxk_ml_dev *cnxk_mldev;
+	enum cnxk_ml_model_type type;
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
@@ -1033,6 +1034,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	cnxk_mldev = dev->data->dev_private;
 
+	type = cnxk_ml_model_get_type(params);
+	if (type == ML_CNXK_MODEL_TYPE_INVALID) {
+		plt_err("Invalid / unsupported model type");
+		return -EINVAL;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1066,6 +1073,7 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	model = mz->addr;
 	model->cnxk_mldev = cnxk_mldev;
+	model->type = type;
 	model->model_id = lcl_model_id;
 	model->info = PLT_PTR_ADD(
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 1ef2b3c335..20534d0b00 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -9,6 +9,11 @@ endif
 
 enable_mvtvm = true
 
+if not libarchive.found()
+        message('drivers/ml/cnxk: libarchive not found')
+        enable_mvtvm = false
+endif
+
 if not jansson_dep.found()
         message('drivers/ml/cnxk: jansson not found')
         enable_mvtvm = false
@@ -58,6 +63,7 @@ dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 sources += files(
         'mvtvm_ml_ops.c',
+        'mvtvm_ml_model.c',
 )
 
 ext_deps += jansson_dep
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
new file mode 100644
index 0000000000..ab5f8baa67
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <archive.h>
+#include <archive_entry.h>
+
+#include <rte_mldev.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_model.h"
+
+/* Objects list */
+char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
+								     "mod.params"};
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Assume as archive and check for read status */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return ML_CNXK_MODEL_TYPE_UNKNOWN;
+
+	/* Parse buffer for available objects */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0))
+				object_found[i] = true;
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are available */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			return ML_CNXK_MODEL_TYPE_INVALID;
+		}
+	}
+
+	return ML_CNXK_MODEL_TYPE_TVM;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 1f6b435be0..b6162fceec 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -43,4 +43,6 @@ struct mvtvm_ml_model_data {
 	struct cnxk_ml_io_info info;
 };
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a31cd39cfa..a7352840a6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -7,6 +7,15 @@
 #include "mvtvm_ml_stubs.h"
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
+
+enum cnxk_ml_model_type
+mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
+{
+	RTE_SET_USED(params);
+
+	return ML_CNXK_MODEL_TYPE_UNKNOWN;
+}
 
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 11c56e5144..467a9d39e5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 
+enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 21/34] ml/cnxk: add support to parse TVM model objects
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (19 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
                     ` (13 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to parse TVM model objects from the model
archive buffer. Added support to check for all expected
objects and copy TVM model objects to internal buffers.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  5 ++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 57 +++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 62 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  3 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 11 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 7 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ebc78e36e9..85b37161d2 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1079,7 +1079,10 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		model, PLT_ALIGN_CEIL(sizeof(struct cnxk_ml_model), dev_info.align_size));
 	dev->data->models[lcl_model_id] = model;
 
-	ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	if (type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_load(cnxk_mldev, params, model);
+	else
+		ret = mvtvm_ml_model_load(cnxk_mldev, params, model);
 	if (ret != 0)
 		goto error;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index ab5f8baa67..4c9a080c05 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -53,3 +53,60 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 
 	return ML_CNXK_MODEL_TYPE_TVM;
 }
+
+int
+mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_model_object *object)
+{
+	bool object_found[ML_MVTVM_MODEL_OBJECT_MAX] = {false, false, false};
+	struct archive_entry *entry;
+	struct archive *a;
+	uint8_t i;
+	int ret;
+
+	/* Open archive */
+	a = archive_read_new();
+	archive_read_support_filter_all(a);
+	archive_read_support_format_all(a);
+
+	ret = archive_read_open_memory(a, params->addr, params->size);
+	if (ret != ARCHIVE_OK)
+		return archive_errno(a);
+
+	/* Read archive */
+	while (archive_read_next_header(a, &entry) == ARCHIVE_OK) {
+		for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+			if (!object_found[i] &&
+			    (strcmp(archive_entry_pathname(entry), mvtvm_object_list[i]) == 0)) {
+				memcpy(object[i].name, mvtvm_object_list[i], RTE_ML_STR_MAX);
+				object[i].size = archive_entry_size(entry);
+				object[i].buffer = rte_malloc(NULL, object[i].size, 0);
+
+				if (archive_read_data(a, object[i].buffer, object[i].size) !=
+				    object[i].size) {
+					plt_err("Failed to read object from model archive: %s",
+						object[i].name);
+					goto error;
+				}
+				object_found[i] = true;
+			}
+		}
+		archive_read_data_skip(a);
+	}
+
+	/* Check if all objects are parsed */
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (!object_found[i]) {
+			plt_err("Object %s not found in archive!\n", mvtvm_object_list[i]);
+			goto error;
+		}
+	}
+	return 0;
+
+error:
+	for (i = 0; i < ML_MVTVM_MODEL_OBJECT_MAX; i++) {
+		if (object[i].buffer != NULL)
+			rte_free(object[i].buffer);
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b6162fceec..b11b66f495 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -44,5 +44,7 @@ struct mvtvm_ml_model_data {
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
+			      struct mvtvm_ml_model_object *object);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 88c6d5a864..e2413b6b15 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -8,8 +8,12 @@
 #include <rte_mldev_pmd.h>
 
 #include "cnxk_ml_dev.h"
+#include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
 
+/* ML model macros */
+#define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -39,3 +43,61 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	size_t model_object_size = 0;
+	uint64_t mz_size = 0;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	ret = mvtvm_ml_model_blob_parse(params, object);
+	if (ret != 0)
+		return ret;
+
+	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
+			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
+	mz_size += model_object_size;
+
+	/* Allocate memzone for model object */
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = plt_memzone_reserve_aligned(str, mz_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (!mz) {
+		plt_err("plt_memzone_reserve failed : %s", str);
+		return -ENOMEM;
+	}
+
+	/* Copy mod.so */
+	model->mvtvm.object.so.addr = mz->addr;
+	model->mvtvm.object.so.size = object[0].size;
+	rte_memcpy(model->mvtvm.object.so.name, object[0].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.so.addr, object[0].buffer, object[0].size);
+	rte_free(object[0].buffer);
+
+	/* Copy mod.json */
+	model->mvtvm.object.json.addr =
+		RTE_PTR_ADD(model->mvtvm.object.so.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.so.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.json.size = object[1].size;
+	rte_memcpy(model->mvtvm.object.json.name, object[1].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.json.addr, object[1].buffer, object[1].size);
+	rte_free(object[1].buffer);
+
+	/* Copy mod.params */
+	model->mvtvm.object.params.addr =
+		RTE_PTR_ADD(model->mvtvm.object.json.addr,
+			    RTE_ALIGN_CEIL(model->mvtvm.object.json.size, RTE_CACHE_LINE_MIN_SIZE));
+	model->mvtvm.object.params.size = object[2].size;
+	rte_memcpy(model->mvtvm.object.params.name, object[2].name, TVMDP_NAME_STRLEN);
+	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
+	rte_free(object[2].buffer);
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 305b4681ed..6607537599 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -12,8 +12,11 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a7352840a6..7f3b3abb2e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -33,3 +33,14 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 
 	return 0;
 }
+
+int
+mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+		    struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(params);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 467a9d39e5..4bb1772ef4 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -8,9 +8,12 @@
 #include <rte_mldev.h>
 
 struct cnxk_ml_dev;
+struct cnxk_ml_model;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
+			struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 22/34] ml/cnxk: fetch layer info and load TVM model
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (20 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
                     ` (12 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to fetch TVM model layer information and
update internal structures based on the layer information.
Set callback functions for layer load and unload and
enabled model loading using the TVMDP library. Added
support to fetch full metadata after model load.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 11 +++++
 drivers/ml/cnxk/cn10k_ml_model.h |  2 +
 drivers/ml/cnxk/cn10k_ml_ops.c   |  7 ++-
 drivers/ml/cnxk/cnxk_ml_io.h     |  8 ++++
 drivers/ml/cnxk/mvtvm_ml_model.c | 25 ++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  4 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 81 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 10 ++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  3 ++
 9 files changed, 149 insertions(+), 2 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index af9d5a666f..0325cd54f1 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -716,3 +716,14 @@ cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "\n");
 }
+
+int
+cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	if (model->type == ML_CNXK_MODEL_TYPE_TVM)
+		return mvtvm_ml_model_get_layer_id(model, layer_name, layer_id);
+
+	*layer_id = 0;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index 45f2ed5fcf..6744175cd5 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -461,5 +461,7 @@ void cn10k_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_mode
 			     struct cnxk_ml_io_info *io_info,
 			     struct cn10k_ml_model_metadata *metadata);
 void cn10k_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+int cn10k_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index a471e98fbf..4191ccc840 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -576,7 +576,7 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	size_t layer_xstats_size;
 	uint8_t *base_dma_addr;
 	uint16_t scratch_pages;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint16_t wb_pages;
 	uint64_t mz_size;
 	uint16_t idx;
@@ -584,7 +584,6 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 	int ret;
 
 	PLT_SET_USED(size);
-	PLT_SET_USED(layer_name);
 
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
@@ -598,6 +597,10 @@ cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name, uin
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	ret = cn10k_ml_model_metadata_check(buffer, size);
diff --git a/drivers/ml/cnxk/cnxk_ml_io.h b/drivers/ml/cnxk/cnxk_ml_io.h
index d500d77b9a..c33a9c23a1 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.h
+++ b/drivers/ml/cnxk/cnxk_ml_io.h
@@ -5,13 +5,21 @@
 #ifndef _CNXK_ML_IO_H_
 #define _CNXK_ML_IO_H_
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include <tvmdp.h>
+#endif
+
 #include <rte_mldev.h>
 
 /* Maximum number of models per device */
 #define ML_CNXK_MAX_MODELS 16
 
 /* Maximum number of layers per model */
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#define ML_CNXK_MODEL_MAX_LAYERS TVMDP_MODEL_LAYERS_MAX
+#else
 #define ML_CNXK_MODEL_MAX_LAYERS 1
+#endif
 
 /* Maximum number of inputs or outputs per layer or model */
 #define ML_CNXK_MODEL_MAX_INPUT_OUTPUT 32
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 4c9a080c05..8536fd8927 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -110,3 +110,28 @@ mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params, struct mvtvm_ml_mo
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	uint16_t i;
+
+	for (i = 0; i < model->mvtvm.metadata.model.nb_layers; i++) {
+		if (strcmp(model->layer[i].name, layer_name) == 0)
+			break;
+	}
+
+	if (i == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[i].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer type, name: %s type: %d", layer_name, model->layer[i].type);
+		return -EINVAL;
+	}
+
+	*layer_id = i;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index b11b66f495..6cb2639876 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,8 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_model;
+
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
 
@@ -46,5 +48,7 @@ struct mvtvm_ml_model_data {
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e2413b6b15..9a3ada1b0d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -49,9 +49,13 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		    struct cnxk_ml_model *model)
 {
 	struct mvtvm_ml_model_object object[ML_MVTVM_MODEL_OBJECT_MAX];
+	struct tvmrt_glow_callback *callback;
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	uint16_t nb_mrvl_layers;
+	uint16_t nb_llvm_layers;
+	uint8_t layer_id = 0;
 	uint64_t mz_size = 0;
 	int ret;
 
@@ -99,5 +103,82 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	rte_memcpy(model->mvtvm.object.params.addr, object[2].buffer, object[2].size);
 	rte_free(object[2].buffer);
 
+	/* Get metadata - stage 1 */
+	ret = tvmdp_model_metadata_get_stage1(model->mvtvm.object.json.addr,
+					      model->mvtvm.object.json.size,
+					      &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to parse metadata - stage 1, model_id = %u, error = %d",
+			model->model_id, ret);
+		goto error;
+	}
+
+	/* Set model fields */
+	plt_strlcpy(model->name, model->mvtvm.metadata.model.name, TVMDP_NAME_STRLEN);
+	model->batch_size = 1;
+	model->nb_layers = model->mvtvm.metadata.model.nb_layers;
+
+	/* Update layer info */
+	nb_mrvl_layers = 0;
+	nb_llvm_layers = 0;
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		rte_strscpy(model->layer[layer_id].name,
+			    model->mvtvm.metadata.model.layer[layer_id].name, TVMDP_NAME_STRLEN);
+		if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "mrvl") == 0 ||
+		    strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "MRVL") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_MRVL;
+			nb_mrvl_layers++;
+		} else if (strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "llvm") == 0 ||
+			   strcmp(model->mvtvm.metadata.model.layer[layer_id].type, "LLVM") == 0) {
+			model->layer[layer_id].type = ML_CNXK_LAYER_TYPE_LLVM;
+			nb_llvm_layers++;
+		}
+	}
+
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 0)) {
+		plt_err("Invalid model, nb_llvm_layers = %u, nb_mrvl_layers = %u", nb_llvm_layers,
+			nb_mrvl_layers);
+		goto error;
+	}
+
+	/* Set model subtype */
+	if ((nb_llvm_layers == 0) && (nb_mrvl_layers == 1))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_MRVL;
+	else if ((nb_llvm_layers > 0) && (nb_mrvl_layers == 0))
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_LLVM;
+	else
+		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
+
+	/* Set callback function array */
+	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		callback = &model->mvtvm.cb;
+		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
+		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+	} else {
+		callback = NULL;
+	}
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_load(cnxk_mldev, model->model_id, (void *)(&model->mvtvm.object),
+			       callback);
+	if (ret != 0) {
+		plt_err("TVMDP: Model load failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		goto error;
+	}
+
+	/* Get model metadata - stage 2 */
+	ret = tvmdp_model_metadata_get_stage2(model->model_id, &model->mvtvm.metadata);
+	if (ret != 0) {
+		plt_err("TVMDP: Failed to get metadata, model_id = %u, error = %d\n",
+			model->model_id, ret);
+		goto error;
+	}
+
 	return 0;
+
+error:
+	rte_memzone_free(mz);
+
+	return ret;
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 7f3b3abb2e..d621dbc897 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -17,6 +17,16 @@ mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
 	return ML_CNXK_MODEL_TYPE_UNKNOWN;
 }
 
+int
+mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name, uint16_t *layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_name);
+	RTE_SET_USED(layer_id);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 4bb1772ef4..23fdfdc4cd 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,4 +16,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 
+int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
+				uint16_t *layer_id);
+
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 23/34] ml/cnxk: update internal info for TVM model
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (21 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
                     ` (11 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled updating of the internal I/O info structures for TVM
models and computing the static fields related to the model I/O.
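
For illustration, a minimal sketch of how the per-tensor sizes follow
from the shape and the dequantized/quantized type widths; the names
below are illustrative only, not the driver's structures:

#include <stdint.h>

/* Element count from a tensor shape. */
static uint64_t
toy_nb_elements(const int64_t *shape, uint32_t ndim)
{
	uint64_t nb_elements = 1;
	uint32_t i;

	for (i = 0; i < ndim; i++)
		nb_elements *= (uint64_t)shape[i];

	return nb_elements;
}

/* Byte sizes of the dequantized (dtype) and quantized (qtype) buffers. */
static void
toy_io_sizes(const int64_t *shape, uint32_t ndim, uint32_t dtype_size,
	     uint32_t qtype_size, uint64_t *sz_d, uint64_t *sz_q)
{
	uint64_t nb_elements = toy_nb_elements(shape, ndim);

	*sz_d = nb_elements * dtype_size; /* e.g. fp32 as seen by the app */
	*sz_q = nb_elements * qtype_size; /* e.g. int8 as seen by the model */
}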

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c    |   4 +
 drivers/ml/cnxk/mvtvm_ml_model.c | 133 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |   2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |   3 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |   9 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   1 +
 6 files changed, 152 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 85b37161d2..1565e521fd 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1244,6 +1244,8 @@ cnxk_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buf
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, 0);
+	else
+		info = mvtvm_ml_model_io_info_get(model, 0);
 
 	if (info == NULL)
 		return -EINVAL;
@@ -1296,6 +1298,8 @@ cnxk_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_b
 
 	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
 		info = cn10k_ml_model_io_info_get(model, model->nb_layers - 1);
+	else
+		info = mvtvm_ml_model_io_info_get(model, model->nb_layers - 1);
 
 	if (info == NULL)
 		return -EINVAL;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 8536fd8927..f35c2bb3e5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include <mldev_utils.h>
+
 #include <roc_api.h>
 
 #include "cnxk_ml_model.h"
@@ -135,3 +137,134 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 
 	return 0;
 }
+
+static enum rte_ml_io_type
+mvtvm_ml_io_type_map(DLDataType dltype)
+{
+	switch (dltype.code) {
+	case kDLInt:
+		if (dltype.bits == 8)
+			return RTE_ML_IO_TYPE_INT8;
+		else if (dltype.bits == 16)
+			return RTE_ML_IO_TYPE_INT16;
+		else if (dltype.bits == 32)
+			return RTE_ML_IO_TYPE_INT32;
+		break;
+	case kDLUInt:
+		if (dltype.bits == 8)
+			return RTE_ML_IO_TYPE_UINT8;
+		else if (dltype.bits == 16)
+			return RTE_ML_IO_TYPE_UINT16;
+		else if (dltype.bits == 32)
+			return RTE_ML_IO_TYPE_UINT32;
+		break;
+	case kDLFloat:
+		if (dltype.bits == 8)
+			return RTE_ML_IO_TYPE_FP8;
+		else if (dltype.bits == 16)
+			return RTE_ML_IO_TYPE_FP16;
+		else if (dltype.bits == 32)
+			return RTE_ML_IO_TYPE_FP32;
+		break;
+	case kDLBfloat:
+		if (dltype.bits == 16)
+			return RTE_ML_IO_TYPE_BFLOAT16;
+		break;
+	default:
+		return RTE_ML_IO_TYPE_UNKNOWN;
+	}
+
+	return RTE_ML_IO_TYPE_UNKNOWN;
+}
+
+void
+mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	int32_t i;
+	int32_t j;
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+
+	/* Inputs, set for layer_id = 0 */
+	model->mvtvm.info.nb_inputs = metadata->model.num_input;
+	model->mvtvm.info.total_input_sz_d = 0;
+	model->mvtvm.info.total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		rte_strscpy(model->mvtvm.info.input[i].name, metadata->input[i].name,
+			    TVMDP_NAME_STRLEN);
+		model->mvtvm.info.input[i].dtype =
+			mvtvm_ml_io_type_map(metadata->input[i].datatype);
+		model->mvtvm.info.input[i].qtype =
+			mvtvm_ml_io_type_map(metadata->input[i].model_datatype);
+		model->mvtvm.info.input[i].nb_dims = metadata->input[i].ndim;
+
+		model->mvtvm.info.input[i].nb_elements = 1;
+		for (j = 0; j < metadata->input[i].ndim; j++) {
+			model->mvtvm.info.input[i].shape[j] = metadata->input[i].shape[j];
+			model->mvtvm.info.input[i].nb_elements *= metadata->input[i].shape[j];
+		}
+
+		model->mvtvm.info.input[i].sz_d =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].dtype);
+		model->mvtvm.info.input[i].sz_q =
+			model->mvtvm.info.input[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+
+		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
+		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
+	}
+
+	/* Outputs, set for nb_layers - 1 */
+	model->mvtvm.info.nb_outputs = metadata->model.num_output;
+	model->mvtvm.info.total_output_sz_d = 0;
+	model->mvtvm.info.total_output_sz_q = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		rte_strscpy(model->mvtvm.info.output[i].name, metadata->output[i].name,
+			    TVMDP_NAME_STRLEN);
+		model->mvtvm.info.output[i].dtype =
+			mvtvm_ml_io_type_map(metadata->output[i].datatype);
+		model->mvtvm.info.output[i].qtype =
+			mvtvm_ml_io_type_map(metadata->output[i].model_datatype);
+		model->mvtvm.info.output[i].nb_dims = metadata->output[i].ndim;
+
+		model->mvtvm.info.output[i].nb_elements = 1;
+		for (j = 0; j < metadata->output[i].ndim; j++) {
+			model->mvtvm.info.output[i].shape[j] = metadata->output[i].shape[j];
+			model->mvtvm.info.output[i].nb_elements *= metadata->output[i].shape[j];
+		}
+
+		model->mvtvm.info.output[i].sz_d =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].dtype);
+		model->mvtvm.info.output[i].sz_q =
+			model->mvtvm.info.output[i].nb_elements *
+			rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+
+		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
+		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
+			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_layer_io_info_set(&model->mvtvm.info, &model->layer[0].glow.metadata);
+}
+
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(layer_id);
+
+	return &model->mvtvm.info;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 6cb2639876..e86581bc6a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -50,5 +50,7 @@ int mvtvm_ml_model_blob_parse(struct rte_ml_model_params *params,
 			      struct mvtvm_ml_model_object *object);
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9a3ada1b0d..e21bf2dc07 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -175,6 +175,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		goto error;
 	}
 
+	/* Update model I/O data */
+	mvtvm_ml_model_io_info_set(model);
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index d621dbc897..80a9a90b4e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -27,6 +27,15 @@ mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 	return -EINVAL;
 }
 
+struct cnxk_ml_io_info *
+mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
+{
+	RTE_SET_USED(model);
+	RTE_SET_USED(layer_id);
+
+	return NULL;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 23fdfdc4cd..29f721072a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -18,5 +18,6 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
+struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 24/34] ml/cnxk: enable model unload in tvmdp library
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (22 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
                     ` (10 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled model unload using the external TVMDP library. Updated
the layer unload callback to support multiple layers.
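
For illustration, an application-side sketch of the load/unload round
trip (not part of this patch; dev_id and the params buffer are assumed
to be set up by the caller):

#include <stdio.h>

#include <rte_mldev.h>

static int
app_load_unload(int16_t dev_id, struct rte_ml_model_params *params)
{
	uint16_t model_id;
	int ret;

	/* Load the model; the driver dispatches to the TVM path based on
	 * the detected model type. */
	ret = rte_ml_model_load(dev_id, params, &model_id);
	if (ret != 0) {
		printf("model load failed, ret = %d\n", ret);
		return ret;
	}

	/* Unload releases the TVMDP state and the model memzone. */
	ret = rte_ml_model_unload(dev_id, model_id);
	if (ret != 0)
		printf("model unload failed, ret = %d\n", ret);

	return ret;
}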

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |  8 +++++---
 drivers/ml/cnxk/cnxk_ml_ops.c    |  7 +++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 28 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  1 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  9 +++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  1 +
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 4191ccc840..e7208391fd 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -780,11 +780,9 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 	struct cnxk_ml_layer *layer;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	int ret;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -797,6 +795,10 @@ cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u_%u", CN10K_ML_LAYER_MEMZONE_NAME,
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 1565e521fd..ce668e1eb6 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1107,7 +1107,7 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 	struct cnxk_ml_model *model;
 
 	char str[RTE_MEMZONE_NAMESIZE];
-	int ret;
+	int ret = 0;
 
 	if (dev == NULL)
 		return -EINVAL;
@@ -1125,7 +1125,10 @@ cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EBUSY;
 	}
 
-	ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		ret = cn10k_ml_model_unload(cnxk_mldev, model);
+	else
+		ret = mvtvm_ml_model_unload(cnxk_mldev, model);
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index e21bf2dc07..3847f9b6b9 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -185,3 +185,31 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return ret;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	int ret;
+
+	RTE_SET_USED(cnxk_mldev);
+
+	/* Initialize model in TVMDP */
+	ret = tvmdp_model_unload(model->model_id);
+	if (ret != 0) {
+		plt_err("TVMDP: Model unload failed, model_id = %u, error = %d", model->model_id,
+			ret);
+		return ret;
+	}
+
+	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
+	mz = rte_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("Memzone lookup failed for TVM model: model_id = %u, mz = %s",
+			model->model_id, str);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 6607537599..770794fe7d 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -18,5 +18,6 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 80a9a90b4e..a17a76e41f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -63,3 +63,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 29f721072a..3776fb5369 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -15,6 +15,7 @@ int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_d
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
+int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 25/34] ml/cnxk: enable OCM check for multilayer TVM model
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (23 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
                     ` (9 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enabled a check of the OCM size requirement for multi-layer
TVM models. OCM scratch and WB page requirements are computed
for all layers during the load stage.
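
For illustration, the OCM requirement reduces to the arithmetic below:
WB pages accumulate across Marvell layers while scratch pages do not.
A standalone sketch with illustrative names:

#include <errno.h>
#include <stdint.h>

static int
toy_ocm_check(const uint16_t *wb_pages, const uint16_t *scratch_pages,
	      uint16_t nb_layers, uint16_t ocm_num_pages)
{
	uint32_t total_wb_pages = 0;
	uint16_t max_scratch_pages = 0;
	uint16_t i;

	for (i = 0; i < nb_layers; i++) {
		total_wb_pages += wb_pages[i];
		if (scratch_pages[i] > max_scratch_pages)
			max_scratch_pages = scratch_pages[i];
	}

	/* The model fits only if the sum of all WB pages plus the largest
	 * scratch requirement is within the OCM page budget. */
	return (total_wb_pages + max_scratch_pages) <= ocm_num_pages ? 0 : -ENOMEM;
}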

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_ops.c | 60 +++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index ce668e1eb6..d1471971e4 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1023,8 +1023,12 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	uint16_t max_scratch_pages;
+	struct cn10k_ml_ocm *ocm;
 	uint64_t model_info_size;
+	uint16_t total_wb_pages;
 	uint16_t lcl_model_id;
+	uint16_t layer_id;
 	uint64_t mz_size;
 	bool found;
 	int ret;
@@ -1086,6 +1090,62 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 	if (ret != 0)
 		goto error;
 
+	max_scratch_pages = 0;
+	total_wb_pages = 0;
+	layer_id = 0;
+
+	ocm = &cnxk_mldev->cn10k_mldev.ocm;
+
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+		total_wb_pages = total_wb_pages + model->layer[layer_id].glow.ocm_map.wb_pages;
+		max_scratch_pages = PLT_MAX(max_scratch_pages,
+					    model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	} else {
+		for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+				total_wb_pages = total_wb_pages +
+						 model->layer[layer_id].glow.ocm_map.wb_pages;
+				max_scratch_pages =
+					PLT_MAX(max_scratch_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+			}
+		}
+#endif
+	}
+
+	if ((total_wb_pages + max_scratch_pages) > ocm->num_pages) {
+		plt_err("model_id = %u: total_wb_pages (%u) + scratch_pages (%u) >  %u\n",
+			lcl_model_id, total_wb_pages, max_scratch_pages, ocm->num_pages);
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW) {
+			plt_ml_dbg("layer_id = %u: wb_pages = %u, scratch_pages = %u\n", layer_id,
+				   model->layer[layer_id].glow.ocm_map.wb_pages,
+				   model->layer[layer_id].glow.ocm_map.scratch_pages);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		} else {
+			for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers;
+			     layer_id++) {
+				if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL) {
+					plt_ml_dbg(
+						"layer_id = %u: wb_pages = %u, scratch_pages = %u\n",
+						layer_id,
+						model->layer[layer_id].glow.ocm_map.wb_pages,
+						model->layer[layer_id].glow.ocm_map.scratch_pages);
+				}
+			}
+#endif
+		}
+
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_model_unload(cnxk_mldev, model);
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		else {
+			mvtvm_ml_model_unload(cnxk_mldev, model);
+			return -ENOMEM;
+		}
+#endif
+	}
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CNXK_MODEL_STATE_LOADED;
 	cnxk_mldev->nb_models_loaded++;
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 26/34] ml/cnxk: support start and stop for TVM models
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (24 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
                     ` (8 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added support to start and stop TVM models. TVM model start
invokes layer start for all Glow layers that are part of the
model; TVM model stop invokes layer stop for all Glow layers
that are part of the model.
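
For illustration, an application-side sketch of the start/stop
sequence (not part of this patch; dev_id and model_id are assumed
valid):

#include <stdio.h>

#include <rte_mldev.h>

static int
app_start_stop(int16_t dev_id, uint16_t model_id)
{
	int ret;

	/* Start triggers layer start for every Glow layer of the model. */
	ret = rte_ml_model_start(dev_id, model_id);
	if (ret != 0) {
		printf("model start failed, ret = %d\n", ret);
		return ret;
	}

	/* ... enqueue/dequeue inference requests here ... */

	/* Stop triggers layer stop for every Glow layer of the model. */
	return rte_ml_model_stop(dev_id, model_id);
}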

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   | 16 ++++++----
 drivers/ml/cnxk/cnxk_ml_ops.c    | 14 +++++++--
 drivers/ml/cnxk/mvtvm_ml_ops.c   | 52 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c | 18 +++++++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 6 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index e7208391fd..2d308802cf 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -827,7 +827,7 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	uint8_t num_tiles;
@@ -838,8 +838,6 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -852,6 +850,10 @@ cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
@@ -1015,14 +1017,12 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 	struct cn10k_ml_ocm *ocm;
 	struct cnxk_ml_req *req;
 
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	bool job_enqueued;
 	bool job_dequeued;
 	bool locked;
 	int ret = 0;
 
-	PLT_SET_USED(layer_name);
-
 	cnxk_mldev = (struct cnxk_ml_dev *)device;
 	if (cnxk_mldev == NULL) {
 		plt_err("Invalid device = %p", device);
@@ -1035,6 +1035,10 @@ cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name)
 		return -EINVAL;
 	}
 
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
 	layer = &model->layer[layer_id];
 	cn10k_mldev = &cnxk_mldev->cn10k_mldev;
 	ocm = &cn10k_mldev->ocm;
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index d1471971e4..c38c60bf76 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -1216,7 +1216,12 @@ cnxk_ml_model_start(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_start(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_start(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_start(cnxk_mldev, model);
+
+	return 0;
 }
 
 int
@@ -1236,7 +1241,12 @@ cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id)
 		return -EINVAL;
 	}
 
-	return cn10k_ml_model_stop(cnxk_mldev, model);
+	if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+		return cn10k_ml_model_stop(cnxk_mldev, model);
+	else
+		return mvtvm_ml_model_stop(cnxk_mldev, model);
+
+	return 0;
 }
 
 static int
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 3847f9b6b9..323c7c6fb6 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -213,3 +213,55 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return plt_memzone_free(mz);
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_start(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer start failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct cnxk_ml_layer *layer;
+
+	uint16_t layer_id = 0;
+	int ret = 0;
+
+next_layer:
+	layer = &model->layer[layer_id];
+	if (layer->type == ML_CNXK_LAYER_TYPE_MRVL) {
+		ret = cn10k_ml_layer_stop(cnxk_mldev, model->model_id, layer->name);
+		if (ret != 0) {
+			plt_err("Layer stop failed, model_id = %u, layer_name = %s, error = %d",
+				model->model_id, layer->name, ret);
+			return ret;
+		}
+	}
+	layer_id++;
+
+	if (layer_id < model->nb_layers)
+		goto next_layer;
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 770794fe7d..55459f9f7f 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -19,5 +19,7 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index a17a76e41f..b8c2e6a1fc 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -72,3 +72,21 @@ mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mode
 
 	return -EINVAL;
 }
+
+int
+mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
+
+int
+mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+
+	return -EINVAL;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3776fb5369..1eb663b1d1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -16,6 +16,8 @@ int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 27/34] ml/cnxk: update internal TVM model info structure
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (25 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
                     ` (7 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support to update internal model info structure
for TVM models.
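
For illustration, an application-side sketch that reads back the info
populated here (not part of this patch; field usage follows
struct rte_ml_model_info, with casts to keep the prints portable):

#include <stdio.h>

#include <rte_mldev.h>

static void
app_print_model_io(int16_t dev_id, uint16_t model_id)
{
	struct rte_ml_model_info info;
	uint32_t i;

	if (rte_ml_model_info_get(dev_id, model_id, &info) != 0)
		return;

	printf("model %s: %u inputs, %u outputs\n", info.name,
	       (unsigned int)info.nb_inputs, (unsigned int)info.nb_outputs);
	for (i = 0; i < info.nb_inputs; i++)
		printf("  input[%u]: %s, %lu bytes\n", i,
		       info.input_info[i].name,
		       (unsigned long)info.input_info[i].size);
}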

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_model.c | 66 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 +
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  3 ++
 3 files changed, 71 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index f35c2bb3e5..88f2738423 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -11,6 +11,7 @@
 
 #include <roc_api.h>
 
+#include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 
 /* Objects list */
@@ -268,3 +269,68 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 
 	return &model->mvtvm.info;
 }
+
+void
+mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
+{
+	struct tvmdp_model_metadata *metadata;
+	struct rte_ml_model_info *info;
+	struct rte_ml_io_info *output;
+	struct rte_ml_io_info *input;
+	uint8_t i;
+
+	info = PLT_PTR_CAST(model->info);
+	input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info));
+	output = PLT_PTR_ADD(input, ML_CNXK_MODEL_MAX_INPUT_OUTPUT * sizeof(struct rte_ml_io_info));
+
+	/* Reset model info */
+	memset(info, 0, sizeof(struct rte_ml_model_info));
+
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL)
+		goto tvm_mrvl_model;
+
+	metadata = &model->mvtvm.metadata;
+	rte_memcpy(info->name, metadata->model.name, TVMDP_NAME_STRLEN);
+	snprintf(info->version, RTE_ML_STR_MAX, "%u.%u.%u.%u", metadata->model.version[0],
+		 metadata->model.version[1], metadata->model.version[2],
+		 metadata->model.version[3]);
+	info->model_id = model->model_id;
+	info->device_id = cnxk_mldev->mldev->data->dev_id;
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+	info->min_batches = model->batch_size;
+	info->max_batches = model->batch_size;
+	info->nb_inputs = metadata->model.num_input;
+	info->input_info = input;
+	info->nb_outputs = metadata->model.num_output;
+	info->output_info = output;
+	info->wb_size = 0;
+
+	/* Set input info */
+	for (i = 0; i < info->nb_inputs; i++) {
+		rte_memcpy(input[i].name, metadata->input[i].name, MRVL_ML_INPUT_NAME_LEN);
+		input[i].nb_dims = metadata->input[i].ndim;
+		input[i].shape = &model->mvtvm.info.input[i].shape[0];
+		input[i].type = model->mvtvm.info.input[i].qtype;
+		input[i].nb_elements = model->mvtvm.info.input[i].nb_elements;
+		input[i].size = model->mvtvm.info.input[i].nb_elements *
+				rte_ml_io_type_size_get(model->mvtvm.info.input[i].qtype);
+	}
+
+	/* Set output info */
+	for (i = 0; i < info->nb_outputs; i++) {
+		rte_memcpy(output[i].name, metadata->output[i].name, MRVL_ML_OUTPUT_NAME_LEN);
+		output[i].nb_dims = metadata->output[i].ndim;
+		output[i].shape = &model->mvtvm.info.output[i].shape[0];
+		output[i].type = model->mvtvm.info.output[i].qtype;
+		output[i].nb_elements = model->mvtvm.info.output[i].nb_elements;
+		output[i].size = model->mvtvm.info.output[i].nb_elements *
+				 rte_ml_io_type_size_get(model->mvtvm.info.output[i].qtype);
+	}
+
+	return;
+
+tvm_mrvl_model:
+	cn10k_ml_model_info_set(cnxk_mldev, model, &model->mvtvm.info,
+				&model->layer[0].glow.metadata);
+	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index e86581bc6a..a1247ffbde 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -11,6 +11,7 @@
 
 #include "cnxk_ml_io.h"
 
+struct cnxk_ml_dev;
 struct cnxk_ml_model;
 
 /* Maximum number of objects per model */
@@ -52,5 +53,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 323c7c6fb6..c6872cd89a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -178,6 +178,9 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Update model I/O data */
 	mvtvm_ml_model_io_info_set(model);
 
+	/* Set model info */
+	mvtvm_ml_model_info_set(cnxk_mldev, model);
+
 	return 0;
 
 error:
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 28/34] ml/cnxk: support device dump for TVM models
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (26 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
                     ` (6 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enabled support to print TVM model layer info.
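
For illustration, the dump can be triggered from the application as
below (not part of this patch); with this change the per-layer section
of the output also covers TVM (non-Glow) layers:

#include <stdio.h>

#include <rte_mldev.h>

static void
app_dump(int16_t dev_id)
{
	/* Prints device, model and per-layer info to the given stream. */
	if (rte_ml_dev_dump(dev_id, stdout) != 0)
		printf("device dump failed\n");
}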

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cnxk_ml_model.c  |  7 +++-
 drivers/ml/cnxk/mvtvm_ml_model.c | 59 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_model.h |  2 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  8 +++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |  2 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cnxk_ml_model.c b/drivers/ml/cnxk/cnxk_ml_model.c
index 02f80410ec..ed6a1ed866 100644
--- a/drivers/ml/cnxk/cnxk_ml_model.c
+++ b/drivers/ml/cnxk/cnxk_ml_model.c
@@ -68,6 +68,8 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 	cnxk_ml_print_line(fp, LINE_LEN);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "model_id", model->model_id);
 	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", model->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", model->type);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "subtype", model->subtype);
 	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "model", PLT_U64_CAST(model));
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", model->batch_size);
 	fprintf(fp, "%*s : %u\n", FIELD_LEN, "nb_layers", model->nb_layers);
@@ -84,6 +86,9 @@ cnxk_ml_model_dump(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 
 	for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
 		layer = &model->layer[layer_id];
-		cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		if (layer->type == ML_CNXK_LAYER_TYPE_MRVL)
+			cn10k_ml_layer_print(cnxk_mldev, layer, fp);
+		else
+			mvtvm_ml_layer_print(cnxk_mldev, layer, fp);
 	}
 }
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 88f2738423..e5ba672788 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -13,6 +13,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_utils.h"
 
 /* Objects list */
 char mvtvm_object_list[ML_MVTVM_MODEL_OBJECT_MAX][RTE_ML_STR_MAX] = {"mod.so", "mod.json",
@@ -334,3 +335,61 @@ mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 				&model->layer[0].glow.metadata);
 	info->io_layout = RTE_ML_IO_LAYOUT_SPLIT;
 }
+
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	char str[STR_LEN];
+	uint8_t i;
+
+	/* Print debug info */
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, " Layer Information (Layer ID: %u, Name: %s)\n",
+		cnxk_mldev->index_map[layer->index].layer_id, layer->name);
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "layer_id",
+		cnxk_mldev->index_map[layer->index].layer_id);
+	fprintf(fp, "%*s : %s\n", FIELD_LEN, "name", layer->name);
+	fprintf(fp, "%*s : %d\n", FIELD_LEN, "type", layer->type);
+	fprintf(fp, "%*s : 0x%016lx\n", FIELD_LEN, "layer", PLT_U64_CAST(layer));
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "batch_size", layer->batch_size);
+
+	/* Print model state */
+	if (layer->state == ML_CNXK_LAYER_STATE_LOADED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "loaded");
+	if (layer->state == ML_CNXK_LAYER_STATE_JOB_ACTIVE)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "job_active");
+	if (layer->state == ML_CNXK_LAYER_STATE_STARTED)
+		fprintf(fp, "%*s : %s\n", FIELD_LEN, "state", "started");
+
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_inputs", layer->info.nb_inputs);
+	fprintf(fp, "%*s : %u\n", FIELD_LEN, "num_outputs", layer->info.nb_outputs);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "input", "input_name", "input_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_inputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.input[i].name);
+		rte_ml_io_type_to_str(layer->info.input[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "%8s  %16s  %12s\n", "output", "output_name", "output_type");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	for (i = 0; i < layer->info.nb_outputs; i++) {
+		fprintf(fp, "%8u  ", i);
+		fprintf(fp, "%*s  ", 16, layer->info.output[i].name);
+		rte_ml_io_type_to_str(layer->info.output[i].qtype, str, STR_LEN);
+		fprintf(fp, "%*s  ", 12, str);
+		fprintf(fp, "\n");
+	}
+	fprintf(fp, "\n");
+	cnxk_ml_print_line(fp, LINE_LEN);
+	fprintf(fp, "\n");
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index a1247ffbde..900ba44fa0 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -13,6 +13,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 /* Maximum number of objects per model */
 #define ML_MVTVM_MODEL_OBJECT_MAX 3
@@ -54,5 +55,6 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 void mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_model_info_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index b8c2e6a1fc..260a051b08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -36,6 +36,14 @@ mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id)
 	return NULL;
 }
 
+void
+mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(layer);
+	RTE_SET_USED(fp);
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 1eb663b1d1..d6d0edbcf1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -9,6 +9,7 @@
 
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
@@ -22,5 +23,6 @@ int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *mo
 int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_name,
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
+void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 29/34] ml/cnxk: enable reporting model runtime as xstats
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (27 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
                     ` (5 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Added model xstats entries to compute runtime latency.
Allocated internal resources for TVM model xstats.
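
For illustration, the cycles-to-nanoseconds conversion used when
reporting these xstats reduces to the helper below; a sketch assuming
sclk_freq is reported in MHz, as the existing conversion implies:

#include <stdint.h>

static uint64_t
toy_cycles_to_ns(uint64_t cycles, uint16_t sclk_freq_mhz)
{
	/* When the SCLK frequency is unknown, raw cycle counts are kept. */
	if (sclk_freq_mhz == 0)
		return cycles;

	return (cycles * 1000ULL) / sclk_freq_mhz;
}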

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c   |   9 +++
 drivers/ml/cnxk/cn10k_ml_ops.h   |   2 +
 drivers/ml/cnxk/cnxk_ml_ops.c    | 131 +++++++++++++++++++++++++++----
 drivers/ml/cnxk/cnxk_ml_ops.h    |   1 +
 drivers/ml/cnxk/cnxk_ml_xstats.h |   7 ++
 drivers/ml/cnxk/mvtvm_ml_model.h |  24 ++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  96 +++++++++++++++++++++-
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   8 ++
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  23 ++++++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   6 ++
 10 files changed, 289 insertions(+), 18 deletions(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2d308802cf..0c67ce7b40 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -197,6 +197,15 @@ cn10k_ml_xstats_layer_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model
 	}
 }
 
+void
+cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->glow.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
 #define ML_AVG_FOREACH_QP(cnxk_mldev, layer, qp_id, str, value, count)                             \
 	do {                                                                                       \
 		value = 0;                                                                         \
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 3d18303ed3..045e2e6cd2 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -331,6 +331,8 @@ int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
+void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t cn10k_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer,
 				  enum cnxk_ml_xstats_type type);
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index c38c60bf76..2632d70d8c 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -138,7 +138,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 
 	/* Allocate memory for xstats entries. Don't allocate during reconfigure */
 	nb_stats = RTE_DIM(device_xstats) +
-		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS;
+		   RTE_DIM(layer_xstats) * ML_CNXK_MAX_MODELS * ML_CNXK_MODEL_MAX_LAYERS +
+		   RTE_DIM(model_xstats) * ML_CNXK_MAX_MODELS;
 	if (cnxk_mldev->xstats.entries == NULL)
 		cnxk_mldev->xstats.entries = rte_zmalloc(
 			"cnxk_ml_xstats", sizeof(struct cnxk_ml_xstats_entry) * nb_stats,
@@ -169,6 +170,25 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	for (model = 0; model < ML_CNXK_MAX_MODELS; model++) {
 		cnxk_mldev->xstats.offset_for_model[model] = stat_id;
 
+		for (i = 0; i < RTE_DIM(model_xstats); i++) {
+			cnxk_mldev->xstats.entries[stat_id].map.id = stat_id;
+			cnxk_mldev->xstats.entries[stat_id].mode = RTE_ML_DEV_XSTATS_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].group = CNXK_ML_XSTATS_GROUP_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].type = model_xstats[i].type;
+			cnxk_mldev->xstats.entries[stat_id].fn_id = CNXK_ML_XSTATS_FN_MODEL;
+			cnxk_mldev->xstats.entries[stat_id].obj_idx = model;
+			cnxk_mldev->xstats.entries[stat_id].layer_id = -1;
+			cnxk_mldev->xstats.entries[stat_id].reset_allowed =
+				model_xstats[i].reset_allowed;
+
+			/* Name of xstat is updated during model load */
+			snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+				 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name),
+				 "Model-%u-%s", model, model_xstats[i].name);
+
+			stat_id++;
+		}
+
 		for (layer = 0; layer < ML_CNXK_MODEL_MAX_LAYERS; layer++) {
 			cnxk_mldev->xstats.offset_for_layer[model][layer] = stat_id;
 
@@ -195,7 +215,8 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 			cnxk_mldev->xstats.count_per_layer[model][layer] = RTE_DIM(layer_xstats);
 		}
 
-		cnxk_mldev->xstats.count_per_model[model] = RTE_DIM(layer_xstats);
+		cnxk_mldev->xstats.count_per_model[model] =
+			RTE_DIM(layer_xstats) + ML_CNXK_MODEL_MAX_LAYERS * RTE_DIM(model_xstats);
 	}
 
 	cnxk_mldev->xstats.count_mode_model = stat_id - cnxk_mldev->xstats.count_mode_device;
@@ -204,6 +225,36 @@ cnxk_ml_xstats_init(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+void
+cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id)
+{
+	struct cnxk_ml_model *model;
+	uint16_t rclk_freq;
+	uint16_t sclk_freq;
+	uint16_t stat_id;
+	char suffix[8];
+	uint16_t i;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	stat_id = cnxk_mldev->xstats.offset_for_model[model_id];
+
+	roc_clk_freq_get(&rclk_freq, &sclk_freq);
+	if (sclk_freq == 0)
+		rte_strscpy(suffix, "cycles", 7);
+	else
+		rte_strscpy(suffix, "ns", 3);
+
+	/* Update xstat name based on layer name and sclk availability */
+	for (i = 0; i < RTE_DIM(model_xstats); i++) {
+		if (model->type == ML_CNXK_MODEL_TYPE_GLOW)
+			cn10k_ml_xstat_model_name_set(cnxk_mldev, model, stat_id, i, suffix);
+		else
+			mvtvm_ml_model_xstat_name_set(cnxk_mldev, model, stat_id, i, suffix);
+
+		stat_id++;
+	}
+}
+
 static void
 cnxk_ml_xstats_uninit(struct cnxk_ml_dev *cnxk_mldev)
 {
@@ -247,13 +298,22 @@ cnxk_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, uint16_t obj_idx, int32_
 	if (model == NULL)
 		return 0;
 
-	if (layer_id >= 0)
+	if (layer_id >= 0) {
 		layer = &model->layer[layer_id];
-	else
-		return 0;
+		goto layer_xstats;
+	} else {
+		layer = NULL;
+		goto model_xstats;
+	}
 
+layer_xstats:
 	value = cn10k_ml_model_xstat_get(cnxk_mldev, layer, type);
+	goto exit_xstats;
 
+model_xstats:
+	value = mvtvm_ml_model_xstat_get(cnxk_mldev, model, type);
+
+exit_xstats:
 	roc_clk_freq_get(&rclk_freq, &sclk_freq);
 	if (sclk_freq != 0) /* return in ns */
 		value = (value * 1000ULL) / sclk_freq;
@@ -836,8 +896,9 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
+	uint16_t layer_id;
 	uint32_t idx = 0;
 	uint32_t i;
 
@@ -854,7 +915,17 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			break;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++) {
+			if (model->layer[layer_id].type == ML_CNXK_LAYER_TYPE_MRVL)
+				xstats_mode_count +=
+					cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+		}
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -868,9 +939,20 @@ cnxk_ml_dev_xstats_names_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode
 		if (xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id))
-			continue;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[model_id];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (model->type == ML_CNXK_MODEL_TYPE_TVM &&
+			    model->layer[xs->layer_id].type == ML_CNXK_LAYER_TYPE_LLVM)
+				continue;
+		}
 
 		rte_strscpy(xstats_map[idx].name, xs->map.name, RTE_ML_STR_MAX);
 		xstats_map[idx].id = xs->map.id;
@@ -931,9 +1013,10 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 {
 	struct cnxk_ml_xstats_entry *xs;
 	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
 	uint32_t xstats_mode_count;
-	uint16_t layer_id = 0;
 	cnxk_ml_xstats_fn fn;
+	uint16_t layer_id;
 	uint64_t val;
 	uint32_t idx;
 	uint32_t i;
@@ -951,7 +1034,14 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 	case RTE_ML_DEV_XSTATS_MODEL:
 		if (model_id >= ML_CNXK_MAX_MODELS)
 			return -EINVAL;
-		xstats_mode_count = cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		model = cnxk_mldev->mldev->data->models[model_id];
+		for (layer_id = 0; layer_id < model->nb_layers; layer_id++)
+			xstats_mode_count += cnxk_mldev->xstats.count_per_layer[model_id][layer_id];
+
+		if ((model->type == ML_CNXK_MODEL_TYPE_TVM) &&
+		    (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_MRVL))
+			xstats_mode_count += RTE_DIM(model_xstats);
 		break;
 	default:
 		return -EINVAL;
@@ -963,11 +1053,18 @@ cnxk_ml_dev_xstats_get(struct rte_ml_dev *dev, enum rte_ml_dev_xstats_mode mode,
 		if (stat_ids[i] > cnxk_mldev->xstats.count || xs->mode != mode)
 			continue;
 
-		if (mode == RTE_ML_DEV_XSTATS_MODEL &&
-		    (model_id != xs->obj_idx || layer_id != xs->layer_id)) {
-			plt_err("Invalid stats_id[%d] = %d for model_id = %d\n", i, stat_ids[i],
-				model_id);
-			return -EINVAL;
+		if (mode == RTE_ML_DEV_XSTATS_MODEL) {
+			if (model_id != xs->obj_idx)
+				continue;
+
+			model = cnxk_mldev->mldev->data->models[xs->obj_idx];
+			if ((model->type == ML_CNXK_MODEL_TYPE_GLOW ||
+			     model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) &&
+			    xs->group == CNXK_ML_XSTATS_GROUP_MODEL)
+				continue;
+
+			if (xs->layer_id == -1 && xs->group == CNXK_ML_XSTATS_GROUP_LAYER)
+				continue;
 		}
 
 		switch (xs->fn_id) {
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index b22a2b0d95..ab32676b3e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -70,6 +70,7 @@ extern struct rte_ml_dev_ops cnxk_ml_ops;
 
 int cnxk_ml_model_unload(struct rte_ml_dev *dev, uint16_t model_id);
 int cnxk_ml_model_stop(struct rte_ml_dev *dev, uint16_t model_id);
+void cnxk_ml_xstats_model_name_update(struct cnxk_ml_dev *cnxk_mldev, uint16_t model_id);
 
 __rte_hot uint16_t cnxk_ml_enqueue_burst(struct rte_ml_dev *dev, uint16_t qp_id,
 					 struct rte_ml_op **ops, uint16_t nb_ops);
diff --git a/drivers/ml/cnxk/cnxk_ml_xstats.h b/drivers/ml/cnxk/cnxk_ml_xstats.h
index 5e02bb876c..a2c9adfe4a 100644
--- a/drivers/ml/cnxk/cnxk_ml_xstats.h
+++ b/drivers/ml/cnxk/cnxk_ml_xstats.h
@@ -142,4 +142,11 @@ static const struct cnxk_ml_xstat_info layer_xstats[] = {
 	{"Min-FW-Latency", min_fw_latency, 1}, {"Max-FW-Latency", max_fw_latency, 1},
 };
 
+/* Model xstats */
+static const struct cnxk_ml_xstat_info model_xstats[] = {
+	{"Avg-RT-Latency", avg_rt_latency, 1},
+	{"Min-RT-Latency", min_rt_latency, 1},
+	{"Max-RT-Latency", max_rt_latency, 1},
+};
+
 #endif /* _CNXK_ML_XSTATS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 900ba44fa0..66c3af18e1 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -33,6 +33,27 @@ struct mvtvm_ml_model_object {
 	int64_t size;
 };
 
+/* Model fast-path stats */
+struct mvtvm_ml_model_xstats {
+	/* Total TVM runtime latency, sum of all inferences */
+	uint64_t tvm_rt_latency_tot;
+
+	/* TVM runtime latency */
+	uint64_t tvm_rt_latency;
+
+	/* Minimum TVM runtime latency */
+	uint64_t tvm_rt_latency_min;
+
+	/* Maximum TVM runtime latency */
+	uint64_t tvm_rt_latency_max;
+
+	/* Total jobs dequeued */
+	uint64_t dequeued_count;
+
+	/* Hardware stats reset index */
+	uint64_t tvm_rt_reset_count;
+};
+
 struct mvtvm_ml_model_data {
 	/* Model metadata */
 	struct tvmdp_model_metadata metadata;
@@ -45,6 +66,9 @@ struct mvtvm_ml_model_data {
 
 	/* Model I/O info */
 	struct cnxk_ml_io_info info;
+
+	/* Stats for burst ops */
+	struct mvtvm_ml_model_xstats *burst_xstats;
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index c6872cd89a..abfbae2b3a 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -10,10 +10,83 @@
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
+#include "cnxk_ml_xstats.h"
 
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	snprintf(cnxk_mldev->xstats.entries[stat_id].map.name,
+		 sizeof(cnxk_mldev->xstats.entries[stat_id].map.name), "%s-%s-%s",
+		 model->mvtvm.metadata.model.name, model_xstats[entry].name, suffix);
+}
+
+#define ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value += model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot;              \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count != 0)                                                                    \
+			value = value / count;                                                     \
+	} while (0)
+
+#define ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = UINT64_MAX;                                                                \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MIN(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+#define ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count)                            \
+	do {                                                                                       \
+		value = 0;                                                                         \
+		for (qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {        \
+			value = PLT_MAX(value,                                                     \
+					model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max);      \
+			count += model->mvtvm.burst_xstats[qp_id].dequeued_count -                 \
+				 model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count;              \
+		}                                                                                  \
+		if (count == 0)                                                                    \
+			value = 0;                                                                 \
+	} while (0)
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	uint64_t count = 0;
+	uint64_t value = 0;
+	uint32_t qp_id;
+
+	switch (type) {
+	case avg_rt_latency:
+		ML_AVG_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case min_rt_latency:
+		ML_MIN_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	case max_rt_latency:
+		ML_MAX_FOREACH_QP_MVTVM(cnxk_mldev, model, qp_id, value, count);
+		break;
+	default:
+		value = 0;
+	}
+
+	return value;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -53,6 +126,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
 	size_t model_object_size = 0;
+	size_t model_xstats_size = 0;
 	uint16_t nb_mrvl_layers;
 	uint16_t nb_llvm_layers;
 	uint8_t layer_id = 0;
@@ -68,7 +142,11 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	model_object_size = RTE_ALIGN_CEIL(object[0].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[1].size, RTE_CACHE_LINE_MIN_SIZE) +
 			    RTE_ALIGN_CEIL(object[2].size, RTE_CACHE_LINE_MIN_SIZE);
-	mz_size += model_object_size;
+
+	model_xstats_size =
+		cnxk_mldev->mldev->data->nb_queue_pairs * sizeof(struct mvtvm_ml_model_xstats);
+
+	mz_size += model_object_size + model_xstats_size;
 
 	/* Allocate memzone for model object */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", MVTVM_ML_MODEL_MEMZONE_NAME, model->model_id);
@@ -181,6 +259,22 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	/* Set model info */
 	mvtvm_ml_model_info_set(cnxk_mldev, model);
 
+	/* Update model xstats name */
+	cnxk_ml_xstats_model_name_update(cnxk_mldev, model->model_id);
+
+	model->mvtvm.burst_xstats = RTE_PTR_ADD(
+		model->mvtvm.object.params.addr,
+		RTE_ALIGN_CEIL(model->mvtvm.object.params.size, RTE_CACHE_LINE_MIN_SIZE));
+
+	for (int qp_id = 0; qp_id < cnxk_mldev->mldev->data->nb_queue_pairs; qp_id++) {
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_tot = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_min = UINT64_MAX;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_latency_max = 0;
+		model->mvtvm.burst_xstats[qp_id].tvm_rt_reset_count = 0;
+		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
+	}
+
 	return 0;
 
 error:
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 55459f9f7f..22e0340146 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -11,8 +11,11 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
+struct cnxk_ml_layer;
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -22,4 +25,9 @@ int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
+
 #endif /* _MVTVM_ML_OPS_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 260a051b08..19af1d2703 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -8,6 +8,7 @@
 
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
+#include "cnxk_ml_xstats.h"
 
 enum cnxk_ml_model_type
 mvtvm_ml_model_type_get(struct rte_ml_model_params *params)
@@ -44,6 +45,28 @@ mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer
 	RTE_SET_USED(fp);
 }
 
+void
+mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			      uint16_t stat_id, uint16_t entry, char *suffix)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(stat_id);
+	RTE_SET_USED(entry);
+	RTE_SET_USED(suffix);
+}
+
+uint64_t
+mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+			 enum cnxk_ml_xstats_type type)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(model);
+	RTE_SET_USED(type);
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index d6d0edbcf1..3fd1f04c35 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -7,6 +7,8 @@
 
 #include <rte_mldev.h>
 
+#include "cnxk_ml_xstats.h"
+
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
@@ -24,5 +26,9 @@ int mvtvm_ml_model_get_layer_id(struct cnxk_ml_model *model, const char *layer_n
 				uint16_t *layer_id);
 struct cnxk_ml_io_info *mvtvm_ml_model_io_info_get(struct cnxk_ml_model *model, uint16_t layer_id);
 void mvtvm_ml_layer_print(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_layer *layer, FILE *fp);
+void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				   uint16_t stat_id, uint16_t entry, char *suffix);
+uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
+				  enum cnxk_ml_xstats_type type);
 
 #endif /* _MVTVM_ML_STUBS_H_ */
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 30/34] ml/cnxk: implement I/O alloc and free callbacks
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (28 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
                     ` (4 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented callback functions for IO allocation and free
for Glow layers.
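
A rough usage sketch of the new callbacks (the wrapper function below
is illustrative only; the callback names and signatures are the ones
added by this patch):

  /* Allocate quantized I/O buffers for one Glow layer, run the layer,
   * then release the backing memzone.
   */
  static int
  run_glow_layer(void *device, uint16_t model_id, const char *layer_name)
  {
          uint64_t *input_qbuffer = NULL;
          uint64_t *output_qbuffer = NULL;
          int ret;

          ret = cn10k_ml_io_alloc(device, model_id, layer_name,
                                  &input_qbuffer, &output_qbuffer);
          if (ret != 0)
                  return ret;

          /* ... fill input_qbuffer, run the layer, read output_qbuffer ... */

          return cn10k_ml_io_free(device, model_id, layer_name);
  }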

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 87 ++++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 +
 3 files changed, 92 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 0c67ce7b40..7802425c87 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1410,3 +1410,90 @@ cn10k_ml_inference_sync(void *device, uint16_t index, void *input, void *output,
 error_enqueue:
 	return ret;
 }
+
+int
+cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name, uint64_t **input_qbuffer,
+		  uint64_t **output_qbuffer)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_layer *layer;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint64_t output_size;
+	uint64_t input_size;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	layer = &model->layer[layer_id];
+	input_size = PLT_ALIGN_CEIL(layer->info.total_input_sz_q, ML_CN10K_ALIGN_SIZE);
+	output_size = PLT_ALIGN_CEIL(layer->info.total_output_sz_q, ML_CN10K_ALIGN_SIZE);
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_reserve_aligned(str, input_size + output_size, 0, ML_CN10K_ALIGN_SIZE);
+	if (mz == NULL) {
+		plt_err("io_alloc failed: Unable to allocate memory: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -ENOMEM;
+	}
+
+	*input_qbuffer = mz->addr;
+	*output_qbuffer = PLT_PTR_ADD(mz->addr, input_size);
+
+	return 0;
+}
+
+int
+cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
+{
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+
+	char str[RTE_MEMZONE_NAMESIZE];
+	const struct plt_memzone *mz;
+	uint16_t layer_id;
+	int ret;
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+	if (cnxk_mldev == NULL) {
+		plt_err("Invalid device = %p", device);
+		return -EINVAL;
+	}
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+
+	ret = cn10k_ml_model_get_layer_id(model, layer_name, &layer_id);
+	if (ret != 0)
+		return ret;
+
+	sprintf(str, "cn10k_ml_io_mz_%u_%u", model_id, layer_id);
+	mz = plt_memzone_lookup(str);
+	if (mz == NULL) {
+		plt_err("io_free failed: Memzone not found: model_id = %u, layer_name = %s",
+			model_id, layer_name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 045e2e6cd2..9c41c1c0b0 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -329,6 +329,9 @@ int cn10k_ml_layer_load(void *device, uint16_t model_id, const char *layer_name,
 int cn10k_ml_layer_unload(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_start(void *device, uint16_t model_id, const char *layer_name);
 int cn10k_ml_layer_stop(void *device, uint16_t model_id, const char *layer_name);
+int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
+		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
+int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index abfbae2b3a..a50b31ec6e 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -232,6 +232,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback = &model->mvtvm.cb;
 		callback->tvmrt_glow_layer_load = cn10k_ml_layer_load;
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
+		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
+		callback->tvmrt_io_free = cn10k_ml_io_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 31/34] ml/cnxk: add generic ML malloc and free callback
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (29 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
                     ` (3 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Implemented generic ML malloc and free callbacks.
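
A minimal usage sketch (the buffer name, size and alignment below are
illustrative; only the callback names and signatures come from this
patch):

  void *scratch = NULL;

  /* Reserve a named, 128-byte aligned scratch buffer for the runtime */
  if (cn10k_ml_malloc("ml_tvmrt_scratch", 4096, 128, &scratch) == 0) {
          /* ... TVM runtime uses the buffer ... */

          /* Release the memzone by name when done */
          cn10k_ml_free("ml_tvmrt_scratch");
  }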

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_ops.c | 30 ++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.h |  3 +++
 drivers/ml/cnxk/mvtvm_ml_ops.c |  2 ++
 3 files changed, 35 insertions(+)

diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 7802425c87..01b0a44caa 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -1497,3 +1497,33 @@ cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name)
 
 	return plt_memzone_free(mz);
 }
+
+int
+cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_reserve_aligned(name, size, 0, align);
+	if (mz == NULL) {
+		plt_err("ml_malloc failed: Unable to allocate memory: name = %s", name);
+		return -ENOMEM;
+	}
+
+	*addr = mz->addr;
+
+	return 0;
+}
+
+int
+cn10k_ml_free(const char *name)
+{
+	const struct plt_memzone *mz;
+
+	mz = plt_memzone_lookup(name);
+	if (mz == NULL) {
+		plt_err("ml_free failed: Memzone not found: name = %s", name);
+		return -EINVAL;
+	}
+
+	return plt_memzone_free(mz);
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.h b/drivers/ml/cnxk/cn10k_ml_ops.h
index 9c41c1c0b0..eb3e1c139c 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.h
+++ b/drivers/ml/cnxk/cn10k_ml_ops.h
@@ -333,6 +333,9 @@ int cn10k_ml_io_alloc(void *device, uint16_t model_id, const char *layer_name,
 		      uint64_t **input_qbuffer, uint64_t **output_qbuffer);
 int cn10k_ml_io_free(void *device, uint16_t model_id, const char *layer_name);
 
+int cn10k_ml_malloc(const char *name, size_t size, uint32_t align, void **addr);
+int cn10k_ml_free(const char *name);
+
 /* xstats ops */
 void cn10k_ml_xstat_model_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index a50b31ec6e..9d59e28661 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -234,6 +234,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_glow_layer_unload = cn10k_ml_layer_unload;
 		callback->tvmrt_io_alloc = cn10k_ml_io_alloc;
 		callback->tvmrt_io_free = cn10k_ml_io_free;
+		callback->tvmrt_malloc = cn10k_ml_malloc;
+		callback->tvmrt_free = cn10k_ml_free;
 	} else {
 		callback = NULL;
 	}
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 32/34] ml/cnxk: support quantize and dequantize callback
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (30 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
                     ` (2 subsequent siblings)
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Prince Takkar <ptakkar@marvell.com>

Added support for quantize and dequantize callback
functions for TVM models.
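
A rough sketch of how the TVM runtime is expected to drive these
callbacks around a Marvell layer (the tensor and buffer setup below is
illustrative; only the callback names and signatures come from this
patch):

  const DLTensor *in_tensor[1];   /* dequantized inputs from TVM runtime */
  const DLTensor *out_tensor[1];  /* dequantized outputs to be filled */
  int ret;

  /* ... in_tensor[0] / out_tensor[0] prepared by the TVM runtime ... */

  /* dequantized -> quantized, into the MRVL layer's input buffer */
  ret = mvtvm_ml_io_quantize(device, model_id, layer_name,
                             in_tensor, input_qbuffer);
  if (ret == 0) {
          /* ... run the MRVL layer on hardware ... */

          /* quantized -> dequantized, from the MRVL layer's output buffer */
          ret = mvtvm_ml_io_dequantize(device, model_id, layer_name,
                                       output_qbuffer, out_tensor);
  }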

Signed-off-by: Prince Takkar <ptakkar@marvell.com>
---
 drivers/ml/cnxk/mvtvm_ml_ops.c | 129 +++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h |   4 +
 2 files changed, 133 insertions(+)

diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 9d59e28661..39c8bf0f04 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -2,11 +2,15 @@
  * Copyright (c) 2023 Marvell.
  */
 
+#include <dlpack/dlpack.h>
+
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_mldev.h>
 #include <rte_mldev_pmd.h>
 
+#include <mldev_utils.h>
+
 #include "cnxk_ml_dev.h"
 #include "cnxk_ml_model.h"
 #include "cnxk_ml_ops.h"
@@ -236,6 +240,8 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_io_free = cn10k_ml_io_free;
 		callback->tvmrt_malloc = cn10k_ml_malloc;
 		callback->tvmrt_free = cn10k_ml_free;
+		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
+		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
 	} else {
 		callback = NULL;
 	}
@@ -366,3 +372,126 @@ mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model)
 
 	return 0;
 }
+
+int
+mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+		     const DLTensor **deq_tensor, void *qbuffer)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	/* Get layer id */
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_inputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_quantize_single(&info->input[i], lcl_dbuffer, lcl_qbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->input[i].sz_q;
+	}
+
+	return 0;
+}
+
+int
+mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+		       const DLTensor **deq_tensor)
+{
+	struct cnxk_ml_io_info *info = NULL;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct cnxk_ml_model *model;
+	uint16_t layer_id = 0;
+	uint8_t *lcl_dbuffer;
+	uint8_t *lcl_qbuffer;
+	uint32_t i;
+	int ret;
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if ((device == NULL) || (deq_tensor == NULL) || (qbuffer == NULL))
+		return -EINVAL;
+#endif
+
+	cnxk_mldev = (struct cnxk_ml_dev *)device;
+
+	model = cnxk_mldev->mldev->data->models[model_id];
+#ifdef CNXK_ML_DEV_DEBUG
+	if (model == NULL) {
+		plt_err("Invalid model_id = %u", model_id);
+		return -EINVAL;
+	}
+#endif
+
+	for (layer_id = 0; layer_id < model->mvtvm.metadata.model.nb_layers; layer_id++) {
+		if (strcmp(model->layer[layer_id].name, layer_name) == 0)
+			break;
+	}
+
+#ifdef CNXK_ML_DEV_DEBUG
+	if (layer_id == model->mvtvm.metadata.model.nb_layers) {
+		plt_err("Invalid layer name: %s", layer_name);
+		return -EINVAL;
+	}
+
+	if (model->layer[layer_id].type != ML_CNXK_LAYER_TYPE_MRVL) {
+		plt_err("Invalid layer name / type: %s", layer_name);
+		return -EINVAL;
+	}
+#endif
+
+	info = &model->layer[layer_id].info;
+	lcl_qbuffer = (uint8_t *)qbuffer;
+
+	for (i = 0; i < info->nb_outputs; i++) {
+		lcl_dbuffer = PLT_PTR_ADD(deq_tensor[i]->data, deq_tensor[i]->byte_offset);
+
+		ret = cnxk_ml_io_dequantize_single(&info->output[i], lcl_qbuffer, lcl_dbuffer);
+		if (ret < 0)
+			return ret;
+
+		lcl_qbuffer += info->output[i].sz_q;
+	}
+
+	return 0;
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 22e0340146..4cabe30a82 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -24,6 +24,10 @@ int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_para
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_start(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
 int mvtvm_ml_model_stop(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
+int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name,
+			 const DLTensor **deq_tensor, void *qbuffer);
+int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
+			   const DLTensor **deq_tensor);
 
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 33/34] ml/cnxk: enable fast-path ops for TVM models
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (31 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-26 12:43   ` [PATCH v9 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
  2023-10-29 12:53   ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

From: Anup Prabhu <aprabhu@marvell.com>

Enable fast-path ops support for TVM models. Hybrid and LLVM
model sub-types use TVMDP library function calls to execute
inference operations.

For TVM MRVL model sub-types that have a single MRVL layer,
the inference requests are enqueued directly to hardware
by the driver.
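
Roughly, the common enqueue path dispatches through the per-model
function pointers that are set at model load time (the helper below is
a simplified illustration; only the enqueue_single pointer and its
arguments come from this series):

  static __rte_hot bool
  enqueue_one(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
              struct cnxk_ml_qp *qp, uint64_t head)
  {
          struct cnxk_ml_model *model;

          model = cnxk_mldev->mldev->data->models[op->model_id];

          /* Glow and single-layer TVM-MRVL models enqueue directly to
           * hardware; hybrid and LLVM sub-types go through the TVMDP
           * runtime via mvtvm_ml_enqueue_single().
           */
          return model->enqueue_single(cnxk_mldev, op, 0, qp, head);
  }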

Signed-off-by: Anup Prabhu <aprabhu@marvell.com>
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/rel_notes/release_23_11.rst |   3 +
 drivers/ml/cnxk/cn10k_ml_ops.c         |   4 -
 drivers/ml/cnxk/cnxk_ml_ops.c          |   4 +
 drivers/ml/cnxk/cnxk_ml_ops.h          |   5 +
 drivers/ml/cnxk/mvtvm_ml_model.c       |  14 +++
 drivers/ml/cnxk/mvtvm_ml_model.h       |   6 ++
 drivers/ml/cnxk/mvtvm_ml_ops.c         | 124 +++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_ops.h         |  43 +++++++++
 8 files changed, 199 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst
index 0a6fc76a9d..5fcf2a1897 100644
--- a/doc/guides/rel_notes/release_23_11.rst
+++ b/doc/guides/rel_notes/release_23_11.rst
@@ -243,6 +243,9 @@ New Features
   Added dispatcher library whose purpose is to help decouple different
   parts (modules) of an eventdev-based application.
 
+* **Updated Marvell cnxk mldev driver.**
+
+  * Added support for models compiled using TVM framework.
 
 Removed Items
 -------------
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 01b0a44caa..b9d30278c6 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -371,10 +371,6 @@ cn10k_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_c
 	else
 		cn10k_mldev->ml_jcmdq_enqueue = roc_ml_jcmdq_enqueue_lf;
 
-	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
-	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
-
 	return 0;
 }
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index 2632d70d8c..bf266d4d6e 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -632,6 +632,10 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	cnxk_mldev->max_nb_layers =
 		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
 
+	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
+	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
+	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
 		cnxk_mldev->index_map =
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.h b/drivers/ml/cnxk/cnxk_ml_ops.h
index ab32676b3e..7b49793a57 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.h
+++ b/drivers/ml/cnxk/cnxk_ml_ops.h
@@ -24,6 +24,11 @@ struct cnxk_ml_req {
 	union {
 		/* CN10K */
 		struct cn10k_ml_req cn10k_req;
+
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+		/* MVTVM */
+		struct mvtvm_ml_req mvtvm_req;
+#endif
 	};
 
 	/* Address of status field */
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index e5ba672788..d28bd88a08 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -220,6 +220,13 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_input_sz_d += model->mvtvm.info.input[i].sz_d;
 		model->mvtvm.info.total_input_sz_q += model->mvtvm.info.input[i].sz_q;
 
+		model->mvtvm.input_tensor[i].device = metadata->input[i].device;
+		model->mvtvm.input_tensor[i].ndim = metadata->input[i].ndim;
+		model->mvtvm.input_tensor[i].dtype = metadata->input[i].datatype;
+		model->mvtvm.input_tensor[i].shape = metadata->input[i].shape;
+		model->mvtvm.input_tensor[i].strides = NULL;
+		model->mvtvm.input_tensor[i].byte_offset = 0;
+
 		plt_ml_dbg("model_id = %u, input[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.input[i].sz_d, model->mvtvm.info.input[i].sz_q);
 	}
@@ -253,6 +260,13 @@ mvtvm_ml_model_io_info_set(struct cnxk_ml_model *model)
 		model->mvtvm.info.total_output_sz_d += model->mvtvm.info.output[i].sz_d;
 		model->mvtvm.info.total_output_sz_q += model->mvtvm.info.output[i].sz_q;
 
+		model->mvtvm.output_tensor[i].device = metadata->output[i].device;
+		model->mvtvm.output_tensor[i].ndim = metadata->output[i].ndim;
+		model->mvtvm.output_tensor[i].dtype = metadata->output[i].datatype;
+		model->mvtvm.output_tensor[i].shape = metadata->output[i].shape;
+		model->mvtvm.output_tensor[i].strides = NULL;
+		model->mvtvm.output_tensor[i].byte_offset = 0;
+
 		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u sz_q = %u", model->model_id, i,
 			   model->mvtvm.info.output[i].sz_d, model->mvtvm.info.output[i].sz_q);
 	}
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.h b/drivers/ml/cnxk/mvtvm_ml_model.h
index 66c3af18e1..7ffce38094 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.h
+++ b/drivers/ml/cnxk/mvtvm_ml_model.h
@@ -69,6 +69,12 @@ struct mvtvm_ml_model_data {
 
 	/* Stats for burst ops */
 	struct mvtvm_ml_model_xstats *burst_xstats;
+
+	/* Input Tensor */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output Tensor */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
 };
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 39c8bf0f04..6b88491371 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -19,6 +19,12 @@
 /* ML model macros */
 #define MVTVM_ML_MODEL_MEMZONE_NAME "ml_mvtvm_model_mz"
 
+__rte_hot static void
+mvtvm_ml_set_poll_addr(struct cnxk_ml_req *req)
+{
+	req->status = &req->mvtvm_req.status;
+}
+
 void
 mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 			      uint16_t stat_id, uint16_t entry, char *suffix)
@@ -242,6 +248,7 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		callback->tvmrt_free = cn10k_ml_free;
 		callback->tvmrt_quantize = mvtvm_ml_io_quantize;
 		callback->tvmrt_dequantize = mvtvm_ml_io_dequantize;
+		callback->tvmrt_inference = cn10k_ml_inference_sync;
 	} else {
 		callback = NULL;
 	}
@@ -285,6 +292,19 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 		model->mvtvm.burst_xstats[qp_id].dequeued_count = 0;
 	}
 
+	/* Set model specific fast path functions */
+	if (model->subtype == ML_CNXK_MODEL_SUBTYPE_TVM_MRVL) {
+		model->enqueue_single = cn10k_ml_enqueue_single;
+		model->result_update = cn10k_ml_result_update;
+		model->set_error_code = cn10k_ml_set_error_code;
+		model->set_poll_addr = cn10k_ml_set_poll_addr;
+	} else {
+		model->enqueue_single = mvtvm_ml_enqueue_single;
+		model->result_update = mvtvm_ml_result_update;
+		model->set_error_code = mvtvm_ml_set_error_code;
+		model->set_poll_addr = mvtvm_ml_set_poll_addr;
+	}
+
 	return 0;
 
 error:
@@ -495,3 +515,107 @@ mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name,
 
 	return 0;
 }
+
+static int
+mvtvm_ml_model_run(struct cnxk_ml_model *model, struct rte_ml_op *op, struct cnxk_ml_req *req)
+{
+	uint8_t i;
+
+	rte_memcpy(req->mvtvm_req.input_tensor, model->mvtvm.input_tensor,
+		   model->mvtvm.metadata.model.num_input * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_input; i++) {
+		req->mvtvm_req.input_tensor[i].data = op->input[i]->addr;
+		req->mvtvm_req.input_tensor[i].byte_offset = 0;
+	}
+
+	rte_memcpy(req->mvtvm_req.output_tensor, model->mvtvm.output_tensor,
+		   model->mvtvm.metadata.model.num_output * sizeof(DLTensor));
+	for (i = 0; i < model->mvtvm.metadata.model.num_output; i++) {
+		req->mvtvm_req.output_tensor[i].data = op->output[i]->addr;
+		req->mvtvm_req.output_tensor[i].byte_offset = 0;
+	}
+
+	tvmdp_model_run(model->model_id, model->mvtvm.metadata.model.num_input,
+			req->mvtvm_req.input_tensor, model->mvtvm.metadata.model.num_output,
+			req->mvtvm_req.output_tensor, &req->mvtvm_req.result,
+			&req->mvtvm_req.status);
+
+	plt_write64(ML_CNXK_POLL_JOB_FINISH, req->status);
+
+	return 0;
+}
+
+__rte_hot void
+mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype)
+{
+	RTE_SET_USED(stype);
+
+	req->mvtvm_req.result.error_code = etype;
+}
+
+__rte_hot bool
+mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op, uint16_t layer_id,
+			struct cnxk_ml_qp *qp, uint64_t head)
+{
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_queue *queue;
+	struct cnxk_ml_req *req;
+
+	RTE_SET_USED(layer_id);
+
+	queue = &qp->queue;
+	req = &queue->reqs[head];
+	model = cnxk_mldev->mldev->data->models[op->model_id];
+
+	model->set_poll_addr(req);
+	memset(&req->mvtvm_req.result, 0, sizeof(struct mvtvm_ml_result));
+	req->mvtvm_req.result.error_code = 0x0;
+	req->mvtvm_req.result.user_ptr = op->user_ptr;
+
+	cnxk_ml_set_poll_ptr(req);
+	mvtvm_ml_model_run(model, op, req);
+	req->timeout = plt_tsc_cycles() + queue->wait_cycles;
+	req->op = op;
+
+	return true;
+}
+
+__rte_hot void
+mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request)
+{
+	struct mvtvm_ml_model_xstats *xstats;
+	struct mvtvm_ml_result *result;
+	struct cnxk_ml_model *model;
+	struct cnxk_ml_req *req;
+	uint64_t tvm_rt_latency;
+	struct cnxk_ml_qp *qp;
+	struct rte_ml_op *op;
+
+	req = (struct cnxk_ml_req *)request;
+	result = &req->mvtvm_req.result;
+	op = req->op;
+	qp = cnxk_mldev->mldev->data->queue_pairs[qp_id];
+	op->impl_opaque = result->error_code;
+
+	if (likely(result->error_code == 0)) {
+		qp->stats.dequeued_count++;
+		op->status = RTE_ML_OP_STATUS_SUCCESS;
+
+		model = cnxk_mldev->mldev->data->models[op->model_id];
+		xstats = &model->mvtvm.burst_xstats[qp_id];
+
+		if (unlikely(xstats->dequeued_count == xstats->tvm_rt_reset_count)) {
+			xstats->tvm_rt_latency_min = UINT64_MAX;
+			xstats->tvm_rt_latency_max = 0;
+		}
+		tvm_rt_latency = result->stats.end_ns - result->stats.start_ns;
+		xstats->tvm_rt_latency = tvm_rt_latency;
+		xstats->tvm_rt_latency_tot += tvm_rt_latency;
+		xstats->tvm_rt_latency_min = RTE_MIN(xstats->tvm_rt_latency_min, tvm_rt_latency);
+		xstats->tvm_rt_latency_max = RTE_MAX(xstats->tvm_rt_latency_max, tvm_rt_latency);
+		xstats->dequeued_count++;
+	} else {
+		qp->stats.dequeue_err_count++;
+		op->status = RTE_ML_OP_STATUS_ERROR;
+	}
+}
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index 4cabe30a82..cb4b219743 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -16,6 +16,44 @@
 struct cnxk_ml_dev;
 struct cnxk_ml_model;
 struct cnxk_ml_layer;
+struct cnxk_ml_qp;
+struct cnxk_ml_req;
+
+/* Inference stats */
+struct mvtvm_ml_stats {
+	/* Start ns */
+	uint64_t start_ns;
+
+	/* Start ns */
+	uint64_t end_ns;
+};
+
+/* Result structure */
+struct mvtvm_ml_result {
+	/* Job error code */
+	uint64_t error_code;
+
+	/* Inference stats */
+	struct mvtvm_ml_stats stats;
+
+	/* User context pointer */
+	void *user_ptr;
+};
+
+/* MVTVM specific request */
+struct mvtvm_ml_req {
+	/* Input tensors */
+	DLTensor input_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Output tensors */
+	DLTensor output_tensor[ML_CNXK_MODEL_MAX_INPUT_OUTPUT];
+
+	/* Status field for poll mode requests */
+	volatile uint64_t status;
+
+	/* Result */
+	struct mvtvm_ml_result result;
+};
 
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
@@ -29,6 +67,11 @@ int mvtvm_ml_io_quantize(void *device, uint16_t model_id, const char *layer_name
 int mvtvm_ml_io_dequantize(void *device, uint16_t model_id, const char *layer_name, void *qbuffer,
 			   const DLTensor **deq_tensor);
 
+__rte_hot bool mvtvm_ml_enqueue_single(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_op *op,
+				       uint16_t layer_id, struct cnxk_ml_qp *qp, uint64_t head);
+__rte_hot void mvtvm_ml_result_update(struct cnxk_ml_dev *cnxk_mldev, int qp_id, void *request);
+__rte_hot void mvtvm_ml_set_error_code(struct cnxk_ml_req *req, uint64_t etype, uint64_t stype);
+
 void mvtvm_ml_model_xstat_name_set(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
 				   uint16_t stat_id, uint16_t entry, char *suffix);
 uint64_t mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model,
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* [PATCH v9 34/34] ml/cnxk: enable creation of mvtvm virtual device
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (32 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
@ 2023-10-26 12:43   ` Srikanth Yalavarthi
  2023-10-29 12:53   ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Srikanth Yalavarthi @ 2023-10-26 12:43 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

Enable support to create an mvtvm virtual device on
systems without a PCI-based ML HW accelerator.
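
From the application's point of view, the vdev behaves like any other
mldev once probed; a rough usage sketch (error handling omitted, the
configuration values below are illustrative):

  /* Application launched with: --vdev ml_mvtvm,max_qps=4 */
  struct rte_ml_dev_config conf = {
          .socket_id = rte_socket_id(),
          .nb_models = 1,
          .nb_queue_pairs = 4,
  };

  if (rte_ml_dev_count() > 0) {
          rte_ml_dev_configure(0, &conf);
          rte_ml_dev_start(0);
          /* ... rte_ml_model_load() and enqueue/dequeue as usual ... */
  }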

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 doc/guides/mldevs/cnxk.rst       |  50 +++++++-
 drivers/ml/cnxk/cn10k_ml_dev.c   |   8 ++
 drivers/ml/cnxk/cn10k_ml_dev.h   |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.c    |   3 +
 drivers/ml/cnxk/cnxk_ml_dev.h    |  21 ++++
 drivers/ml/cnxk/cnxk_ml_ops.c    |  82 +++++++++----
 drivers/ml/cnxk/meson.build      |   1 +
 drivers/ml/cnxk/mvtvm_ml_dev.c   | 196 +++++++++++++++++++++++++++++++
 drivers/ml/cnxk/mvtvm_ml_dev.h   |  40 +++++++
 drivers/ml/cnxk/mvtvm_ml_ops.c   |  31 +++++
 drivers/ml/cnxk/mvtvm_ml_ops.h   |   2 +
 drivers/ml/cnxk/mvtvm_ml_stubs.c |  18 +++
 drivers/ml/cnxk/mvtvm_ml_stubs.h |   2 +
 13 files changed, 433 insertions(+), 24 deletions(-)
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.c
 create mode 100644 drivers/ml/cnxk/mvtvm_ml_dev.h

diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index a4d8903896..28e5b5b87f 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -239,6 +239,23 @@ Bind the ML PF device to the vfio_pci driver:
    usertools/dpdk-devbind.py -u 0000:00:10.0
    usertools/dpdk-devbind.py -b vfio-pci 0000:00:10.0
 
+VDEV support
+------------
+
+On platforms which don't support ML hardware acceleration through a PCI device, the
+Marvell ML CNXK PMD can execute inference operations on a vdev with the ML models
+compiled using the Apache TVM framework.
+
+VDEV can be enabled by passing the EAL argument
+
+.. code-block:: console
+
+   --vdev ml_mvtvm
+
+VDEV can also be used on platforms with an ML HW accelerator. However, to use VDEV in
+this case, the PCI device has to be unbound. When the PCI device is bound, creation
+of the vdev is skipped.
+
 
 Runtime Config Options
 ----------------------
@@ -249,6 +266,8 @@ Runtime Config Options
   The parameter ``fw_path`` can be used by the user
   to load ML firmware from a custom path.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,fw_path="/home/user/ml_fw.bin"
@@ -264,6 +283,8 @@ Runtime Config Options
   When enabled, firmware would mask the DPE non-fatal hardware errors as warnings.
   The parameter ``enable_dpe_warnings`` is used for this configuration.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,enable_dpe_warnings=0
@@ -280,11 +301,19 @@ Runtime Config Options
   Caching of model data improves the inferencing throughput / latency for the model.
   The parameter ``cache_model_data`` is used to enable data caching.
 
+  This option is supported on PCI HW accelerator and vdev.
+
   For example::
 
      -a 0000:00:10.0,cache_model_data=0
 
-  With the above configuration, model data caching is disabled.
+  With the above configuration, model data caching is disabled on HW accelerator.
+
+  For example::
+
+     --vdev ml_mvtvm,cache_model_data=0
+
+  With the above configuration, model data caching is disabled on vdev.
 
 
 **OCM allocation mode** (default ``lowest``)
@@ -300,6 +329,8 @@ Runtime Config Options
   ``largest``
     Allocate OCM for the model from the slot with largest amount of free space.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_alloc_mode=lowest
@@ -317,6 +348,8 @@ Runtime Config Options
   Supported page sizes by the driver are 1 KB, 2 KB, 4 KB, 8 KB and 16 KB.
   Default page size is 16 KB.
 
+  This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,ocm_page_size=8192
@@ -341,6 +374,8 @@ Runtime Config Options
     Enabling spinlock version would disable restrictions on the number of queue-pairs
     that can be supported by the driver.
 
+   This option is supported only on PCI HW accelerator.
+
   For example::
 
      -a 0000:00:10.0,hw_queue_lock=1
@@ -349,6 +384,19 @@ Runtime Config Options
   in the fast path enqueue burst operation.
 
 
+**Maximum queue pairs** (default ``1``)
+
+  VDEV supports additional EAL arguments to configure the maximum number of
+  queue-pairs on the ML device through the option ``max_qps``.
+
+  This option is supported only on vdev.
+
+  For example::
+
+     --vdev ml_mvtvm,max_qps=4
+
+  With the above configuration, 4 queue-pairs are created on the vdev.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.c b/drivers/ml/cnxk/cn10k_ml_dev.c
index 91813e9d0a..41f3b7a95d 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.c
+++ b/drivers/ml/cnxk/cn10k_ml_dev.c
@@ -309,6 +309,12 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 
 	PLT_SET_USED(pci_drv);
 
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Cannot initialize CN10K PCI dev");
+		return -EINVAL;
+	}
+
 	init_params = (struct rte_ml_dev_pmd_init_params){
 		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
 
@@ -355,6 +361,8 @@ cn10k_ml_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_de
 	dev->dequeue_burst = NULL;
 	dev->op_error_get = NULL;
 
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_PCI;
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_PROBED;
 
 	return 0;
diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h
index 2e7eb6c9ef..cee405f3f5 100644
--- a/drivers/ml/cnxk/cn10k_ml_dev.h
+++ b/drivers/ml/cnxk/cn10k_ml_dev.h
@@ -11,6 +11,9 @@
 
 #include "cnxk_ml_io.h"
 
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
 /* Dummy Device ops */
 extern struct rte_ml_dev_ops ml_dev_dummy_ops;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.c b/drivers/ml/cnxk/cnxk_ml_dev.c
index 63d1c9e417..dc4512223c 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.c
+++ b/drivers/ml/cnxk/cnxk_ml_dev.c
@@ -7,6 +7,9 @@
 
 #include "cnxk_ml_dev.h"
 
+/* Device status */
+int cnxk_ml_dev_initialized;
+
 /* Dummy operations for ML device */
 struct rte_ml_dev_ops ml_dev_dummy_ops = {0};
 
diff --git a/drivers/ml/cnxk/cnxk_ml_dev.h b/drivers/ml/cnxk/cnxk_ml_dev.h
index 382fca64be..491c4c4aea 100644
--- a/drivers/ml/cnxk/cnxk_ml_dev.h
+++ b/drivers/ml/cnxk/cnxk_ml_dev.h
@@ -9,6 +9,10 @@
 
 #include "cn10k_ml_dev.h"
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+#include "mvtvm_ml_dev.h"
+#endif
+
 #include "cnxk_ml_xstats.h"
 
 /* ML command timeout in seconds */
@@ -34,6 +38,15 @@ struct cnxk_ml_error_db {
 	char str[RTE_ML_STR_MAX];
 };
 
+/* Device type */
+enum cnxk_ml_dev_type {
+	/* PCI based Marvell's ML HW accelerator device */
+	CNXK_ML_DEV_TYPE_PCI,
+
+	/* Generic Virtual device */
+	CNXK_ML_DEV_TYPE_VDEV,
+};
+
 /* Device configuration state enum */
 enum cnxk_ml_dev_state {
 	/* Probed and not configured */
@@ -66,6 +79,9 @@ struct cnxk_ml_dev {
 	/* RTE device */
 	struct rte_ml_dev *mldev;
 
+	/* Device type */
+	enum cnxk_ml_dev_type type;
+
 	/* Configuration state */
 	enum cnxk_ml_dev_state state;
 
@@ -87,6 +103,11 @@ struct cnxk_ml_dev {
 	/* CN10K device structure */
 	struct cn10k_ml_dev cn10k_mldev;
 
+#ifdef RTE_MLDEV_CNXK_ENABLE_MVTVM
+	/* MVTVM device structure */
+	struct mvtvm_ml_dev mvtvm_mldev;
+#endif
+
 	/* Maximum number of layers */
 	uint64_t max_nb_layers;
 
diff --git a/drivers/ml/cnxk/cnxk_ml_ops.c b/drivers/ml/cnxk/cnxk_ml_ops.c
index bf266d4d6e..36a5dcf9b0 100644
--- a/drivers/ml/cnxk/cnxk_ml_ops.c
+++ b/drivers/ml/cnxk/cnxk_ml_ops.c
@@ -117,7 +117,8 @@ cnxk_ml_qp_create(const struct rte_ml_dev *dev, uint16_t qp_id, uint32_t nb_desc
 	qp->stats.enqueue_err_count = 0;
 	qp->stats.dequeue_err_count = 0;
 
-	cn10k_ml_qp_initialize(cnxk_mldev, qp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cn10k_ml_qp_initialize(cnxk_mldev, qp);
 
 	return qp;
 
@@ -480,7 +481,12 @@ cnxk_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info)
 	dev_info->driver_name = dev->device->driver->name;
 	dev_info->max_models = ML_CNXK_MAX_MODELS;
 
-	return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_info_get(cnxk_mldev, dev_info);
+	else
+		return mvtvm_ml_dev_info_get(cnxk_mldev, dev_info);
+
+	return 0;
 }
 
 static int
@@ -518,9 +524,11 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 			   conf->nb_queue_pairs, conf->nb_models);
 
 		/* Load firmware */
-		ret = cn10k_ml_fw_load(cnxk_mldev);
-		if (ret != 0)
-			return ret;
+		if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+			ret = cn10k_ml_fw_load(cnxk_mldev);
+			if (ret != 0)
+				return ret;
+		}
 	} else if (cnxk_mldev->state == ML_CNXK_DEV_STATE_CONFIGURED) {
 		plt_ml_dbg("Re-configuring ML device, nb_queue_pairs = %u, nb_models = %u",
 			   conf->nb_queue_pairs, conf->nb_models);
@@ -618,10 +626,12 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 	}
 	dev->data->nb_models = conf->nb_models;
 
-	ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
-	if (ret != 0) {
-		plt_err("Failed to configure CN10K ML Device");
-		goto error;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_configure(cnxk_mldev, conf);
+		if (ret != 0) {
+			plt_err("Failed to configure CN10K ML Device");
+			goto error;
+		}
 	}
 
 	ret = mvtvm_ml_dev_configure(cnxk_mldev, conf);
@@ -629,12 +639,17 @@ cnxk_ml_dev_configure(struct rte_ml_dev *dev, const struct rte_ml_dev_config *co
 		goto error;
 
 	/* Set device capabilities */
-	cnxk_mldev->max_nb_layers =
-		cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->max_nb_layers =
+			cnxk_mldev->cn10k_mldev.fw.req->cn10k_req.jd.fw_load.cap.s.max_models;
+	else
+		cnxk_mldev->max_nb_layers = ML_CNXK_MAX_MODELS;
 
 	cnxk_mldev->mldev->enqueue_burst = cnxk_ml_enqueue_burst;
 	cnxk_mldev->mldev->dequeue_burst = cnxk_ml_dequeue_burst;
-	cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
+
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		cnxk_mldev->mldev->op_error_get = cn10k_ml_op_error_get;
 
 	/* Allocate and initialize index_map */
 	if (cnxk_mldev->index_map == NULL) {
@@ -695,8 +710,10 @@ cnxk_ml_dev_close(struct rte_ml_dev *dev)
 	if (mvtvm_ml_dev_close(cnxk_mldev) != 0)
 		plt_err("Failed to close MVTVM ML Device");
 
-	if (cn10k_ml_dev_close(cnxk_mldev) != 0)
-		plt_err("Failed to close CN10K ML Device");
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		if (cn10k_ml_dev_close(cnxk_mldev) != 0)
+			plt_err("Failed to close CN10K ML Device");
+	}
 
 	if (cnxk_mldev->index_map)
 		rte_free(cnxk_mldev->index_map);
@@ -748,10 +765,12 @@ cnxk_ml_dev_start(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_start(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to start CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_start(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to start CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_STARTED;
@@ -770,10 +789,12 @@ cnxk_ml_dev_stop(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
-	ret = cn10k_ml_dev_stop(cnxk_mldev);
-	if (ret != 0) {
-		plt_err("Failed to stop CN10K ML Device");
-		return ret;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI) {
+		ret = cn10k_ml_dev_stop(cnxk_mldev);
+		if (ret != 0) {
+			plt_err("Failed to stop CN10K ML Device");
+			return ret;
+		}
 	}
 
 	cnxk_mldev->state = ML_CNXK_DEV_STATE_CONFIGURED;
@@ -800,7 +821,12 @@ cnxk_ml_dev_dump(struct rte_ml_dev *dev, FILE *fp)
 			cnxk_ml_model_dump(cnxk_mldev, model, fp);
 	}
 
-	return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_PCI)
+		return cn10k_ml_dev_dump(cnxk_mldev, fp);
+	else
+		return mvtvm_ml_dev_dump(cnxk_mldev, fp);
+
+	return 0;
 }
 
 static int
@@ -813,6 +839,9 @@ cnxk_ml_dev_selftest(struct rte_ml_dev *dev)
 
 	cnxk_mldev = dev->data->dev_private;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
+
 	return cn10k_ml_dev_selftest(cnxk_mldev);
 }
 
@@ -1145,6 +1174,11 @@ cnxk_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, u
 		return -EINVAL;
 	}
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV && type != ML_CNXK_MODEL_TYPE_TVM) {
+		plt_err("Unsupported model type");
+		return -ENOTSUP;
+	}
+
 	/* Find model ID */
 	found = false;
 	for (lcl_model_id = 0; lcl_model_id < dev->data->nb_models; lcl_model_id++) {
@@ -1384,6 +1418,8 @@ cnxk_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *buf
 		return -EINVAL;
 
 	cnxk_mldev = dev->data->dev_private;
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV)
+		return -ENOTSUP;
 
 	model = dev->data->models[model_id];
 	if (model == NULL) {
diff --git a/drivers/ml/cnxk/meson.build b/drivers/ml/cnxk/meson.build
index 20534d0b00..0680a0faa5 100644
--- a/drivers/ml/cnxk/meson.build
+++ b/drivers/ml/cnxk/meson.build
@@ -62,6 +62,7 @@ if enable_mvtvm
 dpdk_conf.set('RTE_MLDEV_CNXK_ENABLE_MVTVM', 1)
 
 sources += files(
+        'mvtvm_ml_dev.c',
         'mvtvm_ml_ops.c',
         'mvtvm_ml_model.c',
 )
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.c b/drivers/ml/cnxk/mvtvm_ml_dev.c
new file mode 100644
index 0000000000..dcac7b7273
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.c
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#include <rte_kvargs.h>
+#include <rte_mldev.h>
+#include <rte_mldev_pmd.h>
+
+#include <bus_vdev_driver.h>
+
+#include <roc_api.h>
+
+#include "cnxk_ml_dev.h"
+
+#define MVTVM_ML_DEV_MAX_QPS	      "max_qps"
+#define MVTVM_ML_DEV_CACHE_MODEL_DATA "cache_model_data"
+
+#define MVTVM_ML_DEV_MAX_QPS_DEFAULT	      32
+#define CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT 1
+
+static const char *const valid_args[] = {MVTVM_ML_DEV_MAX_QPS, MVTVM_ML_DEV_CACHE_MODEL_DATA, NULL};
+
+static int
+parse_integer_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int *i = (int *)extra_args;
+
+	*i = atoi(value);
+	if (*i < 0) {
+		plt_err("Argument has to be positive.");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+parse_uint_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	int i;
+	char *end;
+	errno = 0;
+
+	i = strtol(value, &end, 10);
+	if (*end != 0 || errno != 0 || i < 0)
+		return -EINVAL;
+
+	*((uint32_t *)extra_args) = i;
+
+	return 0;
+}
+
+static int
+mvtvm_mldev_parse_devargs(const char *args, struct mvtvm_ml_dev *mvtvm_mldev)
+{
+	bool cache_model_data_set = false;
+	struct rte_kvargs *kvlist = NULL;
+	bool max_qps_set = false;
+	int ret = 0;
+
+	if (args == NULL)
+		goto check_args;
+
+	kvlist = rte_kvargs_parse(args, valid_args);
+	if (kvlist == NULL) {
+		plt_err("Error parsing %s devargs\n", "MLDEV_NAME_MVTVM_PMD");
+		return -EINVAL;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_MAX_QPS) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_MAX_QPS, &parse_uint_arg,
+					 &mvtvm_mldev->max_nb_qpairs);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n", MVTVM_ML_DEV_MAX_QPS);
+			ret = -EINVAL;
+			goto exit;
+		}
+		max_qps_set = true;
+	}
+
+	if (rte_kvargs_count(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA) == 1) {
+		ret = rte_kvargs_process(kvlist, MVTVM_ML_DEV_CACHE_MODEL_DATA, &parse_integer_arg,
+					 &mvtvm_mldev->cache_model_data);
+		if (ret < 0) {
+			plt_err("Error processing arguments, key = %s\n",
+				MVTVM_ML_DEV_CACHE_MODEL_DATA);
+			ret = -EINVAL;
+			goto exit;
+		}
+		cache_model_data_set = true;
+	}
+
+check_args:
+	if (!max_qps_set)
+		mvtvm_mldev->max_nb_qpairs = MVTVM_ML_DEV_MAX_QPS_DEFAULT;
+	plt_ml_dbg("ML: %s = %u", MVTVM_ML_DEV_MAX_QPS, mvtvm_mldev->max_nb_qpairs);
+
+	if (!cache_model_data_set) {
+		mvtvm_mldev->cache_model_data = CN10K_ML_DEV_CACHE_MODEL_DATA_DEFAULT;
+	} else {
+		if ((mvtvm_mldev->cache_model_data < 0) || (mvtvm_mldev->cache_model_data > 1)) {
+			plt_err("Invalid argument, %s = %d\n", MVTVM_ML_DEV_CACHE_MODEL_DATA,
+				mvtvm_mldev->cache_model_data);
+			ret = -EINVAL;
+			goto exit;
+		}
+	}
+	plt_ml_dbg("ML: %s = %d", MVTVM_ML_DEV_CACHE_MODEL_DATA, mvtvm_mldev->cache_model_data);
+
+exit:
+	if (kvlist)
+		rte_kvargs_free(kvlist);
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_probe(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev_pmd_init_params init_params;
+	struct mvtvm_ml_dev *mvtvm_mldev;
+	struct cnxk_ml_dev *cnxk_mldev;
+	struct rte_ml_dev *dev;
+	const char *input_args;
+	const char *name;
+	int ret = 0;
+
+	if (cnxk_ml_dev_initialized == 1) {
+		plt_err("ML CNXK device already initialized!");
+		plt_err("Not creating ml_mvtvm vdev!");
+		return 0;
+	}
+
+	init_params = (struct rte_ml_dev_pmd_init_params){
+		.socket_id = rte_socket_id(), .private_data_size = sizeof(struct cnxk_ml_dev)};
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+	input_args = rte_vdev_device_args(vdev);
+
+	dev = rte_ml_dev_pmd_create(name, &vdev->device, &init_params);
+	if (dev == NULL) {
+		ret = -EFAULT;
+		goto error_exit;
+	}
+
+	cnxk_mldev = dev->data->dev_private;
+	cnxk_mldev->mldev = dev;
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+	mvtvm_mldev->vdev = vdev;
+
+	ret = mvtvm_mldev_parse_devargs(input_args, mvtvm_mldev);
+	if (ret < 0)
+		goto error_exit;
+
+	dev->dev_ops = &cnxk_ml_ops;
+	dev->enqueue_burst = NULL;
+	dev->dequeue_burst = NULL;
+	dev->op_error_get = NULL;
+
+	cnxk_ml_dev_initialized = 1;
+	cnxk_mldev->type = CNXK_ML_DEV_TYPE_VDEV;
+
+	return 0;
+
+error_exit:
+	plt_err("Could not create device: ml_mvtvm");
+
+	return ret;
+}
+
+static int
+mvtvm_ml_vdev_remove(struct rte_vdev_device *vdev)
+{
+	struct rte_ml_dev *dev;
+	const char *name;
+
+	name = rte_vdev_device_name(vdev);
+	if (name == NULL)
+		return -EINVAL;
+
+	dev = rte_ml_dev_pmd_get_named_dev(name);
+	if (dev == NULL)
+		return -ENODEV;
+
+	return rte_ml_dev_pmd_destroy(dev);
+}
+
+static struct rte_vdev_driver mvtvm_mldev_pmd = {.probe = mvtvm_ml_vdev_probe,
+						 .remove = mvtvm_ml_vdev_remove};
+
+RTE_PMD_REGISTER_VDEV(MLDEV_NAME_MVTVM_PMD, mvtvm_mldev_pmd);
+
+RTE_PMD_REGISTER_PARAM_STRING(MLDEV_NAME_MVTVM_PMD,
+			      MVTVM_ML_DEV_MAX_QPS "=<int>" MVTVM_ML_DEV_CACHE_MODEL_DATA "=<0|1>");
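
Once registered, the virtual device can be created from the EAL command line, e.g.
--vdev=ml_mvtvm,max_qps=32,cache_model_data=1, or programmatically. A minimal sketch,
assuming the public vdev bus API (rte_vdev_init from rte_bus_vdev.h); the helper name and
the devargs values are illustrative only, not requirements:

#include <rte_bus_vdev.h>

/* Hypothetical helper: create the TVM-only ML virtual device with sample devargs. */
static int
create_ml_mvtvm_vdev(void)
{
	return rte_vdev_init("ml_mvtvm", "max_qps=32,cache_model_data=1");
}
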
diff --git a/drivers/ml/cnxk/mvtvm_ml_dev.h b/drivers/ml/cnxk/mvtvm_ml_dev.h
new file mode 100644
index 0000000000..6922c19337
--- /dev/null
+++ b/drivers/ml/cnxk/mvtvm_ml_dev.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Marvell.
+ */
+
+#ifndef _MVTVM_ML_DEV_H_
+#define _MVTVM_ML_DEV_H_
+
+#include <rte_mldev_core.h>
+
+/* Device status */
+extern int cnxk_ml_dev_initialized;
+
+/* CNXK Device ops */
+extern struct rte_ml_dev_ops cnxk_ml_ops;
+
+/* Marvell MVTVM ML PMD device name */
+#define MLDEV_NAME_MVTVM_PMD ml_mvtvm
+
+/* Maximum number of descriptors per queue-pair */
+#define ML_MVTVM_MAX_DESC_PER_QP 1024
+
+/* Maximum number of inputs / outputs per model */
+#define ML_MVTVM_MAX_INPUT_OUTPUT 32
+
+/* Maximum number of segments for IO data */
+#define ML_MVTVM_MAX_SEGMENTS 1
+
+/* Device private data */
+struct mvtvm_ml_dev {
+	/* Virtual device */
+	struct rte_vdev_device *vdev;
+
+	/* Maximum number of queue pairs */
+	uint16_t max_nb_qpairs;
+
+	/* Enable / disable model data caching */
+	int cache_model_data;
+};
+
+#endif /* _MVTVM_ML_DEV_H_ */
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.c b/drivers/ml/cnxk/mvtvm_ml_ops.c
index 6b88491371..e825c3fb23 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.c
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.c
@@ -97,6 +97,22 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return value;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	struct mvtvm_ml_dev *mvtvm_mldev;
+
+	mvtvm_mldev = &cnxk_mldev->mvtvm_mldev;
+
+	dev_info->max_queue_pairs = mvtvm_mldev->max_nb_qpairs;
+	dev_info->max_desc = ML_MVTVM_MAX_DESC_PER_QP;
+	dev_info->max_io = ML_MVTVM_MAX_INPUT_OUTPUT;
+	dev_info->max_segments = ML_MVTVM_MAX_SEGMENTS;
+	dev_info->align_size = RTE_CACHE_LINE_SIZE;
+
+	return 0;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -127,6 +143,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return ret;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return 0;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
@@ -237,6 +262,12 @@ mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *
 	else
 		model->subtype = ML_CNXK_MODEL_SUBTYPE_TVM_HYBRID;
 
+	if (cnxk_mldev->type == CNXK_ML_DEV_TYPE_VDEV &&
+	    model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
+		plt_err("Unsupported model sub-type");
+		return -ENOTSUP;
+	}
+
 	/* Set callback function array */
 	if (model->subtype != ML_CNXK_MODEL_SUBTYPE_TVM_LLVM) {
 		callback = &model->mvtvm.cb;
diff --git a/drivers/ml/cnxk/mvtvm_ml_ops.h b/drivers/ml/cnxk/mvtvm_ml_ops.h
index cb4b219743..0232c5ead5 100644
--- a/drivers/ml/cnxk/mvtvm_ml_ops.h
+++ b/drivers/ml/cnxk/mvtvm_ml_ops.h
@@ -55,8 +55,10 @@ struct mvtvm_ml_req {
 	struct mvtvm_ml_result result;
 };
 
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
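
A short sketch of how the new mvtvm info path is reached through the public API, assuming
rte_ml_dev_info_get from rte_mldev.h and an application-supplied dev_id; the helper name
is hypothetical and the fields printed are the ones filled in mvtvm_ml_dev_info_get()
above:

#include <stdio.h>
#include <rte_mldev.h>

/* Hypothetical helper: print the limits reported via the dev_info_get path. */
static int
print_ml_dev_limits(int16_t dev_id)
{
	struct rte_ml_dev_info info;
	int ret;

	ret = rte_ml_dev_info_get(dev_id, &info);
	if (ret != 0)
		return ret;

	printf("max_queue_pairs = %u, max_desc = %u, align_size = %u\n",
	       info.max_queue_pairs, info.max_desc, info.align_size);

	return 0;
}
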
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.c b/drivers/ml/cnxk/mvtvm_ml_stubs.c
index 19af1d2703..126a954c91 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.c
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.c
@@ -67,6 +67,15 @@ mvtvm_ml_model_xstat_get(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *m
 	return 0;
 }
 
+int
+mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(dev_info);
+
+	return -ENOTSUP;
+}
+
 int
 mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf)
 {
@@ -84,6 +93,15 @@ mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev)
 	return 0;
 }
 
+int
+mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp)
+{
+	RTE_SET_USED(cnxk_mldev);
+	RTE_SET_USED(fp);
+
+	return -EINVAL;
+}
+
 int
 mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 		    struct cnxk_ml_model *model)
diff --git a/drivers/ml/cnxk/mvtvm_ml_stubs.h b/drivers/ml/cnxk/mvtvm_ml_stubs.h
index 3fd1f04c35..4220a963f2 100644
--- a/drivers/ml/cnxk/mvtvm_ml_stubs.h
+++ b/drivers/ml/cnxk/mvtvm_ml_stubs.h
@@ -14,8 +14,10 @@ struct cnxk_ml_model;
 struct cnxk_ml_layer;
 
 enum cnxk_ml_model_type mvtvm_ml_model_type_get(struct rte_ml_model_params *params);
+int mvtvm_ml_dev_info_get(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_dev_info *dev_info);
 int mvtvm_ml_dev_configure(struct cnxk_ml_dev *cnxk_mldev, const struct rte_ml_dev_config *conf);
 int mvtvm_ml_dev_close(struct cnxk_ml_dev *cnxk_mldev);
+int mvtvm_ml_dev_dump(struct cnxk_ml_dev *cnxk_mldev, FILE *fp);
 int mvtvm_ml_model_load(struct cnxk_ml_dev *cnxk_mldev, struct rte_ml_model_params *params,
 			struct cnxk_ml_model *model);
 int mvtvm_ml_model_unload(struct cnxk_ml_dev *cnxk_mldev, struct cnxk_ml_model *model);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 340+ messages in thread

* Re: [PATCH v9 00/34] Implementation of revised ml/cnxk driver
  2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
                     ` (33 preceding siblings ...)
  2023-10-26 12:43   ` [PATCH v9 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
@ 2023-10-29 12:53   ` Jerin Jacob
  34 siblings, 0 replies; 340+ messages in thread
From: Jerin Jacob @ 2023-10-29 12:53 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar

On Fri, Oct 27, 2023 at 12:12 AM Srikanth Yalavarthi
<syalavarthi@marvell.com> wrote:
>
> This patch series is an implementation of revised ml/cnxk driver
> to support models compiled with TVM compiler framework. TVM models
> use a hybrid mode for execution, with regions of the model executing
> on the ML accelerator and the rest executing on CPU cores.
>
> This series of commits reorganizes the ml/cnxk driver and adds support
> to execute multiple regions with-in a TVM model.
>
> v9:
>   - Fixed incorrect IO layout for TVM Marvell models
>   - Set byte offset to zero for I/O tensors
>   - Updated max layers macro definition. Set to TVMDP max layers.
>   - Fixed TVM model IO type to RTE IO type map

Series applied to dpdk-next-net-mrvl/for-next-net with the following changes. Thanks.

1) cnxk ml driver update in doc/guides/rel_notes/release_23_11.rst
moved closer to the mldev subsystem changes
2)
[for-next-net]dell[dpdk-next-net-mrvl] $ git diff
diff --git a/doc/guides/mldevs/cnxk.rst b/doc/guides/mldevs/cnxk.rst
index 28e5b5b87f..25e8ff783a 100644
--- a/doc/guides/mldevs/cnxk.rst
+++ b/doc/guides/mldevs/cnxk.rst
@@ -212,9 +212,9 @@ not part of DPDK and must be installed separately:
 .. note::

     In order for meson to find the dependencies during the configure stage,
-    it is required to add the cmake paths <install_prefix>/lib/cmake/dlpack,
-    <install_prefix>/lib/cmake/dmlc and <install_prefix>/lib/cmake/tvm to
-    CMAKE_PREFIX_PATH and <install_prefix>/lib/pkgconfig to PKG_CONFIG_PATH.
+    it is required to update CMAKE_PREFIX_PATH and PKG_CONFIG_PATH as below.
+    CMAKE_PREFIX_PATH='<install_prefix>/lib/cmake/tvm:<install_prefix>/lib/cmake/dlpack:<install_prefix>/lib/cmake/dmlc'
+    PKG_CONFIG_PATH='<install_prefix>/lib/pkgconfig'


For the record, I have used the following build command to test the code
with external build dependencies.

Assuming all dependent libraries are installed at /export/cross_ml/install/

CMAKE_PREFIX_PATH='/export/cross_ml/install/lib/cmake/tvm:/export/cross_ml/install/lib/cmake/dlpack:/export/cross_ml/install/lib/cmake/dmlc'
PKG_CONFIG_PATH='/export/cross_ml/install/lib/pkgconfig/' meson setup
 --cross config/arm/arm64_cn10k_linux_gcc  -Denable_docs=true
-Dexamples=all -Dc_args='-I/export/cross_ml/install/include'
-Dc_link_args='-L/export/cross_ml/install/lib' build

Also, -Dc_args='-I/export/cross_ml/install/include'
-Dc_link_args='-L/export/cross_ml/install/lib' could be removed once the
following patch is merged through the main tree.
https://patches.dpdk.org/project/dpdk/patch/20231029082004.5576-1-syalavarthi@marvell.com/

^ permalink raw reply	[flat|nested] 340+ messages in thread

end of thread, other threads:[~2023-10-29 12:54 UTC | newest]

Thread overview: 340+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-30 15:58 [PATCH v1 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 02/34] ml/cnxk: drop use of RTE API for firmware read Srikanth Yalavarthi
2023-09-21 12:08   ` Jerin Jacob
2023-09-21 12:52     ` David Marchand
2023-09-21 13:06       ` [EXT] " Srikanth Yalavarthi
2023-09-21 13:26         ` David Marchand
2023-09-22  3:59           ` Srikanth Yalavarthi
2023-09-22  8:07             ` David Marchand
2023-09-22 16:59               ` Srikanth Yalavarthi
2023-09-27  9:38     ` David Marchand
2023-09-27 10:00       ` [EXT] " Srikanth Yalavarthi
2023-09-27 18:37     ` Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 03/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 04/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 05/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 06/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 07/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 08/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-08-30 15:58 ` [PATCH v1 09/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 10/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-09-21 12:32   ` Jerin Jacob
2023-09-27 18:38     ` [EXT] " Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-08-30 15:59 ` [PATCH v1 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-09-20  7:24 ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Srikanth Yalavarthi
2023-09-20  7:24   ` [PATCH v2 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-09-20  7:24   ` [PATCH v2 02/34] ml/cnxk: drop use of RTE API for firmware read Srikanth Yalavarthi
2023-09-20  7:24   ` [PATCH v2 03/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-09-20  7:24   ` [PATCH v2 04/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-09-20  7:24   ` [PATCH v2 05/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-09-20  7:24   ` [PATCH v2 06/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-09-20  7:24   ` [PATCH v2 07/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-09-20  7:24   ` [PATCH v2 08/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 09/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 10/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-09-20  7:25   ` [PATCH v2 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-09-21 12:15   ` [PATCH v2 00/34] Implemenation of revised ml/cnxk driver Jerin Jacob
2023-09-27 18:39     ` [EXT] " Srikanth Yalavarthi
2023-09-27 18:30 ` [PATCH v3 00/35] " Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 01/35] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 02/35] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 03/35] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 04/35] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 05/35] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 06/35] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 07/35] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 08/35] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 09/35] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 10/35] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 11/35] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 12/35] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 13/35] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 14/35] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 15/35] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 16/35] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 17/35] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 18/35] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 19/35] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 20/35] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 21/35] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 22/35] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 23/35] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 24/35] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 25/35] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 26/35] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 27/35] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 28/35] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 29/35] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 30/35] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 31/35] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 32/35] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 33/35] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 34/35] ml/cnxk: update dependency info in driver docs Srikanth Yalavarthi
2023-09-28  4:12     ` Jerin Jacob
2023-10-01  0:32       ` [EXT] " Srikanth Yalavarthi
2023-10-17 17:03       ` Srikanth Yalavarthi
2023-09-27 18:30   ` [PATCH v3 35/35] ml/cnxk: update release notes for 23.11 Srikanth Yalavarthi
2023-10-17 16:59 ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 10/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 11/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 12/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 13/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 14/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 15/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 16/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 17/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 18/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 19/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 20/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 21/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 22/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 23/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 24/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 25/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-10-17 16:59   ` [PATCH v4 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-10-18  1:56   ` [PATCH v4 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
2023-10-18  6:55     ` [EXT] " Srikanth Yalavarthi
2023-10-18  6:47 ` [PATCH v5 " Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-10-18  6:47   ` [PATCH v5 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-10-18  6:48   ` [PATCH v5 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-10-18  6:48   ` [PATCH v5 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-10-18  6:48   ` [PATCH v5 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-10-18 13:53 ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-10-18 13:53   ` [PATCH v6 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-10-18 18:34     ` Jerin Jacob
2023-10-19  6:44       ` [EXT] " Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-10-18 13:54   ` [PATCH v6 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-10-18 14:20   ` [PATCH v6 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
2023-10-19  6:41     ` [EXT] " Srikanth Yalavarthi
2023-10-19  4:16 ` [PATCH v7 " Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-10-19  4:16   ` [PATCH v7 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-10-19  4:17   ` [PATCH v7 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-10-23  4:41 ` [PATCH v8 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-10-23  4:41   ` [PATCH v8 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-10-26 12:43 ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 01/34] ml/cnxk: drop support for register polling Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 02/34] ml/cnxk: add generic cnxk device structure Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 03/34] ml/cnxk: add generic model and layer structures Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 04/34] ml/cnxk: add generic cnxk request structure Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 05/34] ml/cnxk: add generic cnxk xstats structures Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 06/34] ml/cnxk: rename cnxk ops function pointers struct Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 07/34] ml/cnxk: update device handling functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 08/34] ml/cnxk: update queue-pair " Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 09/34] ml/cnxk: update model load and unload functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 10/34] ml/cnxk: update model start and stop functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 11/34] ml/cnxk: update model utility functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 12/34] ml/cnxk: update data quantization functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 13/34] ml/cnxk: update device debug functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 14/34] ml/cnxk: update device stats functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 15/34] ml/cnxk: update device and model xstats functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 16/34] ml/cnxk: update fast path functions Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 17/34] ml/cnxk: move error handling to cnxk layer Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 18/34] ml/cnxk: support config and close of tvmdp library Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 19/34] ml/cnxk: add structures to support TVM model type Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 20/34] ml/cnxk: add support for identify " Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 21/34] ml/cnxk: add support to parse TVM model objects Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 22/34] ml/cnxk: fetch layer info and load TVM model Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 23/34] ml/cnxk: update internal info for " Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 24/34] ml/cnxk: enable model unload in tvmdp library Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 25/34] ml/cnxk: enable OCM check for multilayer TVM model Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 26/34] ml/cnxk: support start and stop for TVM models Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 27/34] ml/cnxk: update internal TVM model info structure Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 28/34] ml/cnxk: support device dump for TVM models Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 29/34] ml/cnxk: enable reporting model runtime as xstats Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 30/34] ml/cnxk: implement I/O alloc and free callbacks Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 31/34] ml/cnxk: add generic ML malloc and free callback Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 32/34] ml/cnxk: support quantize and dequantize callback Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 33/34] ml/cnxk: enable fast-path ops for TVM models Srikanth Yalavarthi
2023-10-26 12:43   ` [PATCH v9 34/34] ml/cnxk: enable creation of mvtvm virtual device Srikanth Yalavarthi
2023-10-29 12:53   ` [PATCH v9 00/34] Implementation of revised ml/cnxk driver Jerin Jacob
