From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 59B9D41C30;
	Tue,  7 Feb 2023 17:08:54 +0100 (CET)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id D0FA942D73;
	Tue,  7 Feb 2023 17:07:40 +0100 (CET)
Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com
 [67.231.148.174])
 by mails.dpdk.org (Postfix) with ESMTP id 32047427E9
 for <dev@dpdk.org>; Tue,  7 Feb 2023 17:07:29 +0100 (CET)
Received: from pps.filterd (m0045849.ppops.net [127.0.0.1])
 by mx0a-0016f401.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id
 317EgVTl017206 for <dev@dpdk.org>; Tue, 7 Feb 2023 08:07:28 -0800
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com;
 h=from : to : cc :
 subject : date : message-id : in-reply-to : references : mime-version :
 content-type; s=pfpt0220; bh=bEzzf8hUf98dRIH02qWrNrwfHIlzSsptJ2JA9OpgoUs=;
 b=RUaM8ha72hTWm4i5VNlO/T5bP8+bW9FGD9DEKnGrg+7BPtkG7sjUqywe38QjMn7sjFjC
 Na9MJZdOwjF2AhTofQSne/8AuZuvM+2jlCcdqEefHzXP+PykrSl5CevvjfGwle8ypf1t
 SwGTx45XdCoeYDFUhR/2jh/g3sUJaJOaLh490Cl+BsNhmfW2YAf2ab6JSy4BpxPSvdwm
 /WJ/ToXn+iO6v7WfWMA3kKyF//q1vVeNLwhRoHcbKj9FW+ViBzG5vSDDd4c0OKU+WUia
 Fkl6tQ9gOChSqek5Fzx9BWRkCATiQRVoFda9eFvqW7YBG8WhtpFKwnWvc4yY9WQlsbhm rg== 
Received: from dc5-exch02.marvell.com ([199.233.59.182])
 by mx0a-0016f401.pphosted.com (PPS) with ESMTPS id 3nkdyrssx7-8
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT)
 for <dev@dpdk.org>; Tue, 07 Feb 2023 08:07:28 -0800
Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH02.marvell.com
 (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.42;
 Tue, 7 Feb 2023 08:07:25 -0800
Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com
 (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.42 via Frontend
 Transport; Tue, 7 Feb 2023 08:07:25 -0800
Received: from ml-host-33.caveonetworks.com (unknown [10.110.143.233])
 by maili.marvell.com (Postfix) with ESMTP id BE58E3F708C;
 Tue,  7 Feb 2023 08:07:25 -0800 (PST)
From: Srikanth Yalavarthi <syalavarthi@marvell.com>
To: Srikanth Yalavarthi <syalavarthi@marvell.com>
CC: <dev@dpdk.org>, <sshankarnara@marvell.com>, <jerinj@marvell.com>,
 <aprabhu@marvell.com>, <ptakkar@marvell.com>, <pshukla@marvell.com>
Subject: [PATCH v5 13/39] ml/cnxk: add internal structures for derived info
Date: Tue, 7 Feb 2023 08:06:53 -0800
Message-ID: <20230207160719.1307-14-syalavarthi@marvell.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230207160719.1307-1-syalavarthi@marvell.com>
References: <20221208200220.20267-1-syalavarthi@marvell.com>
 <20230207160719.1307-1-syalavarthi@marvell.com>
MIME-Version: 1.0
Content-Type: text/plain
X-Proofpoint-ORIG-GUID: ZG40KOhIlurWX7fj6QoV_jNcGva5n78T
X-Proofpoint-GUID: ZG40KOhIlurWX7fj6QoV_jNcGva5n78T
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.122.1
 definitions=2023-02-07_07,2023-02-06_03,2022-06-22_01
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

Added internal structures to handle derived address fields
and enabled support to compute DMA addresses for model start.
Enabled updating internal model fields.

Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
 drivers/ml/cnxk/cn10k_ml_model.c | 89 ++++++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_model.h | 80 ++++++++++++++++++++++++++++
 drivers/ml/cnxk/cn10k_ml_ops.c   | 18 ++++++-
 3 files changed, 186 insertions(+), 1 deletion(-)

diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c
index dfa814bbe0..2530beb80e 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.c
+++ b/drivers/ml/cnxk/cn10k_ml_model.c
@@ -214,3 +214,92 @@ cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata)
 			cn10k_ml_io_type_map(metadata->output[i].model_output_type);
 	}
 }
+
+void
+cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_t *base_dma_addr)
+{
+	struct cn10k_ml_model_metadata *metadata;
+	struct cn10k_ml_model_addr *addr;
+	size_t model_data_size;
+	uint8_t *dma_addr_load;
+	uint8_t *dma_addr_run;
+	uint8_t i;
+	int fpos;
+
+	metadata = &model->metadata;
+	addr = &model->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+
+	/* Base address */
+	addr->base_dma_addr_load = base_dma_addr;
+	addr->base_dma_addr_run = PLT_PTR_ADD(addr->base_dma_addr_load, model_data_size);
+
+	/* Init section */
+	dma_addr_load = addr->base_dma_addr_load;
+	dma_addr_run = addr->base_dma_addr_run;
+	fpos = sizeof(struct cn10k_ml_model_metadata);
+	addr->init_load_addr = dma_addr_load;
+	addr->init_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->init_model.file_size);
+
+	/* Main section */
+	dma_addr_load += metadata->init_model.file_size;
+	dma_addr_run += metadata->init_model.file_size;
+	fpos += metadata->init_model.file_size;
+	addr->main_load_addr = dma_addr_load;
+	addr->main_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->main_model.file_size);
+
+	/* Finish section */
+	dma_addr_load += metadata->main_model.file_size;
+	dma_addr_run += metadata->main_model.file_size;
+	fpos += metadata->main_model.file_size;
+	addr->finish_load_addr = dma_addr_load;
+	addr->finish_run_addr = dma_addr_run;
+	rte_memcpy(dma_addr_load, PLT_PTR_ADD(buffer, fpos), metadata->finish_model.file_size);
+
+	/* Weights and Bias section */
+	dma_addr_load += metadata->finish_model.file_size;
+	fpos += metadata->finish_model.file_size;
+	addr->wb_base_addr = PLT_PTR_SUB(dma_addr_load, metadata->weights_bias.mem_offset);
+	addr->wb_load_addr = PLT_PTR_ADD(addr->wb_base_addr, metadata->weights_bias.mem_offset);
+	rte_memcpy(addr->wb_load_addr, PLT_PTR_ADD(buffer, fpos), metadata->weights_bias.file_size);
+
+	/* Inputs */
+	addr->total_input_sz_d = 0;
+	addr->total_input_sz_q = 0;
+	for (i = 0; i < metadata->model.num_input; i++) {
+		addr->input[i].nb_elements =
+			model->metadata.input[i].shape.w * model->metadata.input[i].shape.x *
+			model->metadata.input[i].shape.y * model->metadata.input[i].shape.z;
+		addr->input[i].sz_d = addr->input[i].nb_elements *
+				      rte_ml_io_type_size_get(metadata->input[i].input_type);
+		addr->input[i].sz_q = addr->input[i].nb_elements *
+				      rte_ml_io_type_size_get(metadata->input[i].model_input_type);
+		addr->total_input_sz_d += addr->input[i].sz_d;
+		addr->total_input_sz_q += addr->input[i].sz_q;
+
+		plt_ml_dbg("model_id = %u, input[%u] - w:%u x:%u y:%u z:%u, sz_d = %u sz_q = %u",
+			   model->model_id, i, metadata->input[i].shape.w,
+			   metadata->input[i].shape.x, metadata->input[i].shape.y,
+			   metadata->input[i].shape.z, addr->input[i].sz_d, addr->input[i].sz_q);
+	}
+
+	/* Outputs */
+	addr->total_output_sz_q = 0;
+	addr->total_output_sz_d = 0;
+	for (i = 0; i < metadata->model.num_output; i++) {
+		addr->output[i].nb_elements = metadata->output[i].size;
+		addr->output[i].sz_d = addr->output[i].nb_elements *
+				       rte_ml_io_type_size_get(metadata->output[i].output_type);
+		addr->output[i].sz_q =
+			addr->output[i].nb_elements *
+			rte_ml_io_type_size_get(metadata->output[i].model_output_type);
+		addr->total_output_sz_q += addr->output[i].sz_q;
+		addr->total_output_sz_d += addr->output[i].sz_d;
+
+		plt_ml_dbg("model_id = %u, output[%u] - sz_d = %u, sz_q = %u", model->model_id, i,
+			   addr->output[i].sz_d, addr->output[i].sz_q);
+	}
+}
diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h
index dc30bc2aa7..5345160a74 100644
--- a/drivers/ml/cnxk/cn10k_ml_model.h
+++ b/drivers/ml/cnxk/cn10k_ml_model.h
@@ -322,6 +322,81 @@ struct cn10k_ml_model_metadata {
 	uint8_t reserved3[16];
 };
 
+/* Model address structure */
+struct cn10k_ml_model_addr {
+	/* Base DMA address for load */
+	void *base_dma_addr_load;
+
+	/* Base DMA address for run */
+	void *base_dma_addr_run;
+
+	/* Init section load address */
+	void *init_load_addr;
+
+	/* Init section run address */
+	void *init_run_addr;
+
+	/* Main section load address */
+	void *main_load_addr;
+
+	/* Main section run address */
+	void *main_run_addr;
+
+	/* Finish section load address */
+	void *finish_load_addr;
+
+	/* Finish section run address */
+	void *finish_run_addr;
+
+	/* Weights and Bias base address */
+	void *wb_base_addr;
+
+	/* Weights and bias load address */
+	void *wb_load_addr;
+
+	/* Start tile */
+	uint8_t tile_start;
+
+	/* End tile */
+	uint8_t tile_end;
+
+	/* Input address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantized input size */
+		uint32_t sz_d;
+
+		/* Quantized input size */
+		uint32_t sz_q;
+	} input[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Output address and size */
+	struct {
+		/* Number of elements */
+		uint32_t nb_elements;
+
+		/* Dequantize output size */
+		uint32_t sz_d;
+
+		/* Quantized output size */
+		uint32_t sz_q;
+	} output[MRVL_ML_INPUT_OUTPUT_SIZE];
+
+	/* Total size of quantized input */
+	uint32_t total_input_sz_q;
+
+	/* Total size of dequantized input */
+	uint32_t total_input_sz_d;
+
+	/* Total size of quantized output */
+	uint32_t total_output_sz_q;
+
+	/* Total size of dequantized output */
+	uint32_t total_output_sz_d;
+};
+
 /* Model Object */
 struct cn10k_ml_model {
 	/* Device reference */
@@ -339,6 +414,9 @@ struct cn10k_ml_model {
 	/* Metadata */
 	struct cn10k_ml_model_metadata metadata;
 
+	/* Address structure */
+	struct cn10k_ml_model_addr addr;
+
 	/* Spinlock, used to update model state */
 	plt_spinlock_t lock;
 
@@ -348,5 +426,7 @@ struct cn10k_ml_model {
 
 int cn10k_ml_model_metadata_check(uint8_t *buffer, uint64_t size);
 void cn10k_ml_model_metadata_update(struct cn10k_ml_model_metadata *metadata);
+void cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer,
+				uint8_t *base_dma_addr);
 
 #endif /* _CN10K_ML_MODEL_H_ */
diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c
index 2cde795903..b11228f2cb 100644
--- a/drivers/ml/cnxk/cn10k_ml_ops.c
+++ b/drivers/ml/cnxk/cn10k_ml_ops.c
@@ -408,11 +408,14 @@ cn10k_ml_dev_queue_pair_setup(struct rte_ml_dev *dev, uint16_t queue_pair_id,
 int
 cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params, uint16_t *model_id)
 {
+	struct cn10k_ml_model_metadata *metadata;
 	struct cn10k_ml_model *model;
 	struct cn10k_ml_dev *mldev;
 
 	char str[RTE_MEMZONE_NAMESIZE];
 	const struct plt_memzone *mz;
+	size_t model_data_size;
+	uint8_t *base_dma_addr;
 	uint64_t mz_size;
 	uint16_t idx;
 	bool found;
@@ -439,7 +442,12 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	}
 
 	/* Compute memzone size */
-	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE);
+	metadata = (struct cn10k_ml_model_metadata *)params->addr;
+	model_data_size = metadata->init_model.file_size + metadata->main_model.file_size +
+			  metadata->finish_model.file_size + metadata->weights_bias.file_size;
+	model_data_size = PLT_ALIGN_CEIL(model_data_size, ML_CN10K_ALIGN_SIZE);
+	mz_size = PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE) +
+		  2 * model_data_size;
 
 	/* Allocate memzone for model object and model data */
 	snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", CN10K_ML_MODEL_MEMZONE_NAME, idx);
@@ -462,6 +470,14 @@ cn10k_ml_model_load(struct rte_ml_dev *dev, struct rte_ml_model_params *params,
 	else
 		model->batch_size = model->metadata.model.batch_size;
 
+	/* Set DMA base address */
+	base_dma_addr = PLT_PTR_ADD(
+		mz->addr, PLT_ALIGN_CEIL(sizeof(struct cn10k_ml_model), ML_CN10K_ALIGN_SIZE));
+	cn10k_ml_model_addr_update(model, params->addr, base_dma_addr);
+
+	/* Copy data from load to run. run address to be used by MLIP */
+	rte_memcpy(model->addr.base_dma_addr_run, model->addr.base_dma_addr_load, model_data_size);
+
 	plt_spinlock_init(&model->lock);
 	model->state = ML_CN10K_MODEL_STATE_LOADED;
 	dev->data->models[idx] = model;
-- 
2.17.1