* [PATCH v1 0/3] Spec changes to support multi I/O models @ 2023-08-30 15:52 Srikanth Yalavarthi 2023-08-30 15:53 ` [PATCH v1 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi ` (5 more replies) 0 siblings, 6 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-08-30 15:52 UTC (permalink / raw) Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar This series implements changes to the mldev spec to extend support for ML models with multiple inputs and outputs. The changes introduce an I/O layout to support packed and split buffers for model input and output, extend the rte_ml_model_info structure to describe multiple inputs and outputs, and update rte_ml_op and the quantize / dequantize APIs to take arrays of input and output ML buffer segments. Srikanth Yalavarthi (3): mldev: add support for arbitrary shape dimensions mldev: introduce support for IO layout mldev: drop input and output size get APIs app/test-mldev/ml_options.c | 15 - app/test-mldev/ml_options.h | 2 - app/test-mldev/test_inference_common.c | 416 +++++++++++++++++-------- app/test-mldev/test_inference_common.h | 6 + app/test-mldev/test_model_common.c | 6 - app/test-mldev/test_model_common.h | 1 - doc/guides/tools/testmldev.rst | 6 - drivers/ml/cnxk/cn10k_ml_dev.h | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 84 +++-- drivers/ml/cnxk/cn10k_ml_model.h | 12 + drivers/ml/cnxk/cn10k_ml_ops.c | 135 +++----- lib/mldev/meson.build | 2 +- lib/mldev/mldev_utils.c | 30 -- lib/mldev/mldev_utils.h | 16 - lib/mldev/rte_mldev.c | 50 +-- lib/mldev/rte_mldev.h | 201 +++++------- lib/mldev/rte_mldev_core.h | 68 +--- lib/mldev/version.map | 3 - 18 files changed, 503 insertions(+), 553 deletions(-) -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
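To make the reworked request path concrete, the following is a minimal sketch of a single synchronous inference under the new spec, for a model that reports the packed I/O layout. It is a sketch only: dev_id, qp_id, model_id, op_pool, min_batches and the qbuf_* buffers are placeholders assumed to be prepared elsewhere (device configured and started, model loaded, buffers sized as described in patches 2 and 3), and retries are reduced to busy-waiting.

#include <errno.h>
#include <rte_memory.h>
#include <rte_mempool.h>
#include <rte_mldev.h>

/* Minimal sketch of the reworked enqueue path for a model reporting
 * RTE_ML_IO_LAYOUT_PACKED: one segment carries all quantized inputs and
 * one carries all quantized outputs. All parameters are placeholders
 * assumed to be set up by the caller.
 */
static int
run_one_inference(int16_t dev_id, uint16_t qp_id, uint16_t model_id,
		  struct rte_mempool *op_pool, uint16_t min_batches,
		  void *qbuf_in, uint32_t in_len, void *qbuf_out, uint32_t out_len)
{
	struct rte_ml_buff_seg in_seg = {
		.addr = qbuf_in,
		.iova_addr = rte_mem_virt2iova(qbuf_in),
		.length = in_len,
		.next = NULL,
	};
	struct rte_ml_buff_seg out_seg = {
		.addr = qbuf_out,
		.iova_addr = rte_mem_virt2iova(qbuf_out),
		.length = out_len,
		.next = NULL,
	};
	struct rte_ml_buff_seg *in_segs[] = {&in_seg};
	struct rte_ml_buff_seg *out_segs[] = {&out_seg};
	struct rte_ml_op *op, *deq;
	int ret;

	if (rte_mempool_get(op_pool, (void **)&op) != 0)
		return -ENOMEM;

	op->model_id = model_id;
	op->nb_batches = min_batches;	/* from rte_ml_model_info::min_batches */
	op->mempool = op_pool;
	op->input = in_segs;	/* 1 entry for PACKED; nb_inputs entries for SPLIT */
	op->output = out_segs;	/* 1 entry for PACKED; nb_outputs entries for SPLIT */

	while (rte_ml_enqueue_burst(dev_id, qp_id, &op, 1) != 1)
		;
	while (rte_ml_dequeue_burst(dev_id, qp_id, &deq, 1) != 1)
		;

	ret = (deq->status == RTE_ML_OP_STATUS_SUCCESS) ? 0 : -EIO;
	rte_mempool_put(op_pool, deq);
	return ret;
}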
* [PATCH v1 1/3] mldev: add support for arbitrary shape dimensions 2023-08-30 15:52 [PATCH v1 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi @ 2023-08-30 15:53 ` Srikanth Yalavarthi 2023-08-30 15:53 ` [PATCH v1 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi ` (4 subsequent siblings) 5 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-08-30 15:53 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Updated rte_ml_io_info to support shape of arbitrary number of dimensions. Dropped use of rte_ml_io_shape and rte_ml_io_format. Introduced new fields nb_elements and size in rte_ml_io_info. Updated drivers and app/mldev to support the changes. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- app/test-mldev/test_inference_common.c | 97 +++++--------------------- drivers/ml/cnxk/cn10k_ml_model.c | 78 +++++++++++++-------- drivers/ml/cnxk/cn10k_ml_model.h | 12 ++++ drivers/ml/cnxk/cn10k_ml_ops.c | 11 +-- lib/mldev/mldev_utils.c | 30 -------- lib/mldev/mldev_utils.h | 16 ----- lib/mldev/rte_mldev.h | 59 ++++------------ lib/mldev/version.map | 1 - 8 files changed, 94 insertions(+), 210 deletions(-) diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index 418bf38be4c..6bda37b0fab 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -3,6 +3,7 @@ */ #include <errno.h> +#include <math.h> #include <stdio.h> #include <unistd.h> @@ -18,11 +19,6 @@ #include "ml_common.h" #include "test_inference_common.h" -#define ML_TEST_READ_TYPE(buffer, type) (*((type *)buffer)) - -#define ML_TEST_CHECK_OUTPUT(output, reference, tolerance) \ - (((float)output - (float)reference) <= (((float)reference * tolerance) / 100.0)) - #define ML_OPEN_WRITE_GET_ERR(name, buffer, size, err) \ do { \ FILE *fp = fopen(name, "w+"); \ @@ -763,9 +759,9 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) { struct test_inference *t = ml_test_priv((struct ml_test *)test); struct ml_model *model; - uint32_t nb_elements; - uint8_t *reference; - uint8_t *output; + float *reference; + float *output; + float deviation; bool match; uint32_t i; uint32_t j; @@ -777,89 +773,30 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) match = (rte_hash_crc(model->output, model->out_dsize, 0) == rte_hash_crc(model->reference, model->out_dsize, 0)); } else { - output = model->output; - reference = model->reference; + output = (float *)model->output; + reference = (float *)model->reference; i = 0; next_output: - nb_elements = - model->info.output_info[i].shape.w * model->info.output_info[i].shape.x * - model->info.output_info[i].shape.y * model->info.output_info[i].shape.z; j = 0; next_element: match = false; - switch (model->info.output_info[i].dtype) { - case RTE_ML_IO_TYPE_INT8: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int8_t), - ML_TEST_READ_TYPE(reference, int8_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int8_t); - reference += sizeof(int8_t); - break; - case RTE_ML_IO_TYPE_UINT8: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint8_t), - ML_TEST_READ_TYPE(reference, uint8_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(float); - reference += sizeof(float); - break; - case RTE_ML_IO_TYPE_INT16: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int16_t), - ML_TEST_READ_TYPE(reference, int16_t), - t->cmn.opt->tolerance)) - match = true; - - output += 
sizeof(int16_t); - reference += sizeof(int16_t); - break; - case RTE_ML_IO_TYPE_UINT16: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint16_t), - ML_TEST_READ_TYPE(reference, uint16_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(uint16_t); - reference += sizeof(uint16_t); - break; - case RTE_ML_IO_TYPE_INT32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int32_t), - ML_TEST_READ_TYPE(reference, int32_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int32_t); - reference += sizeof(int32_t); - break; - case RTE_ML_IO_TYPE_UINT32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint32_t), - ML_TEST_READ_TYPE(reference, uint32_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(uint32_t); - reference += sizeof(uint32_t); - break; - case RTE_ML_IO_TYPE_FP32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, float), - ML_TEST_READ_TYPE(reference, float), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(float); - reference += sizeof(float); - break; - default: /* other types, fp8, fp16, bfloat16 */ + deviation = + (*reference == 0 ? 0 : 100 * fabs(*output - *reference) / fabs(*reference)); + if (deviation <= t->cmn.opt->tolerance) match = true; - } + else + ml_err("id = %d, element = %d, output = %f, reference = %f, deviation = %f %%\n", + i, j, *output, *reference, deviation); + + output++; + reference++; if (!match) goto done; + j++; - if (j < nb_elements) + if (j < model->info.output_info[i].nb_elements) goto next_element; i++; diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index 92c47d39baf..26df8d9ff94 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -366,6 +366,12 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->total_input_sz_q = 0; for (i = 0; i < metadata->model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { + addr->input[i].nb_dims = 4; + addr->input[i].shape[0] = metadata->input1[i].shape.w; + addr->input[i].shape[1] = metadata->input1[i].shape.x; + addr->input[i].shape[2] = metadata->input1[i].shape.y; + addr->input[i].shape[3] = metadata->input1[i].shape.z; + addr->input[i].nb_elements = metadata->input1[i].shape.w * metadata->input1[i].shape.x * metadata->input1[i].shape.y * metadata->input1[i].shape.z; @@ -386,6 +392,13 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->input[i].sz_q); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + + addr->input[i].nb_dims = 4; + addr->input[i].shape[0] = metadata->input2[j].shape.w; + addr->input[i].shape[1] = metadata->input2[j].shape.x; + addr->input[i].shape[2] = metadata->input2[j].shape.y; + addr->input[i].shape[3] = metadata->input2[j].shape.z; + addr->input[i].nb_elements = metadata->input2[j].shape.w * metadata->input2[j].shape.x * metadata->input2[j].shape.y * metadata->input2[j].shape.z; @@ -412,6 +425,8 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->total_output_sz_d = 0; for (i = 0; i < metadata->model.num_output; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { + addr->output[i].nb_dims = 1; + addr->output[i].shape[0] = metadata->output1[i].size; addr->output[i].nb_elements = metadata->output1[i].size; addr->output[i].sz_d = addr->output[i].nb_elements * @@ -426,6 +441,9 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q); } else { j = i - 
MRVL_ML_NUM_INPUT_OUTPUT_1; + + addr->output[i].nb_dims = 1; + addr->output[i].shape[0] = metadata->output2[j].size; addr->output[i].nb_elements = metadata->output2[j].size; addr->output[i].sz_d = addr->output[i].nb_elements * @@ -498,6 +516,7 @@ void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) { struct cn10k_ml_model_metadata *metadata; + struct cn10k_ml_model_addr *addr; struct rte_ml_model_info *info; struct rte_ml_io_info *output; struct rte_ml_io_info *input; @@ -508,6 +527,7 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) info = PLT_PTR_CAST(model->info); input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info)); output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info)); + addr = &model->addr; /* Set model info */ memset(info, 0, sizeof(struct rte_ml_model_info)); @@ -529,24 +549,25 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { rte_memcpy(input[i].name, metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN); - input[i].dtype = metadata->input1[i].input_type; - input[i].qtype = metadata->input1[i].model_input_type; - input[i].shape.format = metadata->input1[i].shape.format; - input[i].shape.w = metadata->input1[i].shape.w; - input[i].shape.x = metadata->input1[i].shape.x; - input[i].shape.y = metadata->input1[i].shape.y; - input[i].shape.z = metadata->input1[i].shape.z; + input[i].nb_dims = addr->input[i].nb_dims; + input[i].shape = addr->input[i].shape; + input[i].type = metadata->input1[i].model_input_type; + input[i].nb_elements = addr->input[i].nb_elements; + input[i].size = + addr->input[i].nb_elements * + rte_ml_io_type_size_get(metadata->input1[i].model_input_type); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + rte_memcpy(input[i].name, metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN); - input[i].dtype = metadata->input2[j].input_type; - input[i].qtype = metadata->input2[j].model_input_type; - input[i].shape.format = metadata->input2[j].shape.format; - input[i].shape.w = metadata->input2[j].shape.w; - input[i].shape.x = metadata->input2[j].shape.x; - input[i].shape.y = metadata->input2[j].shape.y; - input[i].shape.z = metadata->input2[j].shape.z; + input[i].nb_dims = addr->input[i].nb_dims; + input[i].shape = addr->input[i].shape; + input[i].type = metadata->input2[j].model_input_type; + input[i].nb_elements = addr->input[i].nb_elements; + input[i].size = + addr->input[i].nb_elements * + rte_ml_io_type_size_get(metadata->input2[j].model_input_type); } } @@ -555,24 +576,25 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { rte_memcpy(output[i].name, metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN); - output[i].dtype = metadata->output1[i].output_type; - output[i].qtype = metadata->output1[i].model_output_type; - output[i].shape.format = RTE_ML_IO_FORMAT_1D; - output[i].shape.w = metadata->output1[i].size; - output[i].shape.x = 1; - output[i].shape.y = 1; - output[i].shape.z = 1; + output[i].nb_dims = addr->output[i].nb_dims; + output[i].shape = addr->output[i].shape; + output[i].type = metadata->output1[i].model_output_type; + output[i].nb_elements = addr->output[i].nb_elements; + output[i].size = + addr->output[i].nb_elements * + rte_ml_io_type_size_get(metadata->output1[i].model_output_type); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + rte_memcpy(output[i].name, metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN); 
- output[i].dtype = metadata->output2[j].output_type; - output[i].qtype = metadata->output2[j].model_output_type; - output[i].shape.format = RTE_ML_IO_FORMAT_1D; - output[i].shape.w = metadata->output2[j].size; - output[i].shape.x = 1; - output[i].shape.y = 1; - output[i].shape.z = 1; + output[i].nb_dims = addr->output[i].nb_dims; + output[i].shape = addr->output[i].shape; + output[i].type = metadata->output2[j].model_output_type; + output[i].nb_elements = addr->output[i].nb_elements; + output[i].size = + addr->output[i].nb_elements * + rte_ml_io_type_size_get(metadata->output2[j].model_output_type); } } } diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h index 1f689363fc4..4cc0744891b 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.h +++ b/drivers/ml/cnxk/cn10k_ml_model.h @@ -409,6 +409,12 @@ struct cn10k_ml_model_addr { /* Input address and size */ struct { + /* Number of dimensions in shape */ + uint32_t nb_dims; + + /* Shape of input */ + uint32_t shape[4]; + /* Number of elements */ uint32_t nb_elements; @@ -421,6 +427,12 @@ struct cn10k_ml_model_addr { /* Output address and size */ struct { + /* Number of dimensions in shape */ + uint32_t nb_dims; + + /* Shape of output */ + uint32_t shape[4]; + /* Number of elements */ uint32_t nb_elements; diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index 656467d8918..e3faab81ba3 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -321,8 +321,8 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "\n"); print_line(fp, LINE_LEN); - fprintf(fp, "%8s %16s %12s %18s %12s %14s\n", "input", "input_name", "input_type", - "model_input_type", "quantize", "format"); + fprintf(fp, "%8s %16s %12s %18s %12s\n", "input", "input_name", "input_type", + "model_input_type", "quantize"); print_line(fp, LINE_LEN); for (i = 0; i < model->metadata.model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { @@ -335,12 +335,10 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "%*s ", 18, str); fprintf(fp, "%*s", 12, (model->metadata.input1[i].quantize == 1 ? "Yes" : "No")); - rte_ml_io_format_to_str(model->metadata.input1[i].shape.format, str, - STR_LEN); - fprintf(fp, "%*s", 16, str); fprintf(fp, "\n"); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + fprintf(fp, "%8u ", i); fprintf(fp, "%*s ", 16, model->metadata.input2[j].input_name); rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN); @@ -350,9 +348,6 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "%*s ", 18, str); fprintf(fp, "%*s", 12, (model->metadata.input2[j].quantize == 1 ?
"Yes" : "No")); - rte_ml_io_format_to_str(model->metadata.input2[j].shape.format, str, - STR_LEN); - fprintf(fp, "%*s", 16, str); fprintf(fp, "\n"); } } diff --git a/lib/mldev/mldev_utils.c b/lib/mldev/mldev_utils.c index d2442b123b8..ccd2c39ca89 100644 --- a/lib/mldev/mldev_utils.c +++ b/lib/mldev/mldev_utils.c @@ -86,33 +86,3 @@ rte_ml_io_type_to_str(enum rte_ml_io_type type, char *str, int len) rte_strlcpy(str, "invalid", len); } } - -void -rte_ml_io_format_to_str(enum rte_ml_io_format format, char *str, int len) -{ - switch (format) { - case RTE_ML_IO_FORMAT_NCHW: - rte_strlcpy(str, "NCHW", len); - break; - case RTE_ML_IO_FORMAT_NHWC: - rte_strlcpy(str, "NHWC", len); - break; - case RTE_ML_IO_FORMAT_CHWN: - rte_strlcpy(str, "CHWN", len); - break; - case RTE_ML_IO_FORMAT_3D: - rte_strlcpy(str, "3D", len); - break; - case RTE_ML_IO_FORMAT_2D: - rte_strlcpy(str, "Matrix", len); - break; - case RTE_ML_IO_FORMAT_1D: - rte_strlcpy(str, "Vector", len); - break; - case RTE_ML_IO_FORMAT_SCALAR: - rte_strlcpy(str, "Scalar", len); - break; - default: - rte_strlcpy(str, "invalid", len); - } -} diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h index 5bc80204532..220afb42f0d 100644 --- a/lib/mldev/mldev_utils.h +++ b/lib/mldev/mldev_utils.h @@ -52,22 +52,6 @@ __rte_internal void rte_ml_io_type_to_str(enum rte_ml_io_type type, char *str, int len); -/** - * @internal - * - * Get the name of an ML IO format. - * - * @param[in] type - * Enumeration of ML IO format. - * @param[in] str - * Address of character array. - * @param[in] len - * Length of character array. - */ -__rte_internal -void -rte_ml_io_format_to_str(enum rte_ml_io_format format, char *str, int len); - /** * @internal * diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index fc3525c1ab5..6204df09308 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -863,47 +863,6 @@ enum rte_ml_io_type { /**< 16-bit brain floating point number. */ }; -/** - * Input and output format. This is used to represent the encoding type of multi-dimensional - * used by ML models. - */ -enum rte_ml_io_format { - RTE_ML_IO_FORMAT_NCHW = 1, - /**< Batch size (N) x channels (C) x height (H) x width (W) */ - RTE_ML_IO_FORMAT_NHWC, - /**< Batch size (N) x height (H) x width (W) x channels (C) */ - RTE_ML_IO_FORMAT_CHWN, - /**< Channels (C) x height (H) x width (W) x batch size (N) */ - RTE_ML_IO_FORMAT_3D, - /**< Format to represent a 3 dimensional data */ - RTE_ML_IO_FORMAT_2D, - /**< Format to represent matrix data */ - RTE_ML_IO_FORMAT_1D, - /**< Format to represent vector data */ - RTE_ML_IO_FORMAT_SCALAR, - /**< Format to represent scalar data */ -}; - -/** - * Input and output shape. This structure represents the encoding format and dimensions - * of the tensor or vector. - * - * The data can be a 4D / 3D tensor, matrix, vector or a scalar. Number of dimensions used - * for the data would depend on the format. Unused dimensions to be set to 1. - */ -struct rte_ml_io_shape { - enum rte_ml_io_format format; - /**< Format of the data */ - uint32_t w; - /**< First dimension */ - uint32_t x; - /**< Second dimension */ - uint32_t y; - /**< Third dimension */ - uint32_t z; - /**< Fourth dimension */ -}; - /** Input and output data information structure * * Specifies the type and shape of input and output data. 
@@ -911,12 +870,18 @@ struct rte_ml_io_shape { struct rte_ml_io_info { char name[RTE_ML_STR_MAX]; /**< Name of data */ - struct rte_ml_io_shape shape; - /**< Shape of data */ - enum rte_ml_io_type qtype; - /**< Type of quantized data */ - enum rte_ml_io_type dtype; - /**< Type of de-quantized data */ + uint32_t nb_dims; + /**< Number of dimensions in shape */ + uint32_t *shape; + /**< Shape of the tensor */ + enum rte_ml_io_type type; + /**< Type of data + * @see enum rte_ml_io_type + */ + uint64_t nb_elements; + /**< Number of elements in tensor */ + uint64_t size; + /**< Size of tensor in bytes */ }; /** Model information structure */ diff --git a/lib/mldev/version.map b/lib/mldev/version.map index 0706b565be6..40ff27f4b95 100644 --- a/lib/mldev/version.map +++ b/lib/mldev/version.map @@ -51,7 +51,6 @@ INTERNAL { rte_ml_io_type_size_get; rte_ml_io_type_to_str; - rte_ml_io_format_to_str; rte_ml_io_float32_to_int8; rte_ml_io_int8_to_float32; rte_ml_io_float32_to_uint8; -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
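To illustrate the reworked rte_ml_io_info, the sketch below walks the new fields of a loaded model. dump_model_inputs is a hypothetical helper written for illustration, not part of the patch; it shows how the arbitrary-dimension nb_dims/shape pair and the new nb_elements and size fields replace the fixed four-dimensional rte_ml_io_shape.

#include <inttypes.h>
#include <stdio.h>
#include <rte_mldev.h>

/* Hypothetical helper: print the arbitrary-dimension shape and the new
 * nb_elements / size fields for every input of a loaded model. */
static int
dump_model_inputs(int16_t dev_id, uint16_t model_id)
{
	struct rte_ml_model_info info;
	uint32_t i, d;

	if (rte_ml_model_info_get(dev_id, model_id, &info) != 0)
		return -1;

	for (i = 0; i < info.nb_inputs; i++) {
		const struct rte_ml_io_info *io = &info.input_info[i];

		printf("input[%u] %s: shape = [", i, io->name);
		for (d = 0; d < io->nb_dims; d++)
			printf("%s%u", d ? ", " : "", io->shape[d]);
		printf("], nb_elements = %" PRIu64 ", size = %" PRIu64 " bytes\n",
		       io->nb_elements, io->size);
	}

	return 0;
}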
* [PATCH v1 2/3] mldev: introduce support for IO layout 2023-08-30 15:52 [PATCH v1 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-08-30 15:53 ` [PATCH v1 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi @ 2023-08-30 15:53 ` Srikanth Yalavarthi 2023-08-30 15:53 ` [PATCH v1 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi ` (3 subsequent siblings) 5 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-08-30 15:53 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Introduce IO layout in ML device specification. IO layout defines the expected arrangement of model input and output buffers in memory. Packed and Split layout support is added in the specification. Updated rte_ml_op to use an array of rte_ml_buff_seg pointers, supporting packed and split I/O layouts. Updated ML quantize and dequantize APIs to support rte_ml_buff_seg pointer arrays. Replaced batch_size with min_batches and max_batches in rte_ml_model_info. Implemented support for model IO layout in the ml/cnxk driver. Updated the ML test application to support IO layout and dropped support for '--batches' in the test application. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- app/test-mldev/ml_options.c | 15 -- app/test-mldev/ml_options.h | 2 - app/test-mldev/test_inference_common.c | 323 +++++++++++++++++++++---- app/test-mldev/test_inference_common.h | 6 + app/test-mldev/test_model_common.c | 6 - app/test-mldev/test_model_common.h | 1 - doc/guides/tools/testmldev.rst | 6 - drivers/ml/cnxk/cn10k_ml_dev.h | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 6 +- drivers/ml/cnxk/cn10k_ml_ops.c | 74 +++--- lib/mldev/meson.build | 2 +- lib/mldev/rte_mldev.c | 12 +- lib/mldev/rte_mldev.h | 90 +++++-- lib/mldev/rte_mldev_core.h | 14 +- 14 files changed, 415 insertions(+), 145 deletions(-) diff --git a/app/test-mldev/ml_options.c b/app/test-mldev/ml_options.c index 816e41fdb70..c0468f5eee4 100644 --- a/app/test-mldev/ml_options.c +++ b/app/test-mldev/ml_options.c @@ -28,7 +28,6 @@ ml_options_default(struct ml_options *opt) opt->burst_size = 1; opt->queue_pairs = 1; opt->queue_size = 1; - opt->batches = 0; opt->tolerance = 0.0; opt->stats = false; opt->debug = false; @@ -212,18 +211,6 @@ ml_parse_queue_size(struct ml_options *opt, const char *arg) return ret; } -static int -ml_parse_batches(struct ml_options *opt, const char *arg) -{ - int ret; - - ret = parser_read_uint16(&opt->batches, arg); - if (ret != 0) - ml_err("Invalid option, batches = %s\n", arg); - - return ret; -} - static int ml_parse_tolerance(struct ml_options *opt, const char *arg) { @@ -286,7 +273,6 @@ static struct option lgopts[] = { {ML_BURST_SIZE, 1, 0, 0}, {ML_QUEUE_PAIRS, 1, 0, 0}, {ML_QUEUE_SIZE, 1, 0, 0}, - {ML_BATCHES, 1, 0, 0}, {ML_TOLERANCE, 1, 0, 0}, {ML_STATS, 0, 0, 0}, {ML_DEBUG, 0, 0, 0}, @@ -308,7 +294,6 @@ ml_opts_parse_long(int opt_idx, struct ml_options *opt) {ML_BURST_SIZE, ml_parse_burst_size}, {ML_QUEUE_PAIRS, ml_parse_queue_pairs}, {ML_QUEUE_SIZE, ml_parse_queue_size}, - {ML_BATCHES, ml_parse_batches}, {ML_TOLERANCE, ml_parse_tolerance}, }; diff --git a/app/test-mldev/ml_options.h b/app/test-mldev/ml_options.h index 622a4c05fc2..90e22adeac1 100644 --- a/app/test-mldev/ml_options.h +++ b/app/test-mldev/ml_options.h @@ -21,7 +21,6 @@ #define ML_BURST_SIZE ("burst_size") #define ML_QUEUE_PAIRS ("queue_pairs") #define ML_QUEUE_SIZE ("queue_size") -#define ML_BATCHES ("batches") #define ML_TOLERANCE ("tolerance") #define
ML_STATS ("stats") #define ML_DEBUG ("debug") @@ -44,7 +43,6 @@ struct ml_options { uint16_t burst_size; uint16_t queue_pairs; uint16_t queue_size; - uint16_t batches; float tolerance; bool stats; bool debug; diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index 6bda37b0fab..0018cc92514 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -47,7 +47,10 @@ ml_enqueue_single(void *arg) uint64_t start_cycle; uint32_t burst_enq; uint32_t lcore_id; + uint64_t offset; + uint64_t bufsz; uint16_t fid; + uint32_t i; int ret; lcore_id = rte_lcore_id(); @@ -66,24 +69,64 @@ ml_enqueue_single(void *arg) if (ret != 0) goto next_model; -retry: +retry_req: ret = rte_mempool_get(t->model[fid].io_pool, (void **)&req); if (ret != 0) - goto retry; + goto retry_req; + +retry_inp_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)req->inp_buf_segs, + t->model[fid].info.nb_inputs); + if (ret != 0) + goto retry_inp_segs; + +retry_out_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)req->out_buf_segs, + t->model[fid].info.nb_outputs); + if (ret != 0) + goto retry_out_segs; op->model_id = t->model[fid].id; - op->nb_batches = t->model[fid].nb_batches; + op->nb_batches = t->model[fid].info.min_batches; op->mempool = t->op_pool; + op->input = req->inp_buf_segs; + op->output = req->out_buf_segs; + op->user_ptr = req; - op->input.addr = req->input; - op->input.length = t->model[fid].inp_qsize; - op->input.next = NULL; + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + op->input[0]->addr = req->input; + op->input[0]->iova_addr = rte_mem_virt2iova(req->input); + op->input[0]->length = t->model[fid].inp_qsize; + op->input[0]->next = NULL; + + op->output[0]->addr = req->output; + op->output[0]->iova_addr = rte_mem_virt2iova(req->output); + op->output[0]->length = t->model[fid].out_qsize; + op->output[0]->next = NULL; + } else { + offset = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + op->input[i]->addr = req->input + offset; + op->input[i]->iova_addr = rte_mem_virt2iova(req->input + offset); + op->input[i]->length = bufsz; + op->input[i]->next = NULL; + offset += bufsz; + } - op->output.addr = req->output; - op->output.length = t->model[fid].out_qsize; - op->output.next = NULL; + offset = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + op->output[i]->addr = req->output + offset; + op->output[i]->iova_addr = rte_mem_virt2iova(req->output + offset); + op->output[i]->length = bufsz; + op->output[i]->next = NULL; + offset += bufsz; + } + } - op->user_ptr = req; req->niters++; req->fid = fid; @@ -143,6 +186,10 @@ ml_dequeue_single(void *arg) } req = (struct ml_request *)op->user_ptr; rte_mempool_put(t->model[req->fid].io_pool, req); + rte_mempool_put_bulk(t->buf_seg_pool, (void **)op->input, + t->model[req->fid].info.nb_inputs); + rte_mempool_put_bulk(t->buf_seg_pool, (void **)op->output, + t->model[req->fid].info.nb_outputs); rte_mempool_put(t->op_pool, op); } @@ -164,9 +211,12 @@ ml_enqueue_burst(void *arg) uint16_t burst_enq; uint32_t lcore_id; uint16_t pending; + uint64_t offset; + uint64_t bufsz; uint16_t idx; uint16_t fid; uint16_t i; + uint16_t j; int ret; lcore_id = rte_lcore_id(); @@ -186,25 +236,70 @@ ml_enqueue_burst(void *arg) if (ret != 0) goto next_model; -retry: 
+retry_reqs: ret = rte_mempool_get_bulk(t->model[fid].io_pool, (void **)args->reqs, ops_count); if (ret != 0) - goto retry; + goto retry_reqs; for (i = 0; i < ops_count; i++) { +retry_inp_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)args->reqs[i]->inp_buf_segs, + t->model[fid].info.nb_inputs); + if (ret != 0) + goto retry_inp_segs; + +retry_out_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)args->reqs[i]->out_buf_segs, + t->model[fid].info.nb_outputs); + if (ret != 0) + goto retry_out_segs; + args->enq_ops[i]->model_id = t->model[fid].id; - args->enq_ops[i]->nb_batches = t->model[fid].nb_batches; + args->enq_ops[i]->nb_batches = t->model[fid].info.min_batches; args->enq_ops[i]->mempool = t->op_pool; + args->enq_ops[i]->input = args->reqs[i]->inp_buf_segs; + args->enq_ops[i]->output = args->reqs[i]->out_buf_segs; + args->enq_ops[i]->user_ptr = args->reqs[i]; - args->enq_ops[i]->input.addr = args->reqs[i]->input; - args->enq_ops[i]->input.length = t->model[fid].inp_qsize; - args->enq_ops[i]->input.next = NULL; + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + args->enq_ops[i]->input[0]->addr = args->reqs[i]->input; + args->enq_ops[i]->input[0]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->input); + args->enq_ops[i]->input[0]->length = t->model[fid].inp_qsize; + args->enq_ops[i]->input[0]->next = NULL; + + args->enq_ops[i]->output[0]->addr = args->reqs[i]->output; + args->enq_ops[i]->output[0]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->output); + args->enq_ops[i]->output[0]->length = t->model[fid].out_qsize; + args->enq_ops[i]->output[0]->next = NULL; + } else { + offset = 0; + for (j = 0; j < t->model[fid].info.nb_inputs; j++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.input_info[j].size, + t->cmn.dev_info.align_size); + + args->enq_ops[i]->input[j]->addr = args->reqs[i]->input + offset; + args->enq_ops[i]->input[j]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->input + offset); + args->enq_ops[i]->input[j]->length = bufsz; + args->enq_ops[i]->input[j]->next = NULL; + offset += bufsz; + } - args->enq_ops[i]->output.addr = args->reqs[i]->output; - args->enq_ops[i]->output.length = t->model[fid].out_qsize; - args->enq_ops[i]->output.next = NULL; + offset = 0; + for (j = 0; j < t->model[fid].info.nb_outputs; j++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.output_info[j].size, + t->cmn.dev_info.align_size); + args->enq_ops[i]->output[j]->addr = args->reqs[i]->output + offset; + args->enq_ops[i]->output[j]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->output + offset); + args->enq_ops[i]->output[j]->length = bufsz; + args->enq_ops[i]->output[j]->next = NULL; + offset += bufsz; + } + } - args->enq_ops[i]->user_ptr = args->reqs[i]; args->reqs[i]->niters++; args->reqs[i]->fid = fid; } @@ -277,6 +372,11 @@ ml_dequeue_burst(void *arg) req = (struct ml_request *)args->deq_ops[i]->user_ptr; if (req != NULL) rte_mempool_put(t->model[req->fid].io_pool, req); + + rte_mempool_put_bulk(t->buf_seg_pool, (void **)args->deq_ops[i]->input, + t->model[req->fid].info.nb_inputs); + rte_mempool_put_bulk(t->buf_seg_pool, (void **)args->deq_ops[i]->output, + t->model[req->fid].info.nb_outputs); } rte_mempool_put_bulk(t->op_pool, (void *)args->deq_ops, burst_deq); } @@ -315,6 +415,12 @@ test_inference_cap_check(struct ml_options *opt) return false; } + if (dev_info.max_io < ML_TEST_MAX_IO_SIZE) { + ml_err("Insufficient capabilities: Max I/O, count = %u > (max limit = %u)", + ML_TEST_MAX_IO_SIZE, dev_info.max_io); + return
false; + } + return true; } @@ -403,11 +509,6 @@ test_inference_opt_dump(struct ml_options *opt) ml_dump("tolerance", "%-7.3f", opt->tolerance); ml_dump("stats", "%s", (opt->stats ? "true" : "false")); - if (opt->batches == 0) - ml_dump("batches", "%u (default batch size)", opt->batches); - else - ml_dump("batches", "%u", opt->batches); - ml_dump_begin("filelist"); for (i = 0; i < opt->nb_filelist; i++) { ml_dump_list("model", i, opt->filelist[i].model); @@ -492,10 +593,18 @@ void test_inference_destroy(struct ml_test *test, struct ml_options *opt) { struct test_inference *t; + uint32_t lcore_id; RTE_SET_USED(opt); t = ml_test_priv(test); + + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + rte_free(t->args[lcore_id].enq_ops); + rte_free(t->args[lcore_id].deq_ops); + rte_free(t->args[lcore_id].reqs); + } + rte_free(t); } @@ -572,19 +681,62 @@ ml_request_initialize(struct rte_mempool *mp, void *opaque, void *obj, unsigned { struct test_inference *t = ml_test_priv((struct ml_test *)opaque); struct ml_request *req = (struct ml_request *)obj; + struct rte_ml_buff_seg dbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg qbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *q_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *d_segs[ML_TEST_MAX_IO_SIZE]; + uint64_t offset; + uint64_t bufsz; + uint32_t i; RTE_SET_USED(mp); RTE_SET_USED(obj_idx); req->input = (uint8_t *)obj + - RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.min_align_size); - req->output = req->input + - RTE_ALIGN_CEIL(t->model[t->fid].inp_qsize, t->cmn.dev_info.min_align_size); + RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.align_size); + req->output = + req->input + RTE_ALIGN_CEIL(t->model[t->fid].inp_qsize, t->cmn.dev_info.align_size); req->niters = 0; + if (t->model[t->fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + dbuff_seg[0].addr = t->model[t->fid].input; + dbuff_seg[0].iova_addr = rte_mem_virt2iova(t->model[t->fid].input); + dbuff_seg[0].length = t->model[t->fid].inp_dsize; + dbuff_seg[0].next = NULL; + d_segs[0] = &dbuff_seg[0]; + + qbuff_seg[0].addr = req->input; + qbuff_seg[0].iova_addr = rte_mem_virt2iova(req->input); + qbuff_seg[0].length = t->model[t->fid].inp_qsize; + qbuff_seg[0].next = NULL; + q_segs[0] = &qbuff_seg[0]; + } else { + offset = 0; + for (i = 0; i < t->model[t->fid].info.nb_inputs; i++) { + bufsz = t->model[t->fid].info.input_info[i].nb_elements * sizeof(float); + dbuff_seg[i].addr = t->model[t->fid].input + offset; + dbuff_seg[i].iova_addr = rte_mem_virt2iova(t->model[t->fid].input + offset); + dbuff_seg[i].length = bufsz; + dbuff_seg[i].next = NULL; + d_segs[i] = &dbuff_seg[i]; + offset += bufsz; + } + + offset = 0; + for (i = 0; i < t->model[t->fid].info.nb_inputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[t->fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + qbuff_seg[i].addr = req->input + offset; + qbuff_seg[i].iova_addr = rte_mem_virt2iova(req->input + offset); + qbuff_seg[i].length = bufsz; + qbuff_seg[i].next = NULL; + q_segs[i] = &qbuff_seg[i]; + offset += bufsz; + } + } + /* quantize data */ - rte_ml_io_quantize(t->cmn.opt->dev_id, t->model[t->fid].id, t->model[t->fid].nb_batches, - t->model[t->fid].input, req->input); + rte_ml_io_quantize(t->cmn.opt->dev_id, t->model[t->fid].id, d_segs, q_segs); } int @@ -599,24 +751,39 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t uint32_t buff_size; uint32_t mz_size; size_t fsize; + uint32_t i; int ret; /* get input buffer size */ - ret = 
rte_ml_io_input_size_get(opt->dev_id, t->model[fid].id, t->model[fid].nb_batches, - &t->model[fid].inp_qsize, &t->model[fid].inp_dsize); - if (ret != 0) { - ml_err("Failed to get input size, model : %s\n", opt->filelist[fid].model); - return ret; + t->model[fid].inp_qsize = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) { + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) + t->model[fid].inp_qsize += t->model[fid].info.input_info[i].size; + else + t->model[fid].inp_qsize += RTE_ALIGN_CEIL( + t->model[fid].info.input_info[i].size, t->cmn.dev_info.align_size); } /* get output buffer size */ - ret = rte_ml_io_output_size_get(opt->dev_id, t->model[fid].id, t->model[fid].nb_batches, - &t->model[fid].out_qsize, &t->model[fid].out_dsize); - if (ret != 0) { - ml_err("Failed to get input size, model : %s\n", opt->filelist[fid].model); - return ret; + t->model[fid].out_qsize = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) { + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) + t->model[fid].out_qsize += t->model[fid].info.output_info[i].size; + else + t->model[fid].out_qsize += RTE_ALIGN_CEIL( + t->model[fid].info.output_info[i].size, t->cmn.dev_info.align_size); } + t->model[fid].inp_dsize = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) + t->model[fid].inp_dsize += + t->model[fid].info.input_info[i].nb_elements * sizeof(float); + + t->model[fid].out_dsize = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) + t->model[fid].out_dsize += + t->model[fid].info.output_info[i].nb_elements * sizeof(float); + /* allocate buffer for user data */ mz_size = t->model[fid].inp_dsize + t->model[fid].out_dsize; if (strcmp(opt->filelist[fid].reference, "\0") != 0) @@ -673,9 +840,9 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t /* create mempool for quantized input and output buffers. ml_request_initialize is * used as a callback for object creation. */ - buff_size = RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.min_align_size) + - RTE_ALIGN_CEIL(t->model[fid].inp_qsize, t->cmn.dev_info.min_align_size) + - RTE_ALIGN_CEIL(t->model[fid].out_qsize, t->cmn.dev_info.min_align_size); + buff_size = RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.align_size) + + RTE_ALIGN_CEIL(t->model[fid].inp_qsize, t->cmn.dev_info.align_size) + + RTE_ALIGN_CEIL(t->model[fid].out_qsize, t->cmn.dev_info.align_size); nb_buffers = RTE_MIN((uint64_t)ML_TEST_MAX_POOL_SIZE, opt->repetitions); t->fid = fid; @@ -740,6 +907,18 @@ ml_inference_mem_setup(struct ml_test *test, struct ml_options *opt) return -ENOMEM; } + /* create buf_segs pool with elements of struct rte_ml_buff_seg. external buffers are + * attached to the buf_segs while enqueuing inference requests. + */ + t->buf_seg_pool = rte_mempool_create("ml_test_mbuf_pool", ML_TEST_MAX_POOL_SIZE * 2, + sizeof(struct rte_ml_buff_seg), 0, 0, NULL, NULL, NULL, + NULL, opt->socket_id, 0); + if (t->buf_seg_pool == NULL) { + ml_err("Failed to create buf_segs pool : %s\n", "ml_test_mbuf_pool"); + rte_ml_op_pool_free(t->op_pool); + return -ENOMEM; + } + return 0; } @@ -752,6 +931,9 @@ ml_inference_mem_destroy(struct ml_test *test, struct ml_options *opt) /* release op pool */ rte_mempool_free(t->op_pool); + + /* release buf_segs pool */ + rte_mempool_free(t->buf_seg_pool); } static bool @@ -781,8 +963,10 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) j = 0; next_element: match = false; - deviation = - (*reference == 0 ?
0 : 100 * fabs(*output - *reference) / fabs(*reference)); + if ((*reference == 0) && (*output == 0)) + deviation = 0; + else + deviation = 100 * fabs(*output - *reference) / fabs(*reference); if (deviation <= t->cmn.opt->tolerance) match = true; else @@ -817,14 +1001,59 @@ ml_request_finish(struct rte_mempool *mp, void *opaque, void *obj, unsigned int bool error = false; char *dump_path; + struct rte_ml_buff_seg qbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg dbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *q_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *d_segs[ML_TEST_MAX_IO_SIZE]; + uint64_t offset; + uint64_t bufsz; + uint32_t i; + RTE_SET_USED(mp); if (req->niters == 0) return; t->nb_used++; - rte_ml_io_dequantize(t->cmn.opt->dev_id, model->id, t->model[req->fid].nb_batches, - req->output, model->output); + + if (t->model[req->fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + qbuff_seg[0].addr = req->output; + qbuff_seg[0].iova_addr = rte_mem_virt2iova(req->output); + qbuff_seg[0].length = t->model[req->fid].out_qsize; + qbuff_seg[0].next = NULL; + q_segs[0] = &qbuff_seg[0]; + + dbuff_seg[0].addr = model->output; + dbuff_seg[0].iova_addr = rte_mem_virt2iova(model->output); + dbuff_seg[0].length = t->model[req->fid].out_dsize; + dbuff_seg[0].next = NULL; + d_segs[0] = &dbuff_seg[0]; + } else { + offset = 0; + for (i = 0; i < t->model[req->fid].info.nb_outputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[req->fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + qbuff_seg[i].addr = req->output + offset; + qbuff_seg[i].iova_addr = rte_mem_virt2iova(req->output + offset); + qbuff_seg[i].length = bufsz; + qbuff_seg[i].next = NULL; + q_segs[i] = &qbuff_seg[i]; + offset += bufsz; + } + + offset = 0; + for (i = 0; i < t->model[req->fid].info.nb_outputs; i++) { + bufsz = t->model[req->fid].info.output_info[i].nb_elements * sizeof(float); + dbuff_seg[i].addr = model->output + offset; + dbuff_seg[i].iova_addr = rte_mem_virt2iova(model->output + offset); + dbuff_seg[i].length = bufsz; + dbuff_seg[i].next = NULL; + d_segs[i] = &dbuff_seg[i]; + offset += bufsz; + } + } + + rte_ml_io_dequantize(t->cmn.opt->dev_id, model->id, q_segs, d_segs); if (model->reference == NULL) goto dump_output_pass; diff --git a/app/test-mldev/test_inference_common.h b/app/test-mldev/test_inference_common.h index 8f27af25e4f..3f4ba3219be 100644 --- a/app/test-mldev/test_inference_common.h +++ b/app/test-mldev/test_inference_common.h @@ -11,11 +11,16 @@ #include "test_model_common.h" +#define ML_TEST_MAX_IO_SIZE 32 + struct ml_request { uint8_t *input; uint8_t *output; uint16_t fid; uint64_t niters; + + struct rte_ml_buff_seg *inp_buf_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *out_buf_segs[ML_TEST_MAX_IO_SIZE]; }; struct ml_core_args { @@ -38,6 +43,7 @@ struct test_inference { /* test specific data */ struct ml_model model[ML_TEST_MAX_MODELS]; + struct rte_mempool *buf_seg_pool; struct rte_mempool *op_pool; uint64_t nb_used; diff --git a/app/test-mldev/test_model_common.c b/app/test-mldev/test_model_common.c index 8dbb0ff89ff..c517a506117 100644 --- a/app/test-mldev/test_model_common.c +++ b/app/test-mldev/test_model_common.c @@ -50,12 +50,6 @@ ml_model_load(struct ml_test *test, struct ml_options *opt, struct ml_model *mod return ret; } - /* Update number of batches */ - if (opt->batches == 0) - model->nb_batches = model->info.batch_size; - else - model->nb_batches = opt->batches; - model->state = MODEL_LOADED; return 0; diff --git a/app/test-mldev/test_model_common.h 
b/app/test-mldev/test_model_common.h index c1021ef1b6a..a207e54ab71 100644 --- a/app/test-mldev/test_model_common.h +++ b/app/test-mldev/test_model_common.h @@ -31,7 +31,6 @@ struct ml_model { uint8_t *reference; struct rte_mempool *io_pool; - uint32_t nb_batches; }; int ml_model_load(struct ml_test *test, struct ml_options *opt, struct ml_model *model, diff --git a/doc/guides/tools/testmldev.rst b/doc/guides/tools/testmldev.rst index 741abd722e2..9b1565a4576 100644 --- a/doc/guides/tools/testmldev.rst +++ b/doc/guides/tools/testmldev.rst @@ -106,11 +106,6 @@ The following are the command-line options supported by the test application. Queue size would translate into ``rte_ml_dev_qp_conf::nb_desc`` field during queue-pair creation. Default value is ``1``. -``--batches <n>`` - Set the number batches in the input file provided for inference run. - When not specified, the test would assume the number of batches - is the batch size of the model. - ``--tolerance <n>`` Set the tolerance value in percentage to be used for output validation. Default value is ``0``. @@ -282,7 +277,6 @@ Supported command line options for inference tests are following:: --burst_size --queue_pairs --queue_size - --batches --tolerance --stats diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h index 6ca0b0bb6e2..c73bf7d001a 100644 --- a/drivers/ml/cnxk/cn10k_ml_dev.h +++ b/drivers/ml/cnxk/cn10k_ml_dev.h @@ -30,6 +30,9 @@ /* Maximum number of descriptors per queue-pair */ #define ML_CN10K_MAX_DESC_PER_QP 1024 +/* Maximum number of inputs / outputs per model */ +#define ML_CN10K_MAX_INPUT_OUTPUT 32 + /* Maximum number of segments for IO data */ #define ML_CN10K_MAX_SEGMENTS 1 diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index 26df8d9ff94..e0b750cd8ef 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -520,9 +520,11 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) struct rte_ml_model_info *info; struct rte_ml_io_info *output; struct rte_ml_io_info *input; + struct cn10k_ml_dev *mldev; uint8_t i; uint8_t j; + mldev = dev->data->dev_private; metadata = &model->metadata; info = PLT_PTR_CAST(model->info); input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info)); @@ -537,7 +539,9 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) metadata->model.version[3]); info->model_id = model->model_id; info->device_id = dev->data->dev_id; - info->batch_size = model->batch_size; + info->io_layout = RTE_ML_IO_LAYOUT_PACKED; + info->min_batches = model->batch_size; + info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size; info->nb_inputs = metadata->model.num_input; info->input_info = input; info->nb_outputs = metadata->model.num_output; diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index e3faab81ba3..1d72fb52a6a 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -471,9 +471,9 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req req->jd.hdr.sp_flags = 0x0; req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result); req->jd.model_run.input_ddr_addr = - PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr)); + PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr)); req->jd.model_run.output_ddr_addr = - PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr)); + PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, 
op->output[0]->addr)); req->jd.model_run.num_batches = op->nb_batches; } @@ -856,7 +856,11 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint static int cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) { + struct rte_ml_model_info *info; struct cn10k_ml_model *model; + struct rte_ml_buff_seg seg[2]; + struct rte_ml_buff_seg *inp; + struct rte_ml_buff_seg *out; struct rte_ml_op op; char str[RTE_MEMZONE_NAMESIZE]; @@ -864,12 +868,22 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) uint64_t isize = 0; uint64_t osize = 0; int ret = 0; + uint32_t i; model = dev->data->models[model_id]; + info = (struct rte_ml_model_info *)model->info; + inp = &seg[0]; + out = &seg[1]; /* Create input and output buffers. */ - rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL); - rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL); + for (i = 0; i < info->nb_inputs; i++) + isize += info->input_info[i].size; + + for (i = 0; i < info->nb_outputs; i++) + osize += info->output_info[i].size; + + isize = model->batch_size * isize; + osize = model->batch_size * osize; snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id); mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE); @@ -877,17 +891,22 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) return -ENOMEM; memset(mz->addr, 0, isize + osize); + seg[0].addr = mz->addr; + seg[0].iova_addr = mz->iova; + seg[0].length = isize; + seg[0].next = NULL; + + seg[1].addr = PLT_PTR_ADD(mz->addr, isize); + seg[1].iova_addr = mz->iova + isize; + seg[1].length = osize; + seg[1].next = NULL; + op.model_id = model_id; op.nb_batches = model->batch_size; op.mempool = NULL; - op.input.addr = mz->addr; - op.input.length = isize; - op.input.next = NULL; - - op.output.addr = PLT_PTR_ADD(op.input.addr, isize); - op.output.length = osize; - op.output.next = NULL; + op.input = &inp; + op.output = &out; memset(model->req, 0, sizeof(struct cn10k_ml_req)); ret = cn10k_ml_inference_sync(dev, &op); @@ -919,8 +938,9 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info) else if (strcmp(mldev->fw.poll_mem, "ddr") == 0) dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP; + dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT; dev_info->max_segments = ML_CN10K_MAX_SEGMENTS; - dev_info->min_align_size = ML_CN10K_ALIGN_SIZE; + dev_info->align_size = ML_CN10K_ALIGN_SIZE; return 0; } @@ -2139,15 +2159,14 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t } static int -cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer) +cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer) { struct cn10k_ml_model *model; uint8_t model_input_type; uint8_t *lcl_dbuffer; uint8_t *lcl_qbuffer; uint8_t input_type; - uint32_t batch_id; float qscale; uint32_t i; uint32_t j; @@ -2160,11 +2179,9 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batc return -EINVAL; } - lcl_dbuffer = dbuffer; - lcl_qbuffer = qbuffer; - batch_id = 0; + lcl_dbuffer = dbuffer[0]->addr; + lcl_qbuffer = qbuffer[0]->addr; -next_batch: for (i = 0; i < model->metadata.model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { input_type = model->metadata.input1[i].input_type; @@ -2218,23 +2235,18 @@ cn10k_ml_io_quantize(struct 
rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batc lcl_qbuffer += model->addr.input[i].sz_q; } - batch_id++; - if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size)) - goto next_batch; - return 0; } static int -cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *qbuffer, void *dbuffer) +cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer) { struct cn10k_ml_model *model; uint8_t model_output_type; uint8_t *lcl_qbuffer; uint8_t *lcl_dbuffer; uint8_t output_type; - uint32_t batch_id; float dscale; uint32_t i; uint32_t j; @@ -2247,11 +2259,9 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba return -EINVAL; } - lcl_dbuffer = dbuffer; - lcl_qbuffer = qbuffer; - batch_id = 0; + lcl_dbuffer = dbuffer[0]->addr; + lcl_qbuffer = qbuffer[0]->addr; -next_batch: for (i = 0; i < model->metadata.model.num_output; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { output_type = model->metadata.output1[i].output_type; @@ -2306,10 +2316,6 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba lcl_dbuffer += model->addr.output[i].sz_d; } - batch_id++; - if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size)) - goto next_batch; - return 0; } diff --git a/lib/mldev/meson.build b/lib/mldev/meson.build index 5769b0640a1..0079ccd2052 100644 --- a/lib/mldev/meson.build +++ b/lib/mldev/meson.build @@ -35,7 +35,7 @@ driver_sdk_headers += files( 'mldev_utils.h', ) -deps += ['mempool'] +deps += ['mempool', 'mbuf'] if get_option('buildtype').contains('debug') cflags += [ '-DRTE_LIBRTE_ML_DEV_DEBUG' ] diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 0d8ccd32124..9a48ed3e944 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -730,8 +730,8 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches } int -rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer) +rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer) { struct rte_ml_dev *dev; @@ -754,12 +754,12 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void return -EINVAL; } - return (*dev->dev_ops->io_quantize)(dev, model_id, nb_batches, dbuffer, qbuffer); + return (*dev->dev_ops->io_quantize)(dev, model_id, dbuffer, qbuffer); } int -rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *qbuffer, - void *dbuffer) +rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer) { struct rte_ml_dev *dev; @@ -782,7 +782,7 @@ rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, voi return -EINVAL; } - return (*dev->dev_ops->io_dequantize)(dev, model_id, nb_batches, qbuffer, dbuffer); + return (*dev->dev_ops->io_dequantize)(dev, model_id, qbuffer, dbuffer); } /** Initialise rte_ml_op mempool element */ diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index 6204df09308..316c6fd0188 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -228,12 +228,14 @@ struct rte_ml_dev_info { /**< Maximum allowed number of descriptors for queue pair by the device. * @see struct rte_ml_dev_qp_conf::nb_desc */ + uint16_t max_io; + /**< Maximum number of inputs/outputs supported per model. 
*/ uint16_t max_segments; /**< Maximum number of scatter-gather entries supported by the device. * @see struct rte_ml_buff_seg struct rte_ml_buff_seg::next */ - uint16_t min_align_size; - /**< Minimum alignment size of IO buffers used by the device. */ + uint16_t align_size; + /**< Alignment size of IO buffers used by the device. */ }; /** @@ -429,10 +431,28 @@ struct rte_ml_op { /**< Reserved for future use. */ struct rte_mempool *mempool; /**< Pool from which operation is allocated. */ - struct rte_ml_buff_seg input; - /**< Input buffer to hold the inference data. */ - struct rte_ml_buff_seg output; - /**< Output buffer to hold the inference output by the driver. */ + struct rte_ml_buff_seg **input; + /**< Array of buffer segments to hold the inference input data. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_PACKED, size of + * the array is 1. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_SPLIT, size of + * the array is rte_ml_model_info::nb_inputs. + * + * @see struct rte_ml_model_info::io_layout + */ + struct rte_ml_buff_seg **output; + /**< Array of buffer segments to hold the inference output data. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_PACKED, size of + * the array is 1. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_SPLIT, size of + * the array is rte_ml_model_info::nb_outputs. + * + * @see struct rte_ml_model_info::io_layout + */ union { uint64_t user_u64; /**< User data as uint64_t.*/ @@ -863,7 +883,37 @@ enum rte_ml_io_type { /**< 16-bit brain floating point number. */ }; -/** Input and output data information structure +/** ML I/O buffer layout */ +enum rte_ml_io_layout { + RTE_ML_IO_LAYOUT_PACKED, + /**< All inputs for the model should be packed in a single buffer with + * no padding between individual inputs. The buffer is expected to + * be aligned to rte_ml_dev_info::align_size. + * + * When I/O segmentation is supported by the device, the packed + * data can be split into multiple segments. In this case, each + * segment is expected to be aligned to rte_ml_dev_info::align_size. + * + * Same applies to output. + * + * @see struct rte_ml_dev_info::max_segments + */ + RTE_ML_IO_LAYOUT_SPLIT + /**< Each input for the model should be stored as a separate buffer + * and each input should be aligned to rte_ml_dev_info::align_size. + * + * When I/O segmentation is supported, each input can be split into + * multiple segments. In this case, each segment is expected to be + * aligned to rte_ml_dev_info::align_size. + * + * Same applies to output. + * + * @see struct rte_ml_dev_info::max_segments + */ +}; + +/** + * Input and output data information structure * * Specifies the type and shape of input and output data. */ @@ -873,7 +923,7 @@ struct rte_ml_io_info { uint32_t nb_dims; /**< Number of dimensions in shape */ uint32_t *shape; - /**< Shape of the tensor */ + /**< Shape of the tensor for rte_ml_model_info::min_batches of the model.
*/ enum rte_ml_io_type type; /**< Type of data * @see enum rte_ml_io_type @@ -894,8 +944,16 @@ struct rte_ml_model_info { /**< Model ID */ uint16_t device_id; /**< Device ID */ - uint16_t batch_size; - /**< Maximum number of batches that the model can process simultaneously */ + enum rte_ml_io_layout io_layout; + /**< I/O buffer layout for the model */ + uint16_t min_batches; + /**< Minimum number of batches that the model can process + * in one inference request + */ + uint16_t max_batches; + /**< Maximum number of batches that the model can process + * in one inference request + */ uint32_t nb_inputs; /**< Number of inputs */ const struct rte_ml_io_info *input_info; @@ -1021,8 +1079,6 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches * The identifier of the device. * @param[in] model_id * Identifier for the model - * @param[in] nb_batches - * Number of batches in the dequantized input buffer * @param[in] dbuffer * Address of dequantized input data * @param[in] qbuffer @@ -1034,8 +1090,8 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches */ __rte_experimental int -rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer); +rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer); /** * Dequantize output data. @@ -1047,8 +1103,6 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void * The identifier of the device. * @param[in] model_id * Identifier for the model - * @param[in] nb_batches - * Number of batches in the dequantized output buffer * @param[in] qbuffer * Address of quantized output data * @param[in] dbuffer @@ -1060,8 +1114,8 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void */ __rte_experimental int -rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *qbuffer, - void *dbuffer); +rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer); /* ML op pool operations */ diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 78b8b7633dd..8530b073162 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -523,8 +523,6 @@ typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model * ML device pointer. * @param model_id * Model ID to use. - * @param nb_batches - * Number of batches. * @param dbuffer * Pointer t de-quantized data buffer. * @param qbuffer @@ -534,8 +532,9 @@ typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model * - 0 on success. * - <0, error on failure. */ -typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *dbuffer, void *qbuffer); +typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, + struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer); /** * @internal @@ -546,8 +545,6 @@ typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, ui * ML device pointer. * @param model_id * Model ID to use. - * @param nb_batches - * Number of batches. * @param qbuffer * Pointer t de-quantized data buffer. * @param dbuffer @@ -557,8 +554,9 @@ typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, ui * - 0 on success. * - <0, error on failure. 
*/ -typedef int (*mldev_io_dequantize_t)(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *qbuffer, void *dbuffer); +typedef int (*mldev_io_dequantize_t)(struct rte_ml_dev *dev, uint16_t model_id, + struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer); /** * @internal -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
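To make the new op layout concrete: a minimal sketch of how an application might fill the rte_ml_buff_seg pointer arrays of an rte_ml_op for a model that reports RTE_ML_IO_LAYOUT_SPLIT. Buffer allocation and error handling are omitted, and the helper name fill_split_input() is illustrative, not part of the spec:

static void
fill_split_input(struct rte_ml_op *op, const struct rte_ml_model_info *info,
		 struct rte_ml_buff_seg **segs, uint8_t *base, uint16_t align)
{
	uint64_t offset = 0;
	uint32_t i;

	/* One segment per model input, each aligned to the device's
	 * rte_ml_dev_info::align_size; segs[] must hold info->nb_inputs entries.
	 */
	for (i = 0; i < info->nb_inputs; i++) {
		uint64_t sz = RTE_ALIGN_CEIL(info->input_info[i].size, align);

		segs[i]->addr = base + offset;
		segs[i]->iova_addr = rte_mem_virt2iova(base + offset);
		segs[i]->length = sz;
		segs[i]->next = NULL;
		offset += sz;
	}

	op->input = segs;
}

op->output is filled the same way from nb_outputs and output_info[]. For RTE_ML_IO_LAYOUT_PACKED both arrays degenerate to a single segment covering the whole packed buffer.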
* [PATCH v1 3/3] mldev: drop input and output size get APIs 2023-08-30 15:52 [PATCH v1 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-08-30 15:53 ` [PATCH v1 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi 2023-08-30 15:53 ` [PATCH v1 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi @ 2023-08-30 15:53 ` Srikanth Yalavarthi 2023-09-20 7:19 ` [PATCH v2 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi ` (2 subsequent siblings) 5 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-08-30 15:53 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Drop support and use of ML input and output size get functions, rte_ml_io_input_size_get and rte_ml_io_output_size_get. These functions are not required, as the model buffer size can be computed from the fields of the updated rte_ml_io_info structure. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- drivers/ml/cnxk/cn10k_ml_ops.c | 50 ---------------------------- lib/mldev/rte_mldev.c | 38 --------------------- lib/mldev/rte_mldev.h | 60 ---------------------------------- lib/mldev/rte_mldev_core.h | 54 ------------------------------ lib/mldev/version.map | 2 -- 5 files changed, 204 deletions(-) diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index 1d72fb52a6a..4abf4ae0d39 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -2110,54 +2110,6 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu return 0; } -static int -cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize) -{ - struct cn10k_ml_model *model; - - model = dev->data->models[model_id]; - - if (model == NULL) { - plt_err("Invalid model_id = %u", model_id); - return -EINVAL; - } - - if (input_qsize != NULL) - *input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - if (input_dsize != NULL) - *input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - return 0; -} - -static int -cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize) -{ - struct cn10k_ml_model *model; - - model = dev->data->models[model_id]; - - if (model == NULL) { - plt_err("Invalid model_id = %u", model_id); - return -EINVAL; - } - - if (output_qsize != NULL) - *output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - if (output_dsize != NULL) - *output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - return 0; -} - static int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer) @@ -2636,8 +2588,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = { .model_params_update = cn10k_ml_model_params_update, /* I/O ops */ - .io_input_size_get = cn10k_ml_io_input_size_get, - .io_output_size_get = cn10k_ml_io_output_size_get, .io_quantize = cn10k_ml_io_quantize, .io_dequantize = cn10k_ml_io_dequantize, }; diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 9a48ed3e944..cc5f2e0cc63 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -691,44 +691,6 @@ rte_ml_model_params_update(int16_t dev_id, uint16_t model_id, void 
*buffer) return (*dev->dev_ops->model_params_update)(dev, model_id, buffer); } -int -rte_ml_io_input_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize) -{ - struct rte_ml_dev *dev; - - if (!rte_ml_dev_is_valid_dev(dev_id)) { - RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); - return -EINVAL; - } - - dev = rte_ml_dev_pmd_get_dev(dev_id); - if (*dev->dev_ops->io_input_size_get == NULL) - return -ENOTSUP; - - return (*dev->dev_ops->io_input_size_get)(dev, model_id, nb_batches, input_qsize, - input_dsize); -} - -int -rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize) -{ - struct rte_ml_dev *dev; - - if (!rte_ml_dev_is_valid_dev(dev_id)) { - RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); - return -EINVAL; - } - - dev = rte_ml_dev_pmd_get_dev(dev_id); - if (*dev->dev_ops->io_output_size_get == NULL) - return -ENOTSUP; - - return (*dev->dev_ops->io_output_size_get)(dev, model_id, nb_batches, output_qsize, - output_dsize); -} - int rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer) diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index 316c6fd0188..63b2670bb04 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -1008,66 +1008,6 @@ rte_ml_model_params_update(int16_t dev_id, uint16_t model_id, void *buffer); /* IO operations */ -/** - * Get size of quantized and dequantized input buffers. - * - * Calculate the size of buffers required for quantized and dequantized input data. - * This API would return the buffer sizes for the number of batches provided and would - * consider the alignment requirements as per the PMD. Input sizes computed by this API can - * be used by the application to allocate buffers. - * - * @param[in] dev_id - * The identifier of the device. - * @param[in] model_id - * Identifier for the model created - * @param[in] nb_batches - * Number of batches of input to be processed in a single inference job - * @param[out] input_qsize - * Quantized input size pointer. - * NULL value is allowed, in which case input_qsize is not calculated by the driver. - * @param[out] input_dsize - * Dequantized input size pointer. - * NULL value is allowed, in which case input_dsize is not calculated by the driver. - * - * @return - * - Returns 0 on success - * - Returns negative value on failure - */ -__rte_experimental -int -rte_ml_io_input_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize); - -/** - * Get size of quantized and dequantized output buffers. - * - * Calculate the size of buffers required for quantized and dequantized output data. - * This API would return the buffer sizes for the number of batches provided and would consider - * the alignment requirements as per the PMD. Output sizes computed by this API can be used by the - * application to allocate buffers. - * - * @param[in] dev_id - * The identifier of the device. - * @param[in] model_id - * Identifier for the model created - * @param[in] nb_batches - * Number of batches of input to be processed in a single inference job - * @param[out] output_qsize - * Quantized output size pointer. - * NULL value is allowed, in which case output_qsize is not calculated by the driver. - * @param[out] output_dsize - * Dequantized output size pointer. - * NULL value is allowed, in which case output_dsize is not calculated by the driver. 
- * - * @return - * - Returns 0 on success - * - Returns negative value on failure - */ -__rte_experimental -int -rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize); - /** * Quantize input data. * diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 8530b073162..2279b1dcecb 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -466,54 +466,6 @@ typedef int (*mldev_model_info_get_t)(struct rte_ml_dev *dev, uint16_t model_id, */ typedef int (*mldev_model_params_update_t)(struct rte_ml_dev *dev, uint16_t model_id, void *buffer); -/** - * @internal - * - * Get size of input buffers. - * - * @param dev - * ML device pointer. - * @param model_id - * Model ID to use. - * @param nb_batches - * Number of batches. - * @param input_qsize - * Size of quantized input. - * @param input_dsize - * Size of dequantized input. - * - * @return - * - 0 on success. - * - <0, error on failure. - */ -typedef int (*mldev_io_input_size_get_t)(struct rte_ml_dev *dev, uint16_t model_id, - uint32_t nb_batches, uint64_t *input_qsize, - uint64_t *input_dsize); - -/** - * @internal - * - * Get size of output buffers. - * - * @param dev - * ML device pointer. - * @param model_id - * Model ID to use. - * @param nb_batches - * Number of batches. - * @param output_qsize - * Size of quantized output. - * @param output_dsize - * Size of dequantized output. - * - * @return - * - 0 on success. - * - <0, error on failure. - */ -typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model_id, - uint32_t nb_batches, uint64_t *output_qsize, - uint64_t *output_dsize); - /** * @internal * @@ -627,12 +579,6 @@ struct rte_ml_dev_ops { /** Update model params. */ mldev_model_params_update_t model_params_update; - /** Get input buffer size. */ - mldev_io_input_size_get_t io_input_size_get; - - /** Get output buffer size. */ - mldev_io_output_size_get_t io_output_size_get; - /** Quantize data */ mldev_io_quantize_t io_quantize; diff --git a/lib/mldev/version.map b/lib/mldev/version.map index 40ff27f4b95..99841db6aa9 100644 --- a/lib/mldev/version.map +++ b/lib/mldev/version.map @@ -23,8 +23,6 @@ EXPERIMENTAL { rte_ml_dev_xstats_reset; rte_ml_enqueue_burst; rte_ml_io_dequantize; - rte_ml_io_input_size_get; - rte_ml_io_output_size_get; rte_ml_io_quantize; rte_ml_model_info_get; rte_ml_model_load; -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
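With the size-get APIs dropped, buffer sizes are derived directly from rte_ml_model_info. A sketch of the computation, mirroring the logic this series adds to the test application (helper names are illustrative; align is the device's rte_ml_dev_info::align_size):

static uint64_t
model_input_qsize(const struct rte_ml_model_info *info, uint16_t align)
{
	uint64_t sz = 0;
	uint32_t i;

	for (i = 0; i < info->nb_inputs; i++) {
		if (info->io_layout == RTE_ML_IO_LAYOUT_PACKED)
			sz += info->input_info[i].size; /* packed: no per-input padding */
		else
			sz += RTE_ALIGN_CEIL(info->input_info[i].size, align);
	}

	return sz;
}

/* Dequantized size follows from the element counts; float32 data is
 * assumed here, as in the test application.
 */
static uint64_t
model_input_dsize(const struct rte_ml_model_info *info)
{
	uint64_t sz = 0;
	uint32_t i;

	for (i = 0; i < info->nb_inputs; i++)
		sz += info->input_info[i].nb_elements * sizeof(float);

	return sz;
}

Output sizes are computed the same way from nb_outputs and output_info[].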
* [PATCH v2 0/3] Spec changes to support multi I/O models 2023-08-30 15:52 [PATCH v1 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi ` (2 preceding siblings ...) 2023-08-30 15:53 ` [PATCH v1 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi @ 2023-09-20 7:19 ` Srikanth Yalavarthi 2023-09-20 7:19 ` [PATCH v2 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi ` (2 more replies) 2023-09-27 18:11 ` [PATCH v3 0/4] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-10-02 9:58 ` [PATCH v4 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi 5 siblings, 3 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-20 7:19 UTC (permalink / raw) Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar This series implements changes to mldev spec to extend support for ML models with multiple inputs and outputs. Changes include introduction of I/O layout to support packed and split buffers for model input and output. Extended the rte_ml_model_info structure to support multiple inputs and outputs. Updated rte_ml_op and quantize / dequantize APIs to support an array of input and output ML buffer segments. Support for batches option is dropped from test application. v2: - Minor fixes - Cleanup of application help v1: - Initial changes Srikanth Yalavarthi (3): mldev: add support for arbitrary shape dimensions mldev: introduce support for IO layout mldev: drop input and output size get APIs app/test-mldev/ml_options.c | 16 - app/test-mldev/ml_options.h | 2 - app/test-mldev/test_inference_common.c | 420 +++++++++++++++++-------- app/test-mldev/test_inference_common.h | 6 + app/test-mldev/test_model_common.c | 6 - app/test-mldev/test_model_common.h | 1 - doc/guides/tools/testmldev.rst | 6 - drivers/ml/cnxk/cn10k_ml_dev.h | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 84 +++-- drivers/ml/cnxk/cn10k_ml_model.h | 12 + drivers/ml/cnxk/cn10k_ml_ops.c | 135 +++----- lib/mldev/meson.build | 2 +- lib/mldev/mldev_utils.c | 30 -- lib/mldev/mldev_utils.h | 16 - lib/mldev/rte_mldev.c | 50 +-- lib/mldev/rte_mldev.h | 201 +++++------- lib/mldev/rte_mldev_core.h | 68 +--- lib/mldev/version.map | 3 - 18 files changed, 506 insertions(+), 555 deletions(-) -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
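A usage sketch for the reworked quantize API summarized above, for a packed-layout model with a single segment per direction; the buffer names (dbuf, qbuf) and sizes are illustrative, and both buffers are assumed to be allocated and aligned to rte_ml_dev_info::align_size:

/* dbuf holds dequantized (float) input; qbuf receives the quantized data. */
struct rte_ml_buff_seg dseg = {
	.addr = dbuf,
	.iova_addr = rte_mem_virt2iova(dbuf),
	.length = dsize,
	.next = NULL,
};
struct rte_ml_buff_seg qseg = {
	.addr = qbuf,
	.iova_addr = rte_mem_virt2iova(qbuf),
	.length = qsize,
	.next = NULL,
};
struct rte_ml_buff_seg *dsegs[] = { &dseg };
struct rte_ml_buff_seg *qsegs[] = { &qseg };

ret = rte_ml_io_quantize(dev_id, model_id, dsegs, qsegs);

For a split-layout model the arrays instead carry one segment per model input, as the test application changes in patch 2/3 show.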
* [PATCH v2 1/3] mldev: add support for arbitrary shape dimensions 2023-09-20 7:19 ` [PATCH v2 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi @ 2023-09-20 7:19 ` Srikanth Yalavarthi 2023-09-20 7:19 ` [PATCH v2 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi 2023-09-20 7:19 ` [PATCH v2 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi 2 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-20 7:19 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Updated rte_ml_io_info to support shape of arbitrary number of dimensions. Dropped use of rte_ml_io_shape and rte_ml_io_format. Introduced new fields nb_elements and size in rte_ml_io_info. Updated drivers and app/mldev to support the changes. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- app/test-mldev/test_inference_common.c | 97 +++++--------------------- drivers/ml/cnxk/cn10k_ml_model.c | 78 +++++++++++++-------- drivers/ml/cnxk/cn10k_ml_model.h | 12 ++++ drivers/ml/cnxk/cn10k_ml_ops.c | 11 +-- lib/mldev/mldev_utils.c | 30 -------- lib/mldev/mldev_utils.h | 16 ----- lib/mldev/rte_mldev.h | 59 ++++------------ lib/mldev/version.map | 1 - 8 files changed, 94 insertions(+), 210 deletions(-) diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index 05b221401b..b40519b5e3 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -3,6 +3,7 @@ */ #include <errno.h> +#include <math.h> #include <stdio.h> #include <unistd.h> @@ -18,11 +19,6 @@ #include "ml_common.h" #include "test_inference_common.h" -#define ML_TEST_READ_TYPE(buffer, type) (*((type *)buffer)) - -#define ML_TEST_CHECK_OUTPUT(output, reference, tolerance) \ - (((float)output - (float)reference) <= (((float)reference * tolerance) / 100.0)) - #define ML_OPEN_WRITE_GET_ERR(name, buffer, size, err) \ do { \ FILE *fp = fopen(name, "w+"); \ @@ -763,9 +759,9 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) { struct test_inference *t = ml_test_priv((struct ml_test *)test); struct ml_model *model; - uint32_t nb_elements; - uint8_t *reference; - uint8_t *output; + float *reference; + float *output; + float deviation; bool match; uint32_t i; uint32_t j; @@ -777,89 +773,30 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) match = (rte_hash_crc(model->output, model->out_dsize, 0) == rte_hash_crc(model->reference, model->out_dsize, 0)); } else { - output = model->output; - reference = model->reference; + output = (float *)model->output; + reference = (float *)model->reference; i = 0; next_output: - nb_elements = - model->info.output_info[i].shape.w * model->info.output_info[i].shape.x * - model->info.output_info[i].shape.y * model->info.output_info[i].shape.z; j = 0; next_element: match = false; - switch (model->info.output_info[i].dtype) { - case RTE_ML_IO_TYPE_INT8: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int8_t), - ML_TEST_READ_TYPE(reference, int8_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int8_t); - reference += sizeof(int8_t); - break; - case RTE_ML_IO_TYPE_UINT8: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint8_t), - ML_TEST_READ_TYPE(reference, uint8_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(float); - reference += sizeof(float); - break; - case RTE_ML_IO_TYPE_INT16: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int16_t), - ML_TEST_READ_TYPE(reference, 
int16_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int16_t); - reference += sizeof(int16_t); - break; - case RTE_ML_IO_TYPE_UINT16: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint16_t), - ML_TEST_READ_TYPE(reference, uint16_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(uint16_t); - reference += sizeof(uint16_t); - break; - case RTE_ML_IO_TYPE_INT32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int32_t), - ML_TEST_READ_TYPE(reference, int32_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int32_t); - reference += sizeof(int32_t); - break; - case RTE_ML_IO_TYPE_UINT32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint32_t), - ML_TEST_READ_TYPE(reference, uint32_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(uint32_t); - reference += sizeof(uint32_t); - break; - case RTE_ML_IO_TYPE_FP32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, float), - ML_TEST_READ_TYPE(reference, float), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(float); - reference += sizeof(float); - break; - default: /* other types, fp8, fp16, bfloat16 */ + deviation = + (*reference == 0 ? 0 : 100 * fabs(*output - *reference) / fabs(*reference)); + if (deviation <= t->cmn.opt->tolerance) match = true; - } + else + ml_err("id = %d, element = %d, output = %f, reference = %f, deviation = %f %%\n", + i, j, *output, *reference, deviation); + + output++; + reference++; if (!match) goto done; + j++; - if (j < nb_elements) + if (j < model->info.output_info[i].nb_elements) goto next_element; i++; diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index 92c47d39ba..26df8d9ff9 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -366,6 +366,12 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->total_input_sz_q = 0; for (i = 0; i < metadata->model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { + addr->input[i].nb_dims = 4; + addr->input[i].shape[0] = metadata->input1[i].shape.w; + addr->input[i].shape[1] = metadata->input1[i].shape.x; + addr->input[i].shape[2] = metadata->input1[i].shape.y; + addr->input[i].shape[3] = metadata->input1[i].shape.z; + addr->input[i].nb_elements = metadata->input1[i].shape.w * metadata->input1[i].shape.x * metadata->input1[i].shape.y * metadata->input1[i].shape.z; @@ -386,6 +392,13 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->input[i].sz_q); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + + addr->input[i].nb_dims = 4; + addr->input[i].shape[0] = metadata->input2[j].shape.w; + addr->input[i].shape[1] = metadata->input2[j].shape.x; + addr->input[i].shape[2] = metadata->input2[j].shape.y; + addr->input[i].shape[3] = metadata->input2[j].shape.z; + addr->input[i].nb_elements = metadata->input2[j].shape.w * metadata->input2[j].shape.x * metadata->input2[j].shape.y * metadata->input2[j].shape.z; @@ -412,6 +425,8 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->total_output_sz_d = 0; for (i = 0; i < metadata->model.num_output; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { + addr->output[i].nb_dims = 1; + addr->output[i].shape[0] = metadata->output1[i].size; addr->output[i].nb_elements = metadata->output1[i].size; addr->output[i].sz_d = addr->output[i].nb_elements * @@ -426,6 +441,9 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ model->model_id, i, 
addr->output[i].sz_d, addr->output[i].sz_q); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + + addr->output[i].nb_dims = 1; + addr->output[i].shape[0] = metadata->output2[j].size; addr->output[i].nb_elements = metadata->output2[j].size; addr->output[i].sz_d = addr->output[i].nb_elements * @@ -498,6 +516,7 @@ void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) { struct cn10k_ml_model_metadata *metadata; + struct cn10k_ml_model_addr *addr; struct rte_ml_model_info *info; struct rte_ml_io_info *output; struct rte_ml_io_info *input; @@ -508,6 +527,7 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) info = PLT_PTR_CAST(model->info); input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info)); output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info)); + addr = &model->addr; /* Set model info */ memset(info, 0, sizeof(struct rte_ml_model_info)); @@ -529,24 +549,25 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { rte_memcpy(input[i].name, metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN); - input[i].dtype = metadata->input1[i].input_type; - input[i].qtype = metadata->input1[i].model_input_type; - input[i].shape.format = metadata->input1[i].shape.format; - input[i].shape.w = metadata->input1[i].shape.w; - input[i].shape.x = metadata->input1[i].shape.x; - input[i].shape.y = metadata->input1[i].shape.y; - input[i].shape.z = metadata->input1[i].shape.z; + input[i].nb_dims = addr->input[i].nb_dims; + input[i].shape = addr->input[i].shape; + input[i].type = metadata->input1[i].model_input_type; + input[i].nb_elements = addr->input[i].nb_elements; + input[i].size = + addr->input[i].nb_elements * + rte_ml_io_type_size_get(metadata->input1[i].model_input_type); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + rte_memcpy(input[i].name, metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN); - input[i].dtype = metadata->input2[j].input_type; - input[i].qtype = metadata->input2[j].model_input_type; - input[i].shape.format = metadata->input2[j].shape.format; - input[i].shape.w = metadata->input2[j].shape.w; - input[i].shape.x = metadata->input2[j].shape.x; - input[i].shape.y = metadata->input2[j].shape.y; - input[i].shape.z = metadata->input2[j].shape.z; + input[i].nb_dims = addr->input[i].nb_dims; + input[i].shape = addr->input[i].shape; + input[i].type = metadata->input2[j].model_input_type; + input[i].nb_elements = addr->input[i].nb_elements; + input[i].size = + addr->input[i].nb_elements * + rte_ml_io_type_size_get(metadata->input2[j].model_input_type); } } @@ -555,24 +576,25 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { rte_memcpy(output[i].name, metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN); - output[i].dtype = metadata->output1[i].output_type; - output[i].qtype = metadata->output1[i].model_output_type; - output[i].shape.format = RTE_ML_IO_FORMAT_1D; - output[i].shape.w = metadata->output1[i].size; - output[i].shape.x = 1; - output[i].shape.y = 1; - output[i].shape.z = 1; + output[i].nb_dims = addr->output[i].nb_dims; + output[i].shape = addr->output[i].shape; + output[i].type = metadata->output1[i].model_output_type; + output[i].nb_elements = addr->output[i].nb_elements; + output[i].size = + addr->output[i].nb_elements * + rte_ml_io_type_size_get(metadata->output1[i].model_output_type); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + 
rte_memcpy(output[i].name, metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN); - output[i].dtype = metadata->output2[j].output_type; - output[i].qtype = metadata->output2[j].model_output_type; - output[i].shape.format = RTE_ML_IO_FORMAT_1D; - output[i].shape.w = metadata->output2[j].size; - output[i].shape.x = 1; - output[i].shape.y = 1; - output[i].shape.z = 1; + output[i].nb_dims = addr->output[i].nb_dims; + output[i].shape = addr->output[i].shape; + output[i].type = metadata->output2[j].model_output_type; + output[i].nb_elements = addr->output[i].nb_elements; + output[i].size = + addr->output[i].nb_elements * + rte_ml_io_type_size_get(metadata->output2[j].model_output_type); } } } diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h index 1f689363fc..4cc0744891 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.h +++ b/drivers/ml/cnxk/cn10k_ml_model.h @@ -409,6 +409,12 @@ struct cn10k_ml_model_addr { /* Input address and size */ struct { + /* Number of dimensions in shape */ + uint32_t nb_dims; + + /* Shape of input */ + uint32_t shape[4]; + /* Number of elements */ uint32_t nb_elements; @@ -421,6 +427,12 @@ struct cn10k_ml_model_addr { /* Output address and size */ struct { + /* Number of dimensions in shape */ + uint32_t nb_dims; + + /* Shape of output */ + uint32_t shape[4]; + /* Number of elements */ uint32_t nb_elements; diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index 656467d891..e3faab81ba 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -321,8 +321,8 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "\n"); print_line(fp, LINE_LEN); - fprintf(fp, "%8s %16s %12s %18s %12s %14s\n", "input", "input_name", "input_type", - "model_input_type", "quantize", "format"); + fprintf(fp, "%8s %16s %12s %18s %12s\n", "input", "input_name", "input_type", + "model_input_type", "quantize"); print_line(fp, LINE_LEN); for (i = 0; i < model->metadata.model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { @@ -335,12 +335,10 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "%*s ", 18, str); fprintf(fp, "%*s", 12, (model->metadata.input1[i].quantize == 1 ? "Yes" : "No")); - rte_ml_io_format_to_str(model->metadata.input1[i].shape.format, str, - STR_LEN); - fprintf(fp, "%*s", 16, str); fprintf(fp, "\n"); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + fprintf(fp, "%8u ", i); fprintf(fp, "%*s ", 16, model->metadata.input2[j].input_name); rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN); @@ -350,9 +348,6 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "%*s ", 18, str); fprintf(fp, "%*s", 12, (model->metadata.input2[j].quantize == 1 ? 
"Yes" : "No")); - rte_ml_io_format_to_str(model->metadata.input2[j].shape.format, str, - STR_LEN); - fprintf(fp, "%*s", 16, str); fprintf(fp, "\n"); } } diff --git a/lib/mldev/mldev_utils.c b/lib/mldev/mldev_utils.c index d2442b123b..ccd2c39ca8 100644 --- a/lib/mldev/mldev_utils.c +++ b/lib/mldev/mldev_utils.c @@ -86,33 +86,3 @@ rte_ml_io_type_to_str(enum rte_ml_io_type type, char *str, int len) rte_strlcpy(str, "invalid", len); } } - -void -rte_ml_io_format_to_str(enum rte_ml_io_format format, char *str, int len) -{ - switch (format) { - case RTE_ML_IO_FORMAT_NCHW: - rte_strlcpy(str, "NCHW", len); - break; - case RTE_ML_IO_FORMAT_NHWC: - rte_strlcpy(str, "NHWC", len); - break; - case RTE_ML_IO_FORMAT_CHWN: - rte_strlcpy(str, "CHWN", len); - break; - case RTE_ML_IO_FORMAT_3D: - rte_strlcpy(str, "3D", len); - break; - case RTE_ML_IO_FORMAT_2D: - rte_strlcpy(str, "Matrix", len); - break; - case RTE_ML_IO_FORMAT_1D: - rte_strlcpy(str, "Vector", len); - break; - case RTE_ML_IO_FORMAT_SCALAR: - rte_strlcpy(str, "Scalar", len); - break; - default: - rte_strlcpy(str, "invalid", len); - } -} diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h index 5bc8020453..220afb42f0 100644 --- a/lib/mldev/mldev_utils.h +++ b/lib/mldev/mldev_utils.h @@ -52,22 +52,6 @@ __rte_internal void rte_ml_io_type_to_str(enum rte_ml_io_type type, char *str, int len); -/** - * @internal - * - * Get the name of an ML IO format. - * - * @param[in] type - * Enumeration of ML IO format. - * @param[in] str - * Address of character array. - * @param[in] len - * Length of character array. - */ -__rte_internal -void -rte_ml_io_format_to_str(enum rte_ml_io_format format, char *str, int len); - /** * @internal * diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index fc3525c1ab..6204df0930 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -863,47 +863,6 @@ enum rte_ml_io_type { /**< 16-bit brain floating point number. */ }; -/** - * Input and output format. This is used to represent the encoding type of multi-dimensional - * used by ML models. - */ -enum rte_ml_io_format { - RTE_ML_IO_FORMAT_NCHW = 1, - /**< Batch size (N) x channels (C) x height (H) x width (W) */ - RTE_ML_IO_FORMAT_NHWC, - /**< Batch size (N) x height (H) x width (W) x channels (C) */ - RTE_ML_IO_FORMAT_CHWN, - /**< Channels (C) x height (H) x width (W) x batch size (N) */ - RTE_ML_IO_FORMAT_3D, - /**< Format to represent a 3 dimensional data */ - RTE_ML_IO_FORMAT_2D, - /**< Format to represent matrix data */ - RTE_ML_IO_FORMAT_1D, - /**< Format to represent vector data */ - RTE_ML_IO_FORMAT_SCALAR, - /**< Format to represent scalar data */ -}; - -/** - * Input and output shape. This structure represents the encoding format and dimensions - * of the tensor or vector. - * - * The data can be a 4D / 3D tensor, matrix, vector or a scalar. Number of dimensions used - * for the data would depend on the format. Unused dimensions to be set to 1. - */ -struct rte_ml_io_shape { - enum rte_ml_io_format format; - /**< Format of the data */ - uint32_t w; - /**< First dimension */ - uint32_t x; - /**< Second dimension */ - uint32_t y; - /**< Third dimension */ - uint32_t z; - /**< Fourth dimension */ -}; - /** Input and output data information structure * * Specifies the type and shape of input and output data. 
@@ -911,12 +870,18 @@ struct rte_ml_io_shape { struct rte_ml_io_info { char name[RTE_ML_STR_MAX]; /**< Name of data */ - struct rte_ml_io_shape shape; - /**< Shape of data */ - enum rte_ml_io_type qtype; - /**< Type of quantized data */ - enum rte_ml_io_type dtype; - /**< Type of de-quantized data */ + uint32_t nb_dims; + /**< Number of dimensions in shape */ + uint32_t *shape; + /**< Shape of the tensor */ + enum rte_ml_io_type type; + /**< Type of data + * @see enum rte_ml_io_type + */ + uint64_t nb_elements; + /**< Number of elements in tensor */ + uint64_t size; + /**< Size of tensor in bytes */ }; /** Model information structure */ diff --git a/lib/mldev/version.map b/lib/mldev/version.map index 0706b565be..40ff27f4b9 100644 --- a/lib/mldev/version.map +++ b/lib/mldev/version.map @@ -51,7 +51,6 @@ INTERNAL { rte_ml_io_type_size_get; rte_ml_io_type_to_str; - rte_ml_io_format_to_str; rte_ml_io_float32_to_int8; rte_ml_io_int8_to_float32; rte_ml_io_float32_to_uint8; -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
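The new rte_ml_io_info fields are tied together by a simple invariant, which the driver changes above maintain: nb_elements is the product of the shape dimensions, and size equals nb_elements times the element size. A sketch of the computation, assuming a valid shape array with nb_dims entries:

static uint64_t
io_nb_elements(const struct rte_ml_io_info *io)
{
	uint64_t n = 1;
	uint32_t d;

	/* Product over all dimensions; the byte size is then
	 * n * rte_ml_io_type_size_get(io->type).
	 */
	for (d = 0; d < io->nb_dims; d++)
		n *= io->shape[d];

	return n;
}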
* [PATCH v2 2/3] mldev: introduce support for IO layout 2023-09-20 7:19 ` [PATCH v2 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-09-20 7:19 ` [PATCH v2 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi @ 2023-09-20 7:19 ` Srikanth Yalavarthi 2023-09-20 7:19 ` [PATCH v2 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi 2 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-20 7:19 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Introduce IO layout in ML device specification. IO layout defines the expected arrangement of model input and output buffers in memory. Packed and Split layout support is added in the specification. Updated rte_ml_op to use arrays of rte_ml_buff_seg pointers, supporting packed and split I/O layouts. Updated ML quantize and dequantize APIs to support rte_ml_buff_seg pointer arrays. Replaced batch_size with min_batches and max_batches in rte_ml_model_info. Implement support for model IO layout in the ml/cnxk driver. Updated the ML test application to support IO layout and dropped support for '--batches' in the test application. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- app/test-mldev/ml_options.c | 16 -- app/test-mldev/ml_options.h | 2 - app/test-mldev/test_inference_common.c | 327 +++++++++++++++++++++---- app/test-mldev/test_inference_common.h | 6 + app/test-mldev/test_model_common.c | 6 - app/test-mldev/test_model_common.h | 1 - doc/guides/tools/testmldev.rst | 6 - drivers/ml/cnxk/cn10k_ml_dev.h | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 6 +- drivers/ml/cnxk/cn10k_ml_ops.c | 74 +++--- lib/mldev/meson.build | 2 +- lib/mldev/rte_mldev.c | 12 +- lib/mldev/rte_mldev.h | 90 +++++-- lib/mldev/rte_mldev_core.h | 14 +- 14 files changed, 418 insertions(+), 147 deletions(-) diff --git a/app/test-mldev/ml_options.c b/app/test-mldev/ml_options.c index d068b30df5..eeaffec399 100644 --- a/app/test-mldev/ml_options.c +++ b/app/test-mldev/ml_options.c @@ -28,7 +28,6 @@ ml_options_default(struct ml_options *opt) opt->burst_size = 1; opt->queue_pairs = 1; opt->queue_size = 1; - opt->batches = 0; opt->tolerance = 0.0; opt->stats = false; opt->debug = false; @@ -213,18 +212,6 @@ ml_parse_queue_size(struct ml_options *opt, const char *arg) return ret; } -static int -ml_parse_batches(struct ml_options *opt, const char *arg) -{ - int ret; - - ret = parser_read_uint16(&opt->batches, arg); - if (ret != 0) - ml_err("Invalid option, batches = %s\n", arg); - - return ret; -} - static int ml_parse_tolerance(struct ml_options *opt, const char *arg) { @@ -255,7 +242,6 @@ ml_dump_test_options(const char *testname) "\t\t--burst_size : inference burst size\n" "\t\t--queue_pairs : number of queue pairs to create\n" "\t\t--queue_size : size of queue-pair\n" - "\t\t--batches : number of batches of input\n" "\t\t--tolerance : maximum tolerance (%%) for output validation\n" "\t\t--stats : enable reporting device and model statistics\n"); printf("\n"); @@ -287,7 +273,6 @@ static struct option lgopts[] = { {ML_BURST_SIZE, 1, 0, 0}, {ML_QUEUE_PAIRS, 1, 0, 0}, {ML_QUEUE_SIZE, 1, 0, 0}, - {ML_BATCHES, 1, 0, 0}, {ML_TOLERANCE, 1, 0, 0}, {ML_STATS, 0, 0, 0}, {ML_DEBUG, 0, 0, 0}, @@ -309,7 +294,6 @@ ml_opts_parse_long(int opt_idx, struct ml_options *opt) {ML_BURST_SIZE, ml_parse_burst_size}, {ML_QUEUE_PAIRS, ml_parse_queue_pairs}, {ML_QUEUE_SIZE, ml_parse_queue_size}, - {ML_BATCHES, ml_parse_batches}, {ML_TOLERANCE, ml_parse_tolerance}, }; diff --git 
a/app/test-mldev/ml_options.h b/app/test-mldev/ml_options.h index 622a4c05fc..90e22adeac 100644 --- a/app/test-mldev/ml_options.h +++ b/app/test-mldev/ml_options.h @@ -21,7 +21,6 @@ #define ML_BURST_SIZE ("burst_size") #define ML_QUEUE_PAIRS ("queue_pairs") #define ML_QUEUE_SIZE ("queue_size") -#define ML_BATCHES ("batches") #define ML_TOLERANCE ("tolerance") #define ML_STATS ("stats") #define ML_DEBUG ("debug") @@ -44,7 +43,6 @@ struct ml_options { uint16_t burst_size; uint16_t queue_pairs; uint16_t queue_size; - uint16_t batches; float tolerance; bool stats; bool debug; diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index b40519b5e3..846f71abb1 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -47,7 +47,10 @@ ml_enqueue_single(void *arg) uint64_t start_cycle; uint32_t burst_enq; uint32_t lcore_id; + uint64_t offset; + uint64_t bufsz; uint16_t fid; + uint32_t i; int ret; lcore_id = rte_lcore_id(); @@ -66,24 +69,64 @@ ml_enqueue_single(void *arg) if (ret != 0) goto next_model; -retry: +retry_req: ret = rte_mempool_get(t->model[fid].io_pool, (void **)&req); if (ret != 0) - goto retry; + goto retry_req; + +retry_inp_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)req->inp_buf_segs, + t->model[fid].info.nb_inputs); + if (ret != 0) + goto retry_inp_segs; + +retry_out_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)req->out_buf_segs, + t->model[fid].info.nb_outputs); + if (ret != 0) + goto retry_out_segs; op->model_id = t->model[fid].id; - op->nb_batches = t->model[fid].nb_batches; + op->nb_batches = t->model[fid].info.min_batches; op->mempool = t->op_pool; + op->input = req->inp_buf_segs; + op->output = req->out_buf_segs; + op->user_ptr = req; - op->input.addr = req->input; - op->input.length = t->model[fid].inp_qsize; - op->input.next = NULL; + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + op->input[0]->addr = req->input; + op->input[0]->iova_addr = rte_mem_virt2iova(req->input); + op->input[0]->length = t->model[fid].inp_qsize; + op->input[0]->next = NULL; + + op->output[0]->addr = req->output; + op->output[0]->iova_addr = rte_mem_virt2iova(req->output); + op->output[0]->length = t->model[fid].out_qsize; + op->output[0]->next = NULL; + } else { + offset = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + op->input[i]->addr = req->input + offset; + op->input[i]->iova_addr = rte_mem_virt2iova(req->input + offset); + op->input[i]->length = bufsz; + op->input[i]->next = NULL; + offset += bufsz; + } - op->output.addr = req->output; - op->output.length = t->model[fid].out_qsize; - op->output.next = NULL; + offset = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + op->output[i]->addr = req->output + offset; + op->output[i]->iova_addr = rte_mem_virt2iova(req->output + offset); + op->output[i]->length = bufsz; + op->output[i]->next = NULL; + offset += bufsz; + } + } - op->user_ptr = req; req->niters++; req->fid = fid; @@ -143,6 +186,10 @@ ml_dequeue_single(void *arg) } req = (struct ml_request *)op->user_ptr; rte_mempool_put(t->model[req->fid].io_pool, req); + rte_mempool_put_bulk(t->buf_seg_pool, (void **)op->input, + t->model[req->fid].info.nb_inputs); + rte_mempool_put_bulk(t->buf_seg_pool, (void **)op->output, + t->model[req->fid].info.nb_outputs); 
rte_mempool_put(t->op_pool, op); } @@ -164,9 +211,12 @@ ml_enqueue_burst(void *arg) uint16_t burst_enq; uint32_t lcore_id; uint16_t pending; + uint64_t offset; + uint64_t bufsz; uint16_t idx; uint16_t fid; uint16_t i; + uint16_t j; int ret; lcore_id = rte_lcore_id(); @@ -186,25 +236,70 @@ ml_enqueue_burst(void *arg) if (ret != 0) goto next_model; -retry: +retry_reqs: ret = rte_mempool_get_bulk(t->model[fid].io_pool, (void **)args->reqs, ops_count); if (ret != 0) - goto retry; + goto retry_reqs; for (i = 0; i < ops_count; i++) { +retry_inp_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)args->reqs[i]->inp_buf_segs, + t->model[fid].info.nb_inputs); + if (ret != 0) + goto retry_inp_segs; + +retry_out_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)args->reqs[i]->out_buf_segs, + t->model[fid].info.nb_outputs); + if (ret != 0) + goto retry_out_segs; + args->enq_ops[i]->model_id = t->model[fid].id; - args->enq_ops[i]->nb_batches = t->model[fid].nb_batches; + args->enq_ops[i]->nb_batches = t->model[fid].info.min_batches; args->enq_ops[i]->mempool = t->op_pool; + args->enq_ops[i]->input = args->reqs[i]->inp_buf_segs; + args->enq_ops[i]->output = args->reqs[i]->out_buf_segs; + args->enq_ops[i]->user_ptr = args->reqs[i]; - args->enq_ops[i]->input.addr = args->reqs[i]->input; - args->enq_ops[i]->input.length = t->model[fid].inp_qsize; - args->enq_ops[i]->input.next = NULL; + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + args->enq_ops[i]->input[0]->addr = args->reqs[i]->input; + args->enq_ops[i]->input[0]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->input); + args->enq_ops[i]->input[0]->length = t->model[fid].inp_qsize; + args->enq_ops[i]->input[0]->next = NULL; + + args->enq_ops[i]->output[0]->addr = args->reqs[i]->output; + args->enq_ops[i]->output[0]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->output); + args->enq_ops[i]->output[0]->length = t->model[fid].out_qsize; + args->enq_ops[i]->output[0]->next = NULL; + } else { + offset = 0; + for (j = 0; j < t->model[fid].info.nb_inputs; j++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.input_info[j].size, + t->cmn.dev_info.align_size); + + args->enq_ops[i]->input[j]->addr = args->reqs[i]->input + offset; + args->enq_ops[i]->input[j]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->input + offset); + args->enq_ops[i]->input[j]->length = bufsz; + args->enq_ops[i]->input[j]->next = NULL; + offset += bufsz; + } - args->enq_ops[i]->output.addr = args->reqs[i]->output; - args->enq_ops[i]->output.length = t->model[fid].out_qsize; - args->enq_ops[i]->output.next = NULL; + offset = 0; + for (j = 0; j < t->model[fid].info.nb_outputs; j++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.output_info[j].size, + t->cmn.dev_info.align_size); + args->enq_ops[i]->output[j]->addr = args->reqs[i]->output + offset; + args->enq_ops[i]->output[j]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->output + offset); + args->enq_ops[i]->output[j]->length = bufsz; + args->enq_ops[i]->output[j]->next = NULL; + offset += bufsz; + } + } - args->enq_ops[i]->user_ptr = args->reqs[i]; args->reqs[i]->niters++; args->reqs[i]->fid = fid; } @@ -275,8 +370,15 @@ ml_dequeue_burst(void *arg) t->error_count[lcore_id]++; } req = (struct ml_request *)args->deq_ops[i]->user_ptr; - if (req != NULL) + if (req != NULL) { rte_mempool_put(t->model[req->fid].io_pool, req); + rte_mempool_put_bulk(t->buf_seg_pool, + (void **)args->deq_ops[i]->input, + t->model[req->fid].info.nb_inputs); + 
rte_mempool_put_bulk(t->buf_seg_pool, + (void **)args->deq_ops[i]->output, + t->model[req->fid].info.nb_outputs); + } } rte_mempool_put_bulk(t->op_pool, (void *)args->deq_ops, burst_deq); } @@ -315,6 +417,12 @@ test_inference_cap_check(struct ml_options *opt) return false; } + if (dev_info.max_io < ML_TEST_MAX_IO_SIZE) { + ml_err("Insufficient capabilities: Max I/O, count = %u > (max limit = %u)", + ML_TEST_MAX_IO_SIZE, dev_info.max_io); + return false; + } + return true; } @@ -403,11 +511,6 @@ test_inference_opt_dump(struct ml_options *opt) ml_dump("tolerance", "%-7.3f", opt->tolerance); ml_dump("stats", "%s", (opt->stats ? "true" : "false")); - if (opt->batches == 0) - ml_dump("batches", "%u (default batch size)", opt->batches); - else - ml_dump("batches", "%u", opt->batches); - ml_dump_begin("filelist"); for (i = 0; i < opt->nb_filelist; i++) { ml_dump_list("model", i, opt->filelist[i].model); @@ -492,10 +595,18 @@ void test_inference_destroy(struct ml_test *test, struct ml_options *opt) { struct test_inference *t; + uint32_t lcore_id; RTE_SET_USED(opt); t = ml_test_priv(test); + + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + rte_free(t->args[lcore_id].enq_ops); + rte_free(t->args[lcore_id].deq_ops); + rte_free(t->args[lcore_id].reqs); + } + rte_free(t); } @@ -572,19 +683,62 @@ ml_request_initialize(struct rte_mempool *mp, void *opaque, void *obj, unsigned { struct test_inference *t = ml_test_priv((struct ml_test *)opaque); struct ml_request *req = (struct ml_request *)obj; + struct rte_ml_buff_seg dbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg qbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *q_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *d_segs[ML_TEST_MAX_IO_SIZE]; + uint64_t offset; + uint64_t bufsz; + uint32_t i; RTE_SET_USED(mp); RTE_SET_USED(obj_idx); req->input = (uint8_t *)obj + - RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.min_align_size); - req->output = req->input + - RTE_ALIGN_CEIL(t->model[t->fid].inp_qsize, t->cmn.dev_info.min_align_size); + RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.align_size); + req->output = + req->input + RTE_ALIGN_CEIL(t->model[t->fid].inp_qsize, t->cmn.dev_info.align_size); req->niters = 0; + if (t->model[t->fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + dbuff_seg[0].addr = t->model[t->fid].input; + dbuff_seg[0].iova_addr = rte_mem_virt2iova(t->model[t->fid].input); + dbuff_seg[0].length = t->model[t->fid].inp_dsize; + dbuff_seg[0].next = NULL; + d_segs[0] = &dbuff_seg[0]; + + qbuff_seg[0].addr = req->input; + qbuff_seg[0].iova_addr = rte_mem_virt2iova(req->input); + qbuff_seg[0].length = t->model[t->fid].inp_qsize; + qbuff_seg[0].next = NULL; + q_segs[0] = &qbuff_seg[0]; + } else { + offset = 0; + for (i = 0; i < t->model[t->fid].info.nb_inputs; i++) { + bufsz = t->model[t->fid].info.input_info[i].nb_elements * sizeof(float); + dbuff_seg[i].addr = t->model[t->fid].input + offset; + dbuff_seg[i].iova_addr = rte_mem_virt2iova(t->model[t->fid].input + offset); + dbuff_seg[i].length = bufsz; + dbuff_seg[i].next = NULL; + d_segs[i] = &dbuff_seg[i]; + offset += bufsz; + } + + offset = 0; + for (i = 0; i < t->model[t->fid].info.nb_inputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[t->fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + qbuff_seg[i].addr = req->input + offset; + qbuff_seg[i].iova_addr = rte_mem_virt2iova(req->input + offset); + qbuff_seg[i].length = bufsz; + qbuff_seg[i].next = NULL; + q_segs[i] = &qbuff_seg[i]; + offset += bufsz; + } + } + /* quantize 
data */ - rte_ml_io_quantize(t->cmn.opt->dev_id, t->model[t->fid].id, t->model[t->fid].nb_batches, - t->model[t->fid].input, req->input); + rte_ml_io_quantize(t->cmn.opt->dev_id, t->model[t->fid].id, d_segs, q_segs); } int @@ -599,24 +753,39 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t uint32_t buff_size; uint32_t mz_size; size_t fsize; + uint32_t i; int ret; /* get input buffer size */ - ret = rte_ml_io_input_size_get(opt->dev_id, t->model[fid].id, t->model[fid].nb_batches, - &t->model[fid].inp_qsize, &t->model[fid].inp_dsize); - if (ret != 0) { - ml_err("Failed to get input size, model : %s\n", opt->filelist[fid].model); - return ret; + t->model[fid].inp_qsize = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) { + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) + t->model[fid].inp_qsize += t->model[fid].info.input_info[i].size; + else + t->model[fid].inp_qsize += RTE_ALIGN_CEIL( + t->model[fid].info.input_info[i].size, t->cmn.dev_info.align_size); } /* get output buffer size */ - ret = rte_ml_io_output_size_get(opt->dev_id, t->model[fid].id, t->model[fid].nb_batches, - &t->model[fid].out_qsize, &t->model[fid].out_dsize); - if (ret != 0) { - ml_err("Failed to get input size, model : %s\n", opt->filelist[fid].model); - return ret; + t->model[fid].out_qsize = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) { + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) + t->model[fid].out_qsize += t->model[fid].info.output_info[i].size; + else + t->model[fid].out_qsize += RTE_ALIGN_CEIL( + t->model[fid].info.output_info[i].size, t->cmn.dev_info.align_size); } + t->model[fid].inp_dsize = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) + t->model[fid].inp_dsize += + t->model[fid].info.input_info[i].nb_elements * sizeof(float); + + t->model[fid].out_dsize = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) + t->model[fid].out_dsize += + t->model[fid].info.output_info[i].nb_elements * sizeof(float); + /* allocate buffer for user data */ mz_size = t->model[fid].inp_dsize + t->model[fid].out_dsize; if (strcmp(opt->filelist[fid].reference, "\0") != 0) @@ -675,9 +844,9 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t /* create mempool for quantized input and output buffers. ml_request_initialize is * used as a callback for object creation. */ - buff_size = RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.min_align_size) + - RTE_ALIGN_CEIL(t->model[fid].inp_qsize, t->cmn.dev_info.min_align_size) + - RTE_ALIGN_CEIL(t->model[fid].out_qsize, t->cmn.dev_info.min_align_size); + buff_size = RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.align_size) + + RTE_ALIGN_CEIL(t->model[fid].inp_qsize, t->cmn.dev_info.align_size) + + RTE_ALIGN_CEIL(t->model[fid].out_qsize, t->cmn.dev_info.align_size); nb_buffers = RTE_MIN((uint64_t)ML_TEST_MAX_POOL_SIZE, opt->repetitions); t->fid = fid; @@ -740,6 +909,18 @@ ml_inference_mem_setup(struct ml_test *test, struct ml_options *opt) return -ENOMEM; } + /* create buf_segs pool with elements of struct rte_ml_buff_seg. External buffers are + * attached to the buf_segs while enqueuing inference requests. 
+ */ + t->buf_seg_pool = rte_mempool_create("ml_test_mbuf_pool", ML_TEST_MAX_POOL_SIZE * 2, + sizeof(struct rte_ml_buff_seg), 0, 0, NULL, NULL, NULL, + NULL, opt->socket_id, 0); + if (t->buf_seg_pool == NULL) { + ml_err("Failed to create buf_segs pool : %s\n", "ml_test_mbuf_pool"); + rte_ml_op_pool_free(t->op_pool); + return -ENOMEM; + } + return 0; } @@ -752,6 +933,9 @@ ml_inference_mem_destroy(struct ml_test *test, struct ml_options *opt) /* release op pool */ rte_mempool_free(t->op_pool); + + /* release buf_segs pool */ + rte_mempool_free(t->buf_seg_pool); } static bool @@ -781,8 +965,10 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) j = 0; next_element: match = false; - deviation = - (*reference == 0 ? 0 : 100 * fabs(*output - *reference) / fabs(*reference)); + if ((*reference == 0) && (*output == 0)) + deviation = 0; + else + deviation = 100 * fabs(*output - *reference) / fabs(*reference); if (deviation <= t->cmn.opt->tolerance) match = true; else @@ -817,14 +1003,59 @@ ml_request_finish(struct rte_mempool *mp, void *opaque, void *obj, unsigned int bool error = false; char *dump_path; + struct rte_ml_buff_seg qbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg dbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *q_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *d_segs[ML_TEST_MAX_IO_SIZE]; + uint64_t offset; + uint64_t bufsz; + uint32_t i; + RTE_SET_USED(mp); if (req->niters == 0) return; t->nb_used++; - rte_ml_io_dequantize(t->cmn.opt->dev_id, model->id, t->model[req->fid].nb_batches, - req->output, model->output); + + if (t->model[req->fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + qbuff_seg[0].addr = req->output; + qbuff_seg[0].iova_addr = rte_mem_virt2iova(req->output); + qbuff_seg[0].length = t->model[req->fid].out_qsize; + qbuff_seg[0].next = NULL; + q_segs[0] = &qbuff_seg[0]; + + dbuff_seg[0].addr = model->output; + dbuff_seg[0].iova_addr = rte_mem_virt2iova(model->output); + dbuff_seg[0].length = t->model[req->fid].out_dsize; + dbuff_seg[0].next = NULL; + d_segs[0] = &dbuff_seg[0]; + } else { + offset = 0; + for (i = 0; i < t->model[req->fid].info.nb_outputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[req->fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + qbuff_seg[i].addr = req->output + offset; + qbuff_seg[i].iova_addr = rte_mem_virt2iova(req->output + offset); + qbuff_seg[i].length = bufsz; + qbuff_seg[i].next = NULL; + q_segs[i] = &qbuff_seg[i]; + offset += bufsz; + } + + offset = 0; + for (i = 0; i < t->model[req->fid].info.nb_outputs; i++) { + bufsz = t->model[req->fid].info.output_info[i].nb_elements * sizeof(float); + dbuff_seg[i].addr = model->output + offset; + dbuff_seg[i].iova_addr = rte_mem_virt2iova(model->output + offset); + dbuff_seg[i].length = bufsz; + dbuff_seg[i].next = NULL; + d_segs[i] = &dbuff_seg[i]; + offset += bufsz; + } + } + + rte_ml_io_dequantize(t->cmn.opt->dev_id, model->id, q_segs, d_segs); if (model->reference == NULL) goto dump_output_pass; diff --git a/app/test-mldev/test_inference_common.h b/app/test-mldev/test_inference_common.h index 8f27af25e4..3f4ba3219b 100644 --- a/app/test-mldev/test_inference_common.h +++ b/app/test-mldev/test_inference_common.h @@ -11,11 +11,16 @@ #include "test_model_common.h" +#define ML_TEST_MAX_IO_SIZE 32 + struct ml_request { uint8_t *input; uint8_t *output; uint16_t fid; uint64_t niters; + + struct rte_ml_buff_seg *inp_buf_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *out_buf_segs[ML_TEST_MAX_IO_SIZE]; }; struct ml_core_args { @@ -38,6 +43,7 @@ 
struct test_inference { /* test specific data */ struct ml_model model[ML_TEST_MAX_MODELS]; + struct rte_mempool *buf_seg_pool; struct rte_mempool *op_pool; uint64_t nb_used; diff --git a/app/test-mldev/test_model_common.c b/app/test-mldev/test_model_common.c index 8dbb0ff89f..c517a50611 100644 --- a/app/test-mldev/test_model_common.c +++ b/app/test-mldev/test_model_common.c @@ -50,12 +50,6 @@ ml_model_load(struct ml_test *test, struct ml_options *opt, struct ml_model *mod return ret; } - /* Update number of batches */ - if (opt->batches == 0) - model->nb_batches = model->info.batch_size; - else - model->nb_batches = opt->batches; - model->state = MODEL_LOADED; return 0; diff --git a/app/test-mldev/test_model_common.h b/app/test-mldev/test_model_common.h index c1021ef1b6..a207e54ab7 100644 --- a/app/test-mldev/test_model_common.h +++ b/app/test-mldev/test_model_common.h @@ -31,7 +31,6 @@ struct ml_model { uint8_t *reference; struct rte_mempool *io_pool; - uint32_t nb_batches; }; int ml_model_load(struct ml_test *test, struct ml_options *opt, struct ml_model *model, diff --git a/doc/guides/tools/testmldev.rst b/doc/guides/tools/testmldev.rst index 741abd722e..9b1565a457 100644 --- a/doc/guides/tools/testmldev.rst +++ b/doc/guides/tools/testmldev.rst @@ -106,11 +106,6 @@ The following are the command-line options supported by the test application. Queue size would translate into ``rte_ml_dev_qp_conf::nb_desc`` field during queue-pair creation. Default value is ``1``. -``--batches <n>`` - Set the number batches in the input file provided for inference run. - When not specified, the test would assume the number of batches - is the batch size of the model. - ``--tolerance <n>`` Set the tolerance value in percentage to be used for output validation. Default value is ``0``. 
@@ -282,7 +277,6 @@ Supported command line options for inference tests are following:: --burst_size --queue_pairs --queue_size - --batches --tolerance --stats diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h index 6ca0b0bb6e..c73bf7d001 100644 --- a/drivers/ml/cnxk/cn10k_ml_dev.h +++ b/drivers/ml/cnxk/cn10k_ml_dev.h @@ -30,6 +30,9 @@ /* Maximum number of descriptors per queue-pair */ #define ML_CN10K_MAX_DESC_PER_QP 1024 +/* Maximum number of inputs / outputs per model */ +#define ML_CN10K_MAX_INPUT_OUTPUT 32 + /* Maximum number of segments for IO data */ #define ML_CN10K_MAX_SEGMENTS 1 diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index 26df8d9ff9..e0b750cd8e 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -520,9 +520,11 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) struct rte_ml_model_info *info; struct rte_ml_io_info *output; struct rte_ml_io_info *input; + struct cn10k_ml_dev *mldev; uint8_t i; uint8_t j; + mldev = dev->data->dev_private; metadata = &model->metadata; info = PLT_PTR_CAST(model->info); input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info)); @@ -537,7 +539,9 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) metadata->model.version[3]); info->model_id = model->model_id; info->device_id = dev->data->dev_id; - info->batch_size = model->batch_size; + info->io_layout = RTE_ML_IO_LAYOUT_PACKED; + info->min_batches = model->batch_size; + info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size; info->nb_inputs = metadata->model.num_input; info->input_info = input; info->nb_outputs = metadata->model.num_output; diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index e3faab81ba..1d72fb52a6 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -471,9 +471,9 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req req->jd.hdr.sp_flags = 0x0; req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result); req->jd.model_run.input_ddr_addr = - PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr)); + PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr)); req->jd.model_run.output_ddr_addr = - PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr)); + PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr)); req->jd.model_run.num_batches = op->nb_batches; } @@ -856,7 +856,11 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint static int cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) { + struct rte_ml_model_info *info; struct cn10k_ml_model *model; + struct rte_ml_buff_seg seg[2]; + struct rte_ml_buff_seg *inp; + struct rte_ml_buff_seg *out; struct rte_ml_op op; char str[RTE_MEMZONE_NAMESIZE]; @@ -864,12 +868,22 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) uint64_t isize = 0; uint64_t osize = 0; int ret = 0; + uint32_t i; model = dev->data->models[model_id]; + info = (struct rte_ml_model_info *)model->info; + inp = &seg[0]; + out = &seg[1]; /* Create input and output buffers. 
*/ - rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL); - rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL); + for (i = 0; i < info->nb_inputs; i++) + isize += info->input_info[i].size; + + for (i = 0; i < info->nb_outputs; i++) + osize += info->output_info[i].size; + + isize = model->batch_size * isize; + osize = model->batch_size * osize; snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id); mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE); @@ -877,17 +891,22 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) return -ENOMEM; memset(mz->addr, 0, isize + osize); + seg[0].addr = mz->addr; + seg[0].iova_addr = mz->iova; + seg[0].length = isize; + seg[0].next = NULL; + + seg[1].addr = PLT_PTR_ADD(mz->addr, isize); + seg[1].iova_addr = mz->iova + isize; + seg[1].length = osize; + seg[1].next = NULL; + op.model_id = model_id; op.nb_batches = model->batch_size; op.mempool = NULL; - op.input.addr = mz->addr; - op.input.length = isize; - op.input.next = NULL; - - op.output.addr = PLT_PTR_ADD(op.input.addr, isize); - op.output.length = osize; - op.output.next = NULL; + op.input = &inp; + op.output = &out; memset(model->req, 0, sizeof(struct cn10k_ml_req)); ret = cn10k_ml_inference_sync(dev, &op); @@ -919,8 +938,9 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info) else if (strcmp(mldev->fw.poll_mem, "ddr") == 0) dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP; + dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT; dev_info->max_segments = ML_CN10K_MAX_SEGMENTS; - dev_info->min_align_size = ML_CN10K_ALIGN_SIZE; + dev_info->align_size = ML_CN10K_ALIGN_SIZE; return 0; } @@ -2139,15 +2159,14 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t } static int -cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer) +cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer) { struct cn10k_ml_model *model; uint8_t model_input_type; uint8_t *lcl_dbuffer; uint8_t *lcl_qbuffer; uint8_t input_type; - uint32_t batch_id; float qscale; uint32_t i; uint32_t j; @@ -2160,11 +2179,9 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batc return -EINVAL; } - lcl_dbuffer = dbuffer; - lcl_qbuffer = qbuffer; - batch_id = 0; + lcl_dbuffer = dbuffer[0]->addr; + lcl_qbuffer = qbuffer[0]->addr; -next_batch: for (i = 0; i < model->metadata.model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { input_type = model->metadata.input1[i].input_type; @@ -2218,23 +2235,18 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batc lcl_qbuffer += model->addr.input[i].sz_q; } - batch_id++; - if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size)) - goto next_batch; - return 0; } static int -cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *qbuffer, void *dbuffer) +cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer) { struct cn10k_ml_model *model; uint8_t model_output_type; uint8_t *lcl_qbuffer; uint8_t *lcl_dbuffer; uint8_t output_type; - uint32_t batch_id; float dscale; uint32_t i; uint32_t j; @@ -2247,11 +2259,9 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba return -EINVAL; } - 
lcl_dbuffer = dbuffer; - lcl_qbuffer = qbuffer; - batch_id = 0; + lcl_dbuffer = dbuffer[0]->addr; + lcl_qbuffer = qbuffer[0]->addr; -next_batch: for (i = 0; i < model->metadata.model.num_output; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { output_type = model->metadata.output1[i].output_type; @@ -2306,10 +2316,6 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba lcl_dbuffer += model->addr.output[i].sz_d; } - batch_id++; - if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size)) - goto next_batch; - return 0; } diff --git a/lib/mldev/meson.build b/lib/mldev/meson.build index 5769b0640a..0079ccd205 100644 --- a/lib/mldev/meson.build +++ b/lib/mldev/meson.build @@ -35,7 +35,7 @@ driver_sdk_headers += files( 'mldev_utils.h', ) -deps += ['mempool'] +deps += ['mempool', 'mbuf'] if get_option('buildtype').contains('debug') cflags += [ '-DRTE_LIBRTE_ML_DEV_DEBUG' ] diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 0d8ccd3212..9a48ed3e94 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -730,8 +730,8 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches } int -rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer) +rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer) { struct rte_ml_dev *dev; @@ -754,12 +754,12 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void return -EINVAL; } - return (*dev->dev_ops->io_quantize)(dev, model_id, nb_batches, dbuffer, qbuffer); + return (*dev->dev_ops->io_quantize)(dev, model_id, dbuffer, qbuffer); } int -rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *qbuffer, - void *dbuffer) +rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer) { struct rte_ml_dev *dev; @@ -782,7 +782,7 @@ rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, voi return -EINVAL; } - return (*dev->dev_ops->io_dequantize)(dev, model_id, nb_batches, qbuffer, dbuffer); + return (*dev->dev_ops->io_dequantize)(dev, model_id, qbuffer, dbuffer); } /** Initialise rte_ml_op mempool element */ diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index 6204df0930..316c6fd018 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -228,12 +228,14 @@ struct rte_ml_dev_info { /**< Maximum allowed number of descriptors for queue pair by the device. * @see struct rte_ml_dev_qp_conf::nb_desc */ + uint16_t max_io; + /**< Maximum number of inputs/outputs supported per model. */ uint16_t max_segments; /**< Maximum number of scatter-gather entries supported by the device. * @see struct rte_ml_buff_seg struct rte_ml_buff_seg::next */ - uint16_t min_align_size; - /**< Minimum alignment size of IO buffers used by the device. */ + uint16_t align_size; + /**< Alignment size of IO buffers used by the device. */ }; /** @@ -429,10 +431,28 @@ struct rte_ml_op { /**< Reserved for future use. */ struct rte_mempool *mempool; /**< Pool from which operation is allocated. */ - struct rte_ml_buff_seg input; - /**< Input buffer to hold the inference data. */ - struct rte_ml_buff_seg output; - /**< Output buffer to hold the inference output by the driver. */ + struct rte_ml_buff_seg **input; + /**< Array of buffer segments to hold the inference input data. 
+ * + * When the model supports IO layout RTE_ML_IO_LAYOUT_PACKED, size of + * the array is 1. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_SPLIT, size of + * the array is rte_ml_model_info::nb_inputs. + * + * @see struct rte_ml_dev_info::io_layout + */ + struct rte_ml_buff_seg **output; + /**< Array of buffer segments to hold the inference output data. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_PACKED, size of + * the array is 1. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_SPLIT, size of + * the array is rte_ml_model_info::nb_outputs. + * + * @see struct rte_ml_dev_info::io_layout + */ union { uint64_t user_u64; /**< User data as uint64_t.*/ @@ -863,7 +883,37 @@ enum rte_ml_io_type { /**< 16-bit brain floating point number. */ }; -/** Input and output data information structure +/** ML I/O buffer layout */ +enum rte_ml_io_layout { + RTE_ML_IO_LAYOUT_PACKED, + /**< All inputs for the model should be packed in a single buffer with + * no padding between individual inputs. The buffer is expected to + * be aligned to rte_ml_dev_info::align_size. + * + * When I/O segmentation is supported by the device, the packed + * data can be split into multiple segments. In this case, each + * segment is expected to be aligned to rte_ml_dev_info::align_size. + * + * Same applies to output. + * + * @see struct rte_ml_dev_info::max_segments + */ + RTE_ML_IO_LAYOUT_SPLIT + /**< Each input for the model should be stored as separate buffers + * and each input should be aligned to rte_ml_dev_info::align_size. + * + * When I/O segmentation is supported, each input can be split into + * multiple segments. In this case, each segment is expected to be + * aligned to rte_ml_dev_info::align_size. + * + * Same applies to output. + * + * @see struct rte_ml_dev_info::max_segments + */ +}; + +/** + * Input and output data information structure * * Specifies the type and shape of input and output data. */ @@ -873,7 +923,7 @@ struct rte_ml_io_info { uint32_t nb_dims; /**< Number of dimensions in shape */ uint32_t *shape; - /**< Shape of the tensor */ + /**< Shape of the tensor for rte_ml_model_info::min_batches of the model. */ enum rte_ml_io_type type; /**< Type of data * @see enum rte_ml_io_type */ @@ -894,8 +944,16 @@ struct rte_ml_model_info { /**< Model ID */ uint16_t device_id; /**< Device ID */ - uint16_t batch_size; - /**< Maximum number of batches that the model can process simultaneously */ + enum rte_ml_io_layout io_layout; + /**< I/O buffer layout for the model */ + uint16_t min_batches; + /**< Minimum number of batches that the model can process + * in one inference request + */ + uint16_t max_batches; + /**< Maximum number of batches that the model can process + * in one inference request + */ uint32_t nb_inputs; /**< Number of inputs */ const struct rte_ml_io_info *input_info; @@ -1021,8 +1079,6 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches * The identifier of the device. 
* @param[in] model_id * Identifier for the model - * @param[in] nb_batches - * Number of batches in the dequantized input buffer * @param[in] dbuffer * Address of dequantized input data * @param[in] qbuffer @@ -1034,8 +1090,8 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches */ __rte_experimental int -rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer); +rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer); /** * Dequantize output data. @@ -1047,8 +1103,6 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void * The identifier of the device. * @param[in] model_id * Identifier for the model - * @param[in] nb_batches - * Number of batches in the dequantized output buffer * @param[in] qbuffer * Address of quantized output data * @param[in] dbuffer @@ -1060,8 +1114,8 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void */ __rte_experimental int -rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *qbuffer, - void *dbuffer); +rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer); /* ML op pool operations */ diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 78b8b7633d..8530b07316 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -523,8 +523,6 @@ typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model * ML device pointer. * @param model_id * Model ID to use. - * @param nb_batches - * Number of batches. * @param dbuffer * Pointer t de-quantized data buffer. * @param qbuffer @@ -534,8 +532,9 @@ typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model * - 0 on success. * - <0, error on failure. */ -typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *dbuffer, void *qbuffer); +typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, + struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer); /** * @internal @@ -546,8 +545,6 @@ typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, ui * ML device pointer. * @param model_id * Model ID to use. - * @param nb_batches - * Number of batches. * @param qbuffer * Pointer t de-quantized data buffer. * @param dbuffer @@ -557,8 +554,9 @@ typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, ui * - 0 on success. * - <0, error on failure. */ -typedef int (*mldev_io_dequantize_t)(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *qbuffer, void *dbuffer); +typedef int (*mldev_io_dequantize_t)(struct rte_ml_dev *dev, uint16_t model_id, + struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer); /** * @internal -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
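For illustration, a minimal sketch (not part of the patch) of how an application might fill the reworked rte_ml_op for a model that reports RTE_ML_IO_LAYOUT_PACKED. The helper name enqueue_packed and its parameters (dev_id, qp_id, an op taken from the op pool, align_size-aligned quantized buffers ibuf/obuf and their sizes) are hypothetical; only the rte_ml_* structures and calls come from the spec changes above.

#include <errno.h>
#include <rte_memory.h>
#include <rte_mldev.h>

/* Packed layout: a single segment carries all inputs, another all outputs.
 * in_seg/out_seg and the in/out pointer arrays are caller-owned and must
 * remain valid until the op is dequeued.
 */
static int
enqueue_packed(int16_t dev_id, uint16_t qp_id, struct rte_ml_op *op,
	       uint16_t model_id, uint16_t min_batches,
	       struct rte_ml_buff_seg *in_seg, struct rte_ml_buff_seg **in,
	       struct rte_ml_buff_seg *out_seg, struct rte_ml_buff_seg **out,
	       void *ibuf, uint32_t isize, void *obuf, uint32_t osize)
{
	in_seg->addr = ibuf;
	in_seg->iova_addr = rte_mem_virt2iova(ibuf);
	in_seg->length = isize;
	in_seg->next = NULL;
	in[0] = in_seg;

	out_seg->addr = obuf;
	out_seg->iova_addr = rte_mem_virt2iova(obuf);
	out_seg->length = osize;
	out_seg->next = NULL;
	out[0] = out_seg;

	op->model_id = model_id;
	op->nb_batches = min_batches;	/* rte_ml_model_info::min_batches */
	op->input = in;			/* arrays of size 1 for packed layout */
	op->output = out;

	return rte_ml_enqueue_burst(dev_id, qp_id, &op, 1) == 1 ? 0 : -EAGAIN;
}

For a split-layout model the same op would be built with one segment pointer per input and output, as the updated app/test-mldev code in this series does.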
* [PATCH v2 3/3] mldev: drop input and output size get APIs 2023-09-20 7:19 ` [PATCH v2 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-09-20 7:19 ` [PATCH v2 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi 2023-09-20 7:19 ` [PATCH v2 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi @ 2023-09-20 7:19 ` Srikanth Yalavarthi 2023-10-03 6:10 ` Anup Prabhu 2 siblings, 1 reply; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-20 7:19 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Drop support and use of ML input and output size get functions, rte_ml_io_input_size_get and rte_ml_io_output_size_get. These functions are not required, as the model buffer size can be computed from the fields of updated rte_ml_io_info structure. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- drivers/ml/cnxk/cn10k_ml_ops.c | 50 ---------------------------- lib/mldev/rte_mldev.c | 38 --------------------- lib/mldev/rte_mldev.h | 60 ---------------------------------- lib/mldev/rte_mldev_core.h | 54 ------------------------------ lib/mldev/version.map | 2 -- 5 files changed, 204 deletions(-) diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index 1d72fb52a6..4abf4ae0d3 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -2110,54 +2110,6 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu return 0; } -static int -cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize) -{ - struct cn10k_ml_model *model; - - model = dev->data->models[model_id]; - - if (model == NULL) { - plt_err("Invalid model_id = %u", model_id); - return -EINVAL; - } - - if (input_qsize != NULL) - *input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - if (input_dsize != NULL) - *input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - return 0; -} - -static int -cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize) -{ - struct cn10k_ml_model *model; - - model = dev->data->models[model_id]; - - if (model == NULL) { - plt_err("Invalid model_id = %u", model_id); - return -EINVAL; - } - - if (output_qsize != NULL) - *output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - if (output_dsize != NULL) - *output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - return 0; -} - static int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer) @@ -2636,8 +2588,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = { .model_params_update = cn10k_ml_model_params_update, /* I/O ops */ - .io_input_size_get = cn10k_ml_io_input_size_get, - .io_output_size_get = cn10k_ml_io_output_size_get, .io_quantize = cn10k_ml_io_quantize, .io_dequantize = cn10k_ml_io_dequantize, }; diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 9a48ed3e94..cc5f2e0cc6 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -691,44 +691,6 @@ rte_ml_model_params_update(int16_t dev_id, uint16_t model_id, void *buffer) return (*dev->dev_ops->model_params_update)(dev, model_id, buffer); } -int 
-rte_ml_io_input_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize) -{ - struct rte_ml_dev *dev; - - if (!rte_ml_dev_is_valid_dev(dev_id)) { - RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); - return -EINVAL; - } - - dev = rte_ml_dev_pmd_get_dev(dev_id); - if (*dev->dev_ops->io_input_size_get == NULL) - return -ENOTSUP; - - return (*dev->dev_ops->io_input_size_get)(dev, model_id, nb_batches, input_qsize, - input_dsize); -} - -int -rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize) -{ - struct rte_ml_dev *dev; - - if (!rte_ml_dev_is_valid_dev(dev_id)) { - RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); - return -EINVAL; - } - - dev = rte_ml_dev_pmd_get_dev(dev_id); - if (*dev->dev_ops->io_output_size_get == NULL) - return -ENOTSUP; - - return (*dev->dev_ops->io_output_size_get)(dev, model_id, nb_batches, output_qsize, - output_dsize); -} - int rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer) diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index 316c6fd018..63b2670bb0 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -1008,66 +1008,6 @@ rte_ml_model_params_update(int16_t dev_id, uint16_t model_id, void *buffer); /* IO operations */ -/** - * Get size of quantized and dequantized input buffers. - * - * Calculate the size of buffers required for quantized and dequantized input data. - * This API would return the buffer sizes for the number of batches provided and would - * consider the alignment requirements as per the PMD. Input sizes computed by this API can - * be used by the application to allocate buffers. - * - * @param[in] dev_id - * The identifier of the device. - * @param[in] model_id - * Identifier for the model created - * @param[in] nb_batches - * Number of batches of input to be processed in a single inference job - * @param[out] input_qsize - * Quantized input size pointer. - * NULL value is allowed, in which case input_qsize is not calculated by the driver. - * @param[out] input_dsize - * Dequantized input size pointer. - * NULL value is allowed, in which case input_dsize is not calculated by the driver. - * - * @return - * - Returns 0 on success - * - Returns negative value on failure - */ -__rte_experimental -int -rte_ml_io_input_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize); - -/** - * Get size of quantized and dequantized output buffers. - * - * Calculate the size of buffers required for quantized and dequantized output data. - * This API would return the buffer sizes for the number of batches provided and would consider - * the alignment requirements as per the PMD. Output sizes computed by this API can be used by the - * application to allocate buffers. - * - * @param[in] dev_id - * The identifier of the device. - * @param[in] model_id - * Identifier for the model created - * @param[in] nb_batches - * Number of batches of input to be processed in a single inference job - * @param[out] output_qsize - * Quantized output size pointer. - * NULL value is allowed, in which case output_qsize is not calculated by the driver. - * @param[out] output_dsize - * Dequantized output size pointer. - * NULL value is allowed, in which case output_dsize is not calculated by the driver. 
- * - * @return - * - Returns 0 on success - * - Returns negative value on failure - */ -__rte_experimental -int -rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize); - /** * Quantize input data. * diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 8530b07316..2279b1dcec 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -466,54 +466,6 @@ typedef int (*mldev_model_info_get_t)(struct rte_ml_dev *dev, uint16_t model_id, */ typedef int (*mldev_model_params_update_t)(struct rte_ml_dev *dev, uint16_t model_id, void *buffer); -/** - * @internal - * - * Get size of input buffers. - * - * @param dev - * ML device pointer. - * @param model_id - * Model ID to use. - * @param nb_batches - * Number of batches. - * @param input_qsize - * Size of quantized input. - * @param input_dsize - * Size of dequantized input. - * - * @return - * - 0 on success. - * - <0, error on failure. - */ -typedef int (*mldev_io_input_size_get_t)(struct rte_ml_dev *dev, uint16_t model_id, - uint32_t nb_batches, uint64_t *input_qsize, - uint64_t *input_dsize); - -/** - * @internal - * - * Get size of output buffers. - * - * @param dev - * ML device pointer. - * @param model_id - * Model ID to use. - * @param nb_batches - * Number of batches. - * @param output_qsize - * Size of quantized output. - * @param output_dsize - * Size of dequantized output. - * - * @return - * - 0 on success. - * - <0, error on failure. - */ -typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model_id, - uint32_t nb_batches, uint64_t *output_qsize, - uint64_t *output_dsize); - /** * @internal * @@ -627,12 +579,6 @@ struct rte_ml_dev_ops { /** Update model params. */ mldev_model_params_update_t model_params_update; - /** Get input buffer size. */ - mldev_io_input_size_get_t io_input_size_get; - - /** Get output buffer size. */ - mldev_io_output_size_get_t io_output_size_get; - /** Quantize data */ mldev_io_quantize_t io_quantize; diff --git a/lib/mldev/version.map b/lib/mldev/version.map index 40ff27f4b9..99841db6aa 100644 --- a/lib/mldev/version.map +++ b/lib/mldev/version.map @@ -23,8 +23,6 @@ EXPERIMENTAL { rte_ml_dev_xstats_reset; rte_ml_enqueue_burst; rte_ml_io_dequantize; - rte_ml_io_input_size_get; - rte_ml_io_output_size_get; rte_ml_io_quantize; rte_ml_model_info_get; rte_ml_model_load; -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
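With the size-get APIs removed, a caller derives buffer sizes from the model information instead. A minimal sketch, assuming the dequantized representation is float32 (as app/test-mldev assumes); the helper name model_input_sizes is hypothetical:

#include <rte_mldev.h>

/* Recompute what rte_ml_io_input_size_get() used to report, using only
 * the fields of the updated rte_ml_io_info structure.
 */
static int
model_input_sizes(int16_t dev_id, uint16_t model_id,
		  uint64_t *qsize, uint64_t *dsize)
{
	struct rte_ml_model_info info;
	uint32_t i;
	int ret;

	ret = rte_ml_model_info_get(dev_id, model_id, &info);
	if (ret != 0)
		return ret;

	*qsize = 0;
	*dsize = 0;
	for (i = 0; i < info.nb_inputs; i++) {
		/* quantized size is reported directly per input */
		*qsize += info.input_info[i].size;
		/* dequantized size, assuming one float per element */
		*dsize += info.input_info[i].nb_elements * sizeof(float);
	}

	return 0;
}

Output buffer sizes follow the same pattern over nb_outputs / output_info.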
* RE: [PATCH v2 3/3] mldev: drop input and output size get APIs 2023-09-20 7:19 ` [PATCH v2 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi @ 2023-10-03 6:10 ` Anup Prabhu 0 siblings, 0 replies; 26+ messages in thread From: Anup Prabhu @ 2023-10-03 6:10 UTC (permalink / raw) To: Srikanth Yalavarthi, Srikanth Yalavarthi Cc: dev, Shivah Shankar Shankar Narayan Rao, Prince Takkar > -----Original Message----- > From: Srikanth Yalavarthi <syalavarthi@marvell.com> > Sent: Wednesday, September 20, 2023 12:49 PM > To: Srikanth Yalavarthi <syalavarthi@marvell.com> > Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao > <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; > Prince Takkar <ptakkar@marvell.com> > Subject: [PATCH v2 3/3] mldev: drop input and output size get APIs > > Drop support and use of ML input and output size get functions, > rte_ml_io_input_size_get and rte_ml_io_output_size_get. > > These functions are not required, as the model buffer size can be computed > from the fields of updated rte_ml_io_info structure. > > Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> > Acked-by: Anup Prabhu <aprabhu@marvell.com> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v3 0/4] Spec changes to support multi I/O models 2023-08-30 15:52 [PATCH v1 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi ` (3 preceding siblings ...) 2023-09-20 7:19 ` [PATCH v2 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi @ 2023-09-27 18:11 ` Srikanth Yalavarthi 2023-09-27 18:11 ` [PATCH v3 1/4] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi ` (3 more replies) 2023-10-02 9:58 ` [PATCH v4 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi 5 siblings, 4 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-27 18:11 UTC (permalink / raw) Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar This series implements changes to mldev spec to extend support for ML models with multiple inputs and outputs. Changes include introduction of I/O layout to support packed and split buffers for model input and output. Extended the rte_ml_model_info structure to support multiple inputs and outputs. Updated rte_ml_op and quantize / dequantize APIs to support an array of input and output ML buffer segments. Support for batches option is dropped from test application. v3: - Added release notes for 23.11 v2: - Minor fixes - Cleanup of application help v1: - Initial changes Srikanth Yalavarthi (4): mldev: add support for arbitrary shape dimensions mldev: introduce support for IO layout mldev: drop input and output size get APIs mldev: update release notes for 23.11 app/test-mldev/ml_options.c | 16 - app/test-mldev/ml_options.h | 2 - app/test-mldev/test_inference_common.c | 420 +++++++++++++++++-------- app/test-mldev/test_inference_common.h | 6 + app/test-mldev/test_model_common.c | 6 - app/test-mldev/test_model_common.h | 1 - doc/guides/rel_notes/release_23_11.rst | 15 + doc/guides/tools/testmldev.rst | 6 - drivers/ml/cnxk/cn10k_ml_dev.h | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 84 +++-- drivers/ml/cnxk/cn10k_ml_model.h | 12 + drivers/ml/cnxk/cn10k_ml_ops.c | 135 +++----- lib/mldev/meson.build | 2 +- lib/mldev/mldev_utils.c | 30 -- lib/mldev/mldev_utils.h | 16 - lib/mldev/rte_mldev.c | 50 +-- lib/mldev/rte_mldev.h | 201 +++++------- lib/mldev/rte_mldev_core.h | 68 +--- lib/mldev/version.map | 3 - 19 files changed, 521 insertions(+), 555 deletions(-) -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
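As a companion to the cover letter, a minimal sketch of the split-layout quantize path this series introduces, mirroring the updated test application. The helper name quantize_split is hypothetical; info and dev_info are assumed to be filled via rte_ml_model_info_get() and rte_ml_dev_info_get(), dbuf/qbuf point to the dequantized and quantized regions, and the fixed bound of 32 stands in for rte_ml_dev_info::max_io.

#include <rte_common.h>
#include <rte_memory.h>
#include <rte_mldev.h>

static int
quantize_split(int16_t dev_id, uint16_t model_id,
	       const struct rte_ml_model_info *info,
	       const struct rte_ml_dev_info *dev_info,
	       void *dbuf, void *qbuf)
{
	struct rte_ml_buff_seg d_seg[32], q_seg[32];	/* bounded by max_io */
	struct rte_ml_buff_seg *d_segs[32], *q_segs[32];
	uint64_t off_d = 0, off_q = 0;
	uint32_t i;

	for (i = 0; i < info->nb_inputs; i++) {
		/* dequantized side: tightly packed float32, one segment per input */
		d_seg[i].addr = (uint8_t *)dbuf + off_d;
		d_seg[i].iova_addr = rte_mem_virt2iova(d_seg[i].addr);
		d_seg[i].length = info->input_info[i].nb_elements * sizeof(float);
		d_seg[i].next = NULL;
		d_segs[i] = &d_seg[i];
		off_d += d_seg[i].length;

		/* quantized side: each input starts at an align_size boundary */
		q_seg[i].addr = (uint8_t *)qbuf + off_q;
		q_seg[i].iova_addr = rte_mem_virt2iova(q_seg[i].addr);
		q_seg[i].length = RTE_ALIGN_CEIL(info->input_info[i].size,
						 dev_info->align_size);
		q_seg[i].next = NULL;
		q_segs[i] = &q_seg[i];
		off_q += q_seg[i].length;
	}

	return rte_ml_io_quantize(dev_id, model_id, d_segs, q_segs);
}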
* [PATCH v3 1/4] mldev: add support for arbitrary shape dimensions 2023-09-27 18:11 ` [PATCH v3 0/4] Spec changes to support multi I/O models Srikanth Yalavarthi @ 2023-09-27 18:11 ` Srikanth Yalavarthi 2023-09-27 18:11 ` [PATCH v3 2/4] mldev: introduce support for IO layout Srikanth Yalavarthi ` (2 subsequent siblings) 3 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-27 18:11 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Updated rte_ml_io_info to support shape of arbitrary number of dimensions. Dropped use of rte_ml_io_shape and rte_ml_io_format. Introduced new fields nb_elements and size in rte_ml_io_info. Updated drivers and app/mldev to support the changes. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- app/test-mldev/test_inference_common.c | 97 +++++--------------------- drivers/ml/cnxk/cn10k_ml_model.c | 78 +++++++++++++-------- drivers/ml/cnxk/cn10k_ml_model.h | 12 ++++ drivers/ml/cnxk/cn10k_ml_ops.c | 11 +-- lib/mldev/mldev_utils.c | 30 -------- lib/mldev/mldev_utils.h | 16 ----- lib/mldev/rte_mldev.h | 59 ++++------------ lib/mldev/version.map | 1 - 8 files changed, 94 insertions(+), 210 deletions(-) diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index 05b221401b..b40519b5e3 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -3,6 +3,7 @@ */ #include <errno.h> +#include <math.h> #include <stdio.h> #include <unistd.h> @@ -18,11 +19,6 @@ #include "ml_common.h" #include "test_inference_common.h" -#define ML_TEST_READ_TYPE(buffer, type) (*((type *)buffer)) - -#define ML_TEST_CHECK_OUTPUT(output, reference, tolerance) \ - (((float)output - (float)reference) <= (((float)reference * tolerance) / 100.0)) - #define ML_OPEN_WRITE_GET_ERR(name, buffer, size, err) \ do { \ FILE *fp = fopen(name, "w+"); \ @@ -763,9 +759,9 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) { struct test_inference *t = ml_test_priv((struct ml_test *)test); struct ml_model *model; - uint32_t nb_elements; - uint8_t *reference; - uint8_t *output; + float *reference; + float *output; + float deviation; bool match; uint32_t i; uint32_t j; @@ -777,89 +773,30 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) match = (rte_hash_crc(model->output, model->out_dsize, 0) == rte_hash_crc(model->reference, model->out_dsize, 0)); } else { - output = model->output; - reference = model->reference; + output = (float *)model->output; + reference = (float *)model->reference; i = 0; next_output: - nb_elements = - model->info.output_info[i].shape.w * model->info.output_info[i].shape.x * - model->info.output_info[i].shape.y * model->info.output_info[i].shape.z; j = 0; next_element: match = false; - switch (model->info.output_info[i].dtype) { - case RTE_ML_IO_TYPE_INT8: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int8_t), - ML_TEST_READ_TYPE(reference, int8_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int8_t); - reference += sizeof(int8_t); - break; - case RTE_ML_IO_TYPE_UINT8: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint8_t), - ML_TEST_READ_TYPE(reference, uint8_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(float); - reference += sizeof(float); - break; - case RTE_ML_IO_TYPE_INT16: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int16_t), - ML_TEST_READ_TYPE(reference, int16_t), - t->cmn.opt->tolerance)) - match = true; - - output += 
sizeof(int16_t); - reference += sizeof(int16_t); - break; - case RTE_ML_IO_TYPE_UINT16: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint16_t), - ML_TEST_READ_TYPE(reference, uint16_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(uint16_t); - reference += sizeof(uint16_t); - break; - case RTE_ML_IO_TYPE_INT32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int32_t), - ML_TEST_READ_TYPE(reference, int32_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int32_t); - reference += sizeof(int32_t); - break; - case RTE_ML_IO_TYPE_UINT32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint32_t), - ML_TEST_READ_TYPE(reference, uint32_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(uint32_t); - reference += sizeof(uint32_t); - break; - case RTE_ML_IO_TYPE_FP32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, float), - ML_TEST_READ_TYPE(reference, float), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(float); - reference += sizeof(float); - break; - default: /* other types, fp8, fp16, bfloat16 */ + deviation = + (*reference == 0 ? 0 : 100 * fabs(*output - *reference) / fabs(*reference)); + if (deviation <= t->cmn.opt->tolerance) match = true; - } + else + ml_err("id = %d, element = %d, output = %f, reference = %f, deviation = %f %%\n", + i, j, *output, *reference, deviation); + + output++; + reference++; if (!match) goto done; + j++; - if (j < nb_elements) + if (j < model->info.output_info[i].nb_elements) goto next_element; i++; diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index 92c47d39ba..26df8d9ff9 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -366,6 +366,12 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->total_input_sz_q = 0; for (i = 0; i < metadata->model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { + addr->input[i].nb_dims = 4; + addr->input[i].shape[0] = metadata->input1[i].shape.w; + addr->input[i].shape[1] = metadata->input1[i].shape.x; + addr->input[i].shape[2] = metadata->input1[i].shape.y; + addr->input[i].shape[3] = metadata->input1[i].shape.z; + addr->input[i].nb_elements = metadata->input1[i].shape.w * metadata->input1[i].shape.x * metadata->input1[i].shape.y * metadata->input1[i].shape.z; @@ -386,6 +392,13 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->input[i].sz_q); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + + addr->input[i].nb_dims = 4; + addr->input[i].shape[0] = metadata->input2[j].shape.w; + addr->input[i].shape[1] = metadata->input2[j].shape.x; + addr->input[i].shape[2] = metadata->input2[j].shape.y; + addr->input[i].shape[3] = metadata->input2[j].shape.z; + addr->input[i].nb_elements = metadata->input2[j].shape.w * metadata->input2[j].shape.x * metadata->input2[j].shape.y * metadata->input2[j].shape.z; @@ -412,6 +425,8 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->total_output_sz_d = 0; for (i = 0; i < metadata->model.num_output; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { + addr->output[i].nb_dims = 1; + addr->output[i].shape[0] = metadata->output1[i].size; addr->output[i].nb_elements = metadata->output1[i].size; addr->output[i].sz_d = addr->output[i].nb_elements * @@ -426,6 +441,9 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q); } else { j = i - 
MRVL_ML_NUM_INPUT_OUTPUT_1; + + addr->output[i].nb_dims = 1; + addr->output[i].shape[0] = metadata->output2[j].size; addr->output[i].nb_elements = metadata->output2[j].size; addr->output[i].sz_d = addr->output[i].nb_elements * @@ -498,6 +516,7 @@ void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) { struct cn10k_ml_model_metadata *metadata; + struct cn10k_ml_model_addr *addr; struct rte_ml_model_info *info; struct rte_ml_io_info *output; struct rte_ml_io_info *input; @@ -508,6 +527,7 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) info = PLT_PTR_CAST(model->info); input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info)); output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info)); + addr = &model->addr; /* Set model info */ memset(info, 0, sizeof(struct rte_ml_model_info)); @@ -529,24 +549,25 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { rte_memcpy(input[i].name, metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN); - input[i].dtype = metadata->input1[i].input_type; - input[i].qtype = metadata->input1[i].model_input_type; - input[i].shape.format = metadata->input1[i].shape.format; - input[i].shape.w = metadata->input1[i].shape.w; - input[i].shape.x = metadata->input1[i].shape.x; - input[i].shape.y = metadata->input1[i].shape.y; - input[i].shape.z = metadata->input1[i].shape.z; + input[i].nb_dims = addr->input[i].nb_dims; + input[i].shape = addr->input[i].shape; + input[i].type = metadata->input1[i].model_input_type; + input[i].nb_elements = addr->input[i].nb_elements; + input[i].size = + addr->input[i].nb_elements * + rte_ml_io_type_size_get(metadata->input1[i].model_input_type); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + rte_memcpy(input[i].name, metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN); - input[i].dtype = metadata->input2[j].input_type; - input[i].qtype = metadata->input2[j].model_input_type; - input[i].shape.format = metadata->input2[j].shape.format; - input[i].shape.w = metadata->input2[j].shape.w; - input[i].shape.x = metadata->input2[j].shape.x; - input[i].shape.y = metadata->input2[j].shape.y; - input[i].shape.z = metadata->input2[j].shape.z; + input[i].nb_dims = addr->input[i].nb_dims; + input[i].shape = addr->input[i].shape; + input[i].type = metadata->input2[j].model_input_type; + input[i].nb_elements = addr->input[i].nb_elements; + input[i].size = + addr->input[i].nb_elements * + rte_ml_io_type_size_get(metadata->input2[j].model_input_type); } } @@ -555,24 +576,25 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { rte_memcpy(output[i].name, metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN); - output[i].dtype = metadata->output1[i].output_type; - output[i].qtype = metadata->output1[i].model_output_type; - output[i].shape.format = RTE_ML_IO_FORMAT_1D; - output[i].shape.w = metadata->output1[i].size; - output[i].shape.x = 1; - output[i].shape.y = 1; - output[i].shape.z = 1; + output[i].nb_dims = addr->output[i].nb_dims; + output[i].shape = addr->output[i].shape; + output[i].type = metadata->output1[i].model_output_type; + output[i].nb_elements = addr->output[i].nb_elements; + output[i].size = + addr->output[i].nb_elements * + rte_ml_io_type_size_get(metadata->output1[i].model_output_type); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + rte_memcpy(output[i].name, metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN); 
- output[i].dtype = metadata->output2[j].output_type; - output[i].qtype = metadata->output2[j].model_output_type; - output[i].shape.format = RTE_ML_IO_FORMAT_1D; - output[i].shape.w = metadata->output2[j].size; - output[i].shape.x = 1; - output[i].shape.y = 1; - output[i].shape.z = 1; + output[i].nb_dims = addr->output[i].nb_dims; + output[i].shape = addr->output[i].shape; + output[i].type = metadata->output2[j].model_output_type; + output[i].nb_elements = addr->output[i].nb_elements; + output[i].size = + addr->output[i].nb_elements * + rte_ml_io_type_size_get(metadata->output2[j].model_output_type); } } } diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h index 1f689363fc..4cc0744891 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.h +++ b/drivers/ml/cnxk/cn10k_ml_model.h @@ -409,6 +409,12 @@ struct cn10k_ml_model_addr { /* Input address and size */ struct { + /* Number of dimensions in shape */ + uint32_t nb_dims; + + /* Shape of input */ + uint32_t shape[4]; + /* Number of elements */ uint32_t nb_elements; @@ -421,6 +427,12 @@ struct cn10k_ml_model_addr { /* Output address and size */ struct { + /* Number of dimensions in shape */ + uint32_t nb_dims; + + /* Shape of input */ + uint32_t shape[4]; + /* Number of elements */ uint32_t nb_elements; diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index 656467d891..e3faab81ba 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -321,8 +321,8 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "\n"); print_line(fp, LINE_LEN); - fprintf(fp, "%8s %16s %12s %18s %12s %14s\n", "input", "input_name", "input_type", - "model_input_type", "quantize", "format"); + fprintf(fp, "%8s %16s %12s %18s %12s\n", "input", "input_name", "input_type", + "model_input_type", "quantize"); print_line(fp, LINE_LEN); for (i = 0; i < model->metadata.model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { @@ -335,12 +335,10 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "%*s ", 18, str); fprintf(fp, "%*s", 12, (model->metadata.input1[i].quantize == 1 ? "Yes" : "No")); - rte_ml_io_format_to_str(model->metadata.input1[i].shape.format, str, - STR_LEN); - fprintf(fp, "%*s", 16, str); fprintf(fp, "\n"); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + fprintf(fp, "%8u ", i); fprintf(fp, "%*s ", 16, model->metadata.input2[j].input_name); rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN); @@ -350,9 +348,6 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "%*s ", 18, str); fprintf(fp, "%*s", 12, (model->metadata.input2[j].quantize == 1 ? 
"Yes" : "No")); - rte_ml_io_format_to_str(model->metadata.input2[j].shape.format, str, - STR_LEN); - fprintf(fp, "%*s", 16, str); fprintf(fp, "\n"); } } diff --git a/lib/mldev/mldev_utils.c b/lib/mldev/mldev_utils.c index d2442b123b..ccd2c39ca8 100644 --- a/lib/mldev/mldev_utils.c +++ b/lib/mldev/mldev_utils.c @@ -86,33 +86,3 @@ rte_ml_io_type_to_str(enum rte_ml_io_type type, char *str, int len) rte_strlcpy(str, "invalid", len); } } - -void -rte_ml_io_format_to_str(enum rte_ml_io_format format, char *str, int len) -{ - switch (format) { - case RTE_ML_IO_FORMAT_NCHW: - rte_strlcpy(str, "NCHW", len); - break; - case RTE_ML_IO_FORMAT_NHWC: - rte_strlcpy(str, "NHWC", len); - break; - case RTE_ML_IO_FORMAT_CHWN: - rte_strlcpy(str, "CHWN", len); - break; - case RTE_ML_IO_FORMAT_3D: - rte_strlcpy(str, "3D", len); - break; - case RTE_ML_IO_FORMAT_2D: - rte_strlcpy(str, "Matrix", len); - break; - case RTE_ML_IO_FORMAT_1D: - rte_strlcpy(str, "Vector", len); - break; - case RTE_ML_IO_FORMAT_SCALAR: - rte_strlcpy(str, "Scalar", len); - break; - default: - rte_strlcpy(str, "invalid", len); - } -} diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h index 5bc8020453..220afb42f0 100644 --- a/lib/mldev/mldev_utils.h +++ b/lib/mldev/mldev_utils.h @@ -52,22 +52,6 @@ __rte_internal void rte_ml_io_type_to_str(enum rte_ml_io_type type, char *str, int len); -/** - * @internal - * - * Get the name of an ML IO format. - * - * @param[in] type - * Enumeration of ML IO format. - * @param[in] str - * Address of character array. - * @param[in] len - * Length of character array. - */ -__rte_internal -void -rte_ml_io_format_to_str(enum rte_ml_io_format format, char *str, int len); - /** * @internal * diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index fc3525c1ab..6204df0930 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -863,47 +863,6 @@ enum rte_ml_io_type { /**< 16-bit brain floating point number. */ }; -/** - * Input and output format. This is used to represent the encoding type of multi-dimensional - * used by ML models. - */ -enum rte_ml_io_format { - RTE_ML_IO_FORMAT_NCHW = 1, - /**< Batch size (N) x channels (C) x height (H) x width (W) */ - RTE_ML_IO_FORMAT_NHWC, - /**< Batch size (N) x height (H) x width (W) x channels (C) */ - RTE_ML_IO_FORMAT_CHWN, - /**< Channels (C) x height (H) x width (W) x batch size (N) */ - RTE_ML_IO_FORMAT_3D, - /**< Format to represent a 3 dimensional data */ - RTE_ML_IO_FORMAT_2D, - /**< Format to represent matrix data */ - RTE_ML_IO_FORMAT_1D, - /**< Format to represent vector data */ - RTE_ML_IO_FORMAT_SCALAR, - /**< Format to represent scalar data */ -}; - -/** - * Input and output shape. This structure represents the encoding format and dimensions - * of the tensor or vector. - * - * The data can be a 4D / 3D tensor, matrix, vector or a scalar. Number of dimensions used - * for the data would depend on the format. Unused dimensions to be set to 1. - */ -struct rte_ml_io_shape { - enum rte_ml_io_format format; - /**< Format of the data */ - uint32_t w; - /**< First dimension */ - uint32_t x; - /**< Second dimension */ - uint32_t y; - /**< Third dimension */ - uint32_t z; - /**< Fourth dimension */ -}; - /** Input and output data information structure * * Specifies the type and shape of input and output data. 
@@ -911,12 +870,18 @@ struct rte_ml_io_shape { struct rte_ml_io_info { char name[RTE_ML_STR_MAX]; /**< Name of data */ - struct rte_ml_io_shape shape; - /**< Shape of data */ - enum rte_ml_io_type qtype; - /**< Type of quantized data */ - enum rte_ml_io_type dtype; - /**< Type of de-quantized data */ + uint32_t nb_dims; + /**< Number of dimensions in shape */ + uint32_t *shape; + /**< Shape of the tensor */ + enum rte_ml_io_type type; + /**< Type of data + * @see enum rte_ml_io_type + */ + uint64_t nb_elements; + /** Number of elements in tensor */ + uint64_t size; + /** Size of tensor in bytes */ }; /** Model information structure */ diff --git a/lib/mldev/version.map b/lib/mldev/version.map index 0706b565be..40ff27f4b9 100644 --- a/lib/mldev/version.map +++ b/lib/mldev/version.map @@ -51,7 +51,6 @@ INTERNAL { rte_ml_io_type_size_get; rte_ml_io_type_to_str; - rte_ml_io_format_to_str; rte_ml_io_float32_to_int8; rte_ml_io_int8_to_float32; rte_ml_io_float32_to_uint8; -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
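A minimal sketch of consuming the reworked rte_ml_io_info from this patch; the helper name print_input_shapes is hypothetical and info is assumed to be filled by rte_ml_model_info_get().

#include <inttypes.h>
#include <stdio.h>
#include <rte_mldev.h>

/* Walk the arbitrary-dimension shape exposed per input and print the
 * element count and size now carried by rte_ml_io_info.
 */
static void
print_input_shapes(const struct rte_ml_model_info *info)
{
	uint32_t i, d;

	for (i = 0; i < info->nb_inputs; i++) {
		const struct rte_ml_io_info *io = &info->input_info[i];

		printf("%s: [", io->name);
		for (d = 0; d < io->nb_dims; d++)
			printf("%u%s", io->shape[d],
			       (d + 1 < io->nb_dims) ? ", " : "");
		printf("] -> %" PRIu64 " elements, %" PRIu64 " bytes\n",
		       io->nb_elements, io->size);
	}
}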
* [PATCH v3 2/4] mldev: introduce support for IO layout 2023-09-27 18:11 ` [PATCH v3 0/4] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-09-27 18:11 ` [PATCH v3 1/4] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi @ 2023-09-27 18:11 ` Srikanth Yalavarthi 2023-09-27 18:11 ` [PATCH v3 3/4] mldev: drop input and output size get APIs Srikanth Yalavarthi 2023-09-27 18:11 ` [PATCH v3 4/4] mldev: update release notes for 23.11 Srikanth Yalavarthi 3 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-27 18:11 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Introduce IO layout in ML device specification. IO layout defines the expected arrangement of model input and output buffers in the memory. Packed and Split layout support is added in the specification. Updated rte_ml_op to support array of rte_ml_buff_seg pointers to support packed and split I/O layouts. Updated ML quantize and dequantize APIs to support rte_ml_buff_seg pointer arrays. Replaced batch_size with min_batches and max_batches in rte_ml_model_info. Implement support for model IO layout in ml/cnxk driver. Updated the ML test application to support IO layout and dropped support for '--batches' in test application. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- app/test-mldev/ml_options.c | 16 -- app/test-mldev/ml_options.h | 2 - app/test-mldev/test_inference_common.c | 327 +++++++++++++++++++++---- app/test-mldev/test_inference_common.h | 6 + app/test-mldev/test_model_common.c | 6 - app/test-mldev/test_model_common.h | 1 - doc/guides/tools/testmldev.rst | 6 - drivers/ml/cnxk/cn10k_ml_dev.h | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 6 +- drivers/ml/cnxk/cn10k_ml_ops.c | 74 +++--- lib/mldev/meson.build | 2 +- lib/mldev/rte_mldev.c | 12 +- lib/mldev/rte_mldev.h | 90 +++++-- lib/mldev/rte_mldev_core.h | 14 +- 14 files changed, 418 insertions(+), 147 deletions(-) diff --git a/app/test-mldev/ml_options.c b/app/test-mldev/ml_options.c index d068b30df5..eeaffec399 100644 --- a/app/test-mldev/ml_options.c +++ b/app/test-mldev/ml_options.c @@ -28,7 +28,6 @@ ml_options_default(struct ml_options *opt) opt->burst_size = 1; opt->queue_pairs = 1; opt->queue_size = 1; - opt->batches = 0; opt->tolerance = 0.0; opt->stats = false; opt->debug = false; @@ -213,18 +212,6 @@ ml_parse_queue_size(struct ml_options *opt, const char *arg) return ret; } -static int -ml_parse_batches(struct ml_options *opt, const char *arg) -{ - int ret; - - ret = parser_read_uint16(&opt->batches, arg); - if (ret != 0) - ml_err("Invalid option, batches = %s\n", arg); - - return ret; -} - static int ml_parse_tolerance(struct ml_options *opt, const char *arg) { @@ -255,7 +242,6 @@ ml_dump_test_options(const char *testname) "\t\t--burst_size : inference burst size\n" "\t\t--queue_pairs : number of queue pairs to create\n" "\t\t--queue_size : size of queue-pair\n" - "\t\t--batches : number of batches of input\n" "\t\t--tolerance : maximum tolerance (%%) for output validation\n" "\t\t--stats : enable reporting device and model statistics\n"); printf("\n"); @@ -287,7 +273,6 @@ static struct option lgopts[] = { {ML_BURST_SIZE, 1, 0, 0}, {ML_QUEUE_PAIRS, 1, 0, 0}, {ML_QUEUE_SIZE, 1, 0, 0}, - {ML_BATCHES, 1, 0, 0}, {ML_TOLERANCE, 1, 0, 0}, {ML_STATS, 0, 0, 0}, {ML_DEBUG, 0, 0, 0}, @@ -309,7 +294,6 @@ ml_opts_parse_long(int opt_idx, struct ml_options *opt) {ML_BURST_SIZE, ml_parse_burst_size}, {ML_QUEUE_PAIRS, ml_parse_queue_pairs}, {ML_QUEUE_SIZE, 
ml_parse_queue_size}, - {ML_BATCHES, ml_parse_batches}, {ML_TOLERANCE, ml_parse_tolerance}, }; diff --git a/app/test-mldev/ml_options.h b/app/test-mldev/ml_options.h index 622a4c05fc..90e22adeac 100644 --- a/app/test-mldev/ml_options.h +++ b/app/test-mldev/ml_options.h @@ -21,7 +21,6 @@ #define ML_BURST_SIZE ("burst_size") #define ML_QUEUE_PAIRS ("queue_pairs") #define ML_QUEUE_SIZE ("queue_size") -#define ML_BATCHES ("batches") #define ML_TOLERANCE ("tolerance") #define ML_STATS ("stats") #define ML_DEBUG ("debug") @@ -44,7 +43,6 @@ struct ml_options { uint16_t burst_size; uint16_t queue_pairs; uint16_t queue_size; - uint16_t batches; float tolerance; bool stats; bool debug; diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index b40519b5e3..846f71abb1 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -47,7 +47,10 @@ ml_enqueue_single(void *arg) uint64_t start_cycle; uint32_t burst_enq; uint32_t lcore_id; + uint64_t offset; + uint64_t bufsz; uint16_t fid; + uint32_t i; int ret; lcore_id = rte_lcore_id(); @@ -66,24 +69,64 @@ ml_enqueue_single(void *arg) if (ret != 0) goto next_model; -retry: +retry_req: ret = rte_mempool_get(t->model[fid].io_pool, (void **)&req); if (ret != 0) - goto retry; + goto retry_req; + +retry_inp_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)req->inp_buf_segs, + t->model[fid].info.nb_inputs); + if (ret != 0) + goto retry_inp_segs; + +retry_out_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)req->out_buf_segs, + t->model[fid].info.nb_outputs); + if (ret != 0) + goto retry_out_segs; op->model_id = t->model[fid].id; - op->nb_batches = t->model[fid].nb_batches; + op->nb_batches = t->model[fid].info.min_batches; op->mempool = t->op_pool; + op->input = req->inp_buf_segs; + op->output = req->out_buf_segs; + op->user_ptr = req; - op->input.addr = req->input; - op->input.length = t->model[fid].inp_qsize; - op->input.next = NULL; + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + op->input[0]->addr = req->input; + op->input[0]->iova_addr = rte_mem_virt2iova(req->input); + op->input[0]->length = t->model[fid].inp_qsize; + op->input[0]->next = NULL; + + op->output[0]->addr = req->output; + op->output[0]->iova_addr = rte_mem_virt2iova(req->output); + op->output[0]->length = t->model[fid].out_qsize; + op->output[0]->next = NULL; + } else { + offset = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + op->input[i]->addr = req->input + offset; + op->input[i]->iova_addr = rte_mem_virt2iova(req->input + offset); + op->input[i]->length = bufsz; + op->input[i]->next = NULL; + offset += bufsz; + } - op->output.addr = req->output; - op->output.length = t->model[fid].out_qsize; - op->output.next = NULL; + offset = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + op->output[i]->addr = req->output + offset; + op->output[i]->iova_addr = rte_mem_virt2iova(req->output + offset); + op->output[i]->length = bufsz; + op->output[i]->next = NULL; + offset += bufsz; + } + } - op->user_ptr = req; req->niters++; req->fid = fid; @@ -143,6 +186,10 @@ ml_dequeue_single(void *arg) } req = (struct ml_request *)op->user_ptr; rte_mempool_put(t->model[req->fid].io_pool, req); + rte_mempool_put_bulk(t->buf_seg_pool, (void **)op->input, + 
t->model[req->fid].info.nb_inputs); + rte_mempool_put_bulk(t->buf_seg_pool, (void **)op->output, + t->model[req->fid].info.nb_outputs); rte_mempool_put(t->op_pool, op); } @@ -164,9 +211,12 @@ ml_enqueue_burst(void *arg) uint16_t burst_enq; uint32_t lcore_id; uint16_t pending; + uint64_t offset; + uint64_t bufsz; uint16_t idx; uint16_t fid; uint16_t i; + uint16_t j; int ret; lcore_id = rte_lcore_id(); @@ -186,25 +236,70 @@ ml_enqueue_burst(void *arg) if (ret != 0) goto next_model; -retry: +retry_reqs: ret = rte_mempool_get_bulk(t->model[fid].io_pool, (void **)args->reqs, ops_count); if (ret != 0) - goto retry; + goto retry_reqs; for (i = 0; i < ops_count; i++) { +retry_inp_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)args->reqs[i]->inp_buf_segs, + t->model[fid].info.nb_inputs); + if (ret != 0) + goto retry_inp_segs; + +retry_out_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)args->reqs[i]->out_buf_segs, + t->model[fid].info.nb_outputs); + if (ret != 0) + goto retry_out_segs; + args->enq_ops[i]->model_id = t->model[fid].id; - args->enq_ops[i]->nb_batches = t->model[fid].nb_batches; + args->enq_ops[i]->nb_batches = t->model[fid].info.min_batches; args->enq_ops[i]->mempool = t->op_pool; + args->enq_ops[i]->input = args->reqs[i]->inp_buf_segs; + args->enq_ops[i]->output = args->reqs[i]->out_buf_segs; + args->enq_ops[i]->user_ptr = args->reqs[i]; - args->enq_ops[i]->input.addr = args->reqs[i]->input; - args->enq_ops[i]->input.length = t->model[fid].inp_qsize; - args->enq_ops[i]->input.next = NULL; + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + args->enq_ops[i]->input[0]->addr = args->reqs[i]->input; + args->enq_ops[i]->input[0]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->input); + args->enq_ops[i]->input[0]->length = t->model[fid].inp_qsize; + args->enq_ops[i]->input[0]->next = NULL; + + args->enq_ops[i]->output[0]->addr = args->reqs[i]->output; + args->enq_ops[i]->output[0]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->output); + args->enq_ops[i]->output[0]->length = t->model[fid].out_qsize; + args->enq_ops[i]->output[0]->next = NULL; + } else { + offset = 0; + for (j = 0; j < t->model[fid].info.nb_inputs; j++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + + args->enq_ops[i]->input[j]->addr = args->reqs[i]->input + offset; + args->enq_ops[i]->input[j]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->input + offset); + args->enq_ops[i]->input[j]->length = t->model[fid].inp_qsize; + args->enq_ops[i]->input[j]->next = NULL; + offset += bufsz; + } - args->enq_ops[i]->output.addr = args->reqs[i]->output; - args->enq_ops[i]->output.length = t->model[fid].out_qsize; - args->enq_ops[i]->output.next = NULL; + offset = 0; + for (j = 0; j < t->model[fid].info.nb_outputs; j++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + args->enq_ops[i]->output[j]->addr = args->reqs[i]->output + offset; + args->enq_ops[i]->output[j]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->output + offset); + args->enq_ops[i]->output[j]->length = t->model[fid].out_qsize; + args->enq_ops[i]->output[j]->next = NULL; + offset += bufsz; + } + } - args->enq_ops[i]->user_ptr = args->reqs[i]; args->reqs[i]->niters++; args->reqs[i]->fid = fid; } @@ -275,8 +370,15 @@ ml_dequeue_burst(void *arg) t->error_count[lcore_id]++; } req = (struct ml_request *)args->deq_ops[i]->user_ptr; - if (req != NULL) + if (req != NULL) { rte_mempool_put(t->model[req->fid].io_pool, req); + 
rte_mempool_put_bulk(t->buf_seg_pool, + (void **)args->deq_ops[i]->input, + t->model[req->fid].info.nb_inputs); + rte_mempool_put_bulk(t->buf_seg_pool, + (void **)args->deq_ops[i]->output, + t->model[req->fid].info.nb_outputs); + } } rte_mempool_put_bulk(t->op_pool, (void *)args->deq_ops, burst_deq); } @@ -315,6 +417,12 @@ test_inference_cap_check(struct ml_options *opt) return false; } + if (dev_info.max_io < ML_TEST_MAX_IO_SIZE) { + ml_err("Insufficient capabilities: Max I/O, count = %u > (max limit = %u)", + ML_TEST_MAX_IO_SIZE, dev_info.max_io); + return false; + } + return true; } @@ -403,11 +511,6 @@ test_inference_opt_dump(struct ml_options *opt) ml_dump("tolerance", "%-7.3f", opt->tolerance); ml_dump("stats", "%s", (opt->stats ? "true" : "false")); - if (opt->batches == 0) - ml_dump("batches", "%u (default batch size)", opt->batches); - else - ml_dump("batches", "%u", opt->batches); - ml_dump_begin("filelist"); for (i = 0; i < opt->nb_filelist; i++) { ml_dump_list("model", i, opt->filelist[i].model); @@ -492,10 +595,18 @@ void test_inference_destroy(struct ml_test *test, struct ml_options *opt) { struct test_inference *t; + uint32_t lcore_id; RTE_SET_USED(opt); t = ml_test_priv(test); + + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + rte_free(t->args[lcore_id].enq_ops); + rte_free(t->args[lcore_id].deq_ops); + rte_free(t->args[lcore_id].reqs); + } + rte_free(t); } @@ -572,19 +683,62 @@ ml_request_initialize(struct rte_mempool *mp, void *opaque, void *obj, unsigned { struct test_inference *t = ml_test_priv((struct ml_test *)opaque); struct ml_request *req = (struct ml_request *)obj; + struct rte_ml_buff_seg dbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg qbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *q_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *d_segs[ML_TEST_MAX_IO_SIZE]; + uint64_t offset; + uint64_t bufsz; + uint32_t i; RTE_SET_USED(mp); RTE_SET_USED(obj_idx); req->input = (uint8_t *)obj + - RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.min_align_size); - req->output = req->input + - RTE_ALIGN_CEIL(t->model[t->fid].inp_qsize, t->cmn.dev_info.min_align_size); + RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.align_size); + req->output = + req->input + RTE_ALIGN_CEIL(t->model[t->fid].inp_qsize, t->cmn.dev_info.align_size); req->niters = 0; + if (t->model[t->fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + dbuff_seg[0].addr = t->model[t->fid].input; + dbuff_seg[0].iova_addr = rte_mem_virt2iova(t->model[t->fid].input); + dbuff_seg[0].length = t->model[t->fid].inp_dsize; + dbuff_seg[0].next = NULL; + d_segs[0] = &dbuff_seg[0]; + + qbuff_seg[0].addr = req->input; + qbuff_seg[0].iova_addr = rte_mem_virt2iova(req->input); + qbuff_seg[0].length = t->model[t->fid].inp_qsize; + qbuff_seg[0].next = NULL; + q_segs[0] = &qbuff_seg[0]; + } else { + offset = 0; + for (i = 0; i < t->model[t->fid].info.nb_inputs; i++) { + bufsz = t->model[t->fid].info.input_info[i].nb_elements * sizeof(float); + dbuff_seg[i].addr = t->model[t->fid].input + offset; + dbuff_seg[i].iova_addr = rte_mem_virt2iova(t->model[t->fid].input + offset); + dbuff_seg[i].length = bufsz; + dbuff_seg[i].next = NULL; + d_segs[i] = &dbuff_seg[i]; + offset += bufsz; + } + + offset = 0; + for (i = 0; i < t->model[t->fid].info.nb_inputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[t->fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + qbuff_seg[i].addr = req->input + offset; + qbuff_seg[i].iova_addr = rte_mem_virt2iova(req->input + offset); + 
qbuff_seg[i].length = bufsz; + qbuff_seg[i].next = NULL; + q_segs[i] = &qbuff_seg[i]; + offset += bufsz; + } + } + /* quantize data */ - rte_ml_io_quantize(t->cmn.opt->dev_id, t->model[t->fid].id, t->model[t->fid].nb_batches, - t->model[t->fid].input, req->input); + rte_ml_io_quantize(t->cmn.opt->dev_id, t->model[t->fid].id, d_segs, q_segs); } int @@ -599,24 +753,39 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t uint32_t buff_size; uint32_t mz_size; size_t fsize; + uint32_t i; int ret; /* get input buffer size */ - ret = rte_ml_io_input_size_get(opt->dev_id, t->model[fid].id, t->model[fid].nb_batches, - &t->model[fid].inp_qsize, &t->model[fid].inp_dsize); - if (ret != 0) { - ml_err("Failed to get input size, model : %s\n", opt->filelist[fid].model); - return ret; + t->model[fid].inp_qsize = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) { + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) + t->model[fid].inp_qsize += t->model[fid].info.input_info[i].size; + else + t->model[fid].inp_qsize += RTE_ALIGN_CEIL( + t->model[fid].info.input_info[i].size, t->cmn.dev_info.align_size); } /* get output buffer size */ - ret = rte_ml_io_output_size_get(opt->dev_id, t->model[fid].id, t->model[fid].nb_batches, - &t->model[fid].out_qsize, &t->model[fid].out_dsize); - if (ret != 0) { - ml_err("Failed to get input size, model : %s\n", opt->filelist[fid].model); - return ret; + t->model[fid].out_qsize = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) { + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) + t->model[fid].out_qsize += t->model[fid].info.output_info[i].size; + else + t->model[fid].out_qsize += RTE_ALIGN_CEIL( + t->model[fid].info.output_info[i].size, t->cmn.dev_info.align_size); } + t->model[fid].inp_dsize = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) + t->model[fid].inp_dsize += + t->model[fid].info.input_info[i].nb_elements * sizeof(float); + + t->model[fid].out_dsize = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) + t->model[fid].out_dsize += + t->model[fid].info.output_info[i].nb_elements * sizeof(float); + /* allocate buffer for user data */ mz_size = t->model[fid].inp_dsize + t->model[fid].out_dsize; if (strcmp(opt->filelist[fid].reference, "\0") != 0) @@ -675,9 +844,9 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t /* create mempool for quantized input and output buffers. ml_request_initialize is * used as a callback for object creation. */ - buff_size = RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.min_align_size) + - RTE_ALIGN_CEIL(t->model[fid].inp_qsize, t->cmn.dev_info.min_align_size) + - RTE_ALIGN_CEIL(t->model[fid].out_qsize, t->cmn.dev_info.min_align_size); + buff_size = RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.align_size) + + RTE_ALIGN_CEIL(t->model[fid].inp_qsize, t->cmn.dev_info.align_size) + + RTE_ALIGN_CEIL(t->model[fid].out_qsize, t->cmn.dev_info.align_size); nb_buffers = RTE_MIN((uint64_t)ML_TEST_MAX_POOL_SIZE, opt->repetitions); t->fid = fid; @@ -740,6 +909,18 @@ ml_inference_mem_setup(struct ml_test *test, struct ml_options *opt) return -ENOMEM; } + /* create pool of buf_segs. external buffers are attached to the + * buf_segs while queuing inference requests. 
+ */ + t->buf_seg_pool = rte_mempool_create("ml_test_mbuf_pool", ML_TEST_MAX_POOL_SIZE * 2, + sizeof(struct rte_ml_buff_seg), 0, 0, NULL, NULL, NULL, + NULL, opt->socket_id, 0); + if (t->buf_seg_pool == NULL) { + ml_err("Failed to create buf_segs pool : %s\n", "ml_test_mbuf_pool"); + rte_ml_op_pool_free(t->op_pool); + return -ENOMEM; + } + return 0; } @@ -752,6 +933,9 @@ ml_inference_mem_destroy(struct ml_test *test, struct ml_options *opt) /* release op pool */ rte_mempool_free(t->op_pool); + + /* release buf_segs pool */ + rte_mempool_free(t->buf_seg_pool); } static bool @@ -781,8 +965,10 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) j = 0; next_element: match = false; - deviation = - (*reference == 0 ? 0 : 100 * fabs(*output - *reference) / fabs(*reference)); + if ((*reference == 0) && (*output == 0)) + deviation = 0; + else + deviation = 100 * fabs(*output - *reference) / fabs(*reference); if (deviation <= t->cmn.opt->tolerance) match = true; else @@ -817,14 +1003,59 @@ ml_request_finish(struct rte_mempool *mp, void *opaque, void *obj, unsigned int bool error = false; char *dump_path; + struct rte_ml_buff_seg qbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg dbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *q_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *d_segs[ML_TEST_MAX_IO_SIZE]; + uint64_t offset; + uint64_t bufsz; + uint32_t i; + RTE_SET_USED(mp); if (req->niters == 0) return; t->nb_used++; - rte_ml_io_dequantize(t->cmn.opt->dev_id, model->id, t->model[req->fid].nb_batches, - req->output, model->output); + + if (t->model[req->fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + qbuff_seg[0].addr = req->output; + qbuff_seg[0].iova_addr = rte_mem_virt2iova(req->output); + qbuff_seg[0].length = t->model[req->fid].out_qsize; + qbuff_seg[0].next = NULL; + q_segs[0] = &qbuff_seg[0]; + + dbuff_seg[0].addr = model->output; + dbuff_seg[0].iova_addr = rte_mem_virt2iova(model->output); + dbuff_seg[0].length = t->model[req->fid].out_dsize; + dbuff_seg[0].next = NULL; + d_segs[0] = &dbuff_seg[0]; + } else { + offset = 0; + for (i = 0; i < t->model[req->fid].info.nb_outputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[req->fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + qbuff_seg[i].addr = req->output + offset; + qbuff_seg[i].iova_addr = rte_mem_virt2iova(req->output + offset); + qbuff_seg[i].length = bufsz; + qbuff_seg[i].next = NULL; + q_segs[i] = &qbuff_seg[i]; + offset += bufsz; + } + + offset = 0; + for (i = 0; i < t->model[req->fid].info.nb_outputs; i++) { + bufsz = t->model[req->fid].info.output_info[i].nb_elements * sizeof(float); + dbuff_seg[i].addr = model->output + offset; + dbuff_seg[i].iova_addr = rte_mem_virt2iova(model->output + offset); + dbuff_seg[i].length = bufsz; + dbuff_seg[i].next = NULL; + d_segs[i] = &dbuff_seg[i]; + offset += bufsz; + } + } + + rte_ml_io_dequantize(t->cmn.opt->dev_id, model->id, q_segs, d_segs); if (model->reference == NULL) goto dump_output_pass; diff --git a/app/test-mldev/test_inference_common.h b/app/test-mldev/test_inference_common.h index 8f27af25e4..3f4ba3219b 100644 --- a/app/test-mldev/test_inference_common.h +++ b/app/test-mldev/test_inference_common.h @@ -11,11 +11,16 @@ #include "test_model_common.h" +#define ML_TEST_MAX_IO_SIZE 32 + struct ml_request { uint8_t *input; uint8_t *output; uint16_t fid; uint64_t niters; + + struct rte_ml_buff_seg *inp_buf_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *out_buf_segs[ML_TEST_MAX_IO_SIZE]; }; struct ml_core_args { @@ -38,6 +43,7 @@ 
struct test_inference { /* test specific data */ struct ml_model model[ML_TEST_MAX_MODELS]; + struct rte_mempool *buf_seg_pool; struct rte_mempool *op_pool; uint64_t nb_used; diff --git a/app/test-mldev/test_model_common.c b/app/test-mldev/test_model_common.c index 8dbb0ff89f..c517a50611 100644 --- a/app/test-mldev/test_model_common.c +++ b/app/test-mldev/test_model_common.c @@ -50,12 +50,6 @@ ml_model_load(struct ml_test *test, struct ml_options *opt, struct ml_model *mod return ret; } - /* Update number of batches */ - if (opt->batches == 0) - model->nb_batches = model->info.batch_size; - else - model->nb_batches = opt->batches; - model->state = MODEL_LOADED; return 0; diff --git a/app/test-mldev/test_model_common.h b/app/test-mldev/test_model_common.h index c1021ef1b6..a207e54ab7 100644 --- a/app/test-mldev/test_model_common.h +++ b/app/test-mldev/test_model_common.h @@ -31,7 +31,6 @@ struct ml_model { uint8_t *reference; struct rte_mempool *io_pool; - uint32_t nb_batches; }; int ml_model_load(struct ml_test *test, struct ml_options *opt, struct ml_model *model, diff --git a/doc/guides/tools/testmldev.rst b/doc/guides/tools/testmldev.rst index 741abd722e..9b1565a457 100644 --- a/doc/guides/tools/testmldev.rst +++ b/doc/guides/tools/testmldev.rst @@ -106,11 +106,6 @@ The following are the command-line options supported by the test application. Queue size would translate into ``rte_ml_dev_qp_conf::nb_desc`` field during queue-pair creation. Default value is ``1``. -``--batches <n>`` - Set the number batches in the input file provided for inference run. - When not specified, the test would assume the number of batches - is the batch size of the model. - ``--tolerance <n>`` Set the tolerance value in percentage to be used for output validation. Default value is ``0``. 
@@ -282,7 +277,6 @@ Supported command line options for inference tests are following:: --burst_size --queue_pairs --queue_size - --batches --tolerance --stats diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h index 6ca0b0bb6e..c73bf7d001 100644 --- a/drivers/ml/cnxk/cn10k_ml_dev.h +++ b/drivers/ml/cnxk/cn10k_ml_dev.h @@ -30,6 +30,9 @@ /* Maximum number of descriptors per queue-pair */ #define ML_CN10K_MAX_DESC_PER_QP 1024 +/* Maximum number of inputs / outputs per model */ +#define ML_CN10K_MAX_INPUT_OUTPUT 32 + /* Maximum number of segments for IO data */ #define ML_CN10K_MAX_SEGMENTS 1 diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index 26df8d9ff9..e0b750cd8e 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -520,9 +520,11 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) struct rte_ml_model_info *info; struct rte_ml_io_info *output; struct rte_ml_io_info *input; + struct cn10k_ml_dev *mldev; uint8_t i; uint8_t j; + mldev = dev->data->dev_private; metadata = &model->metadata; info = PLT_PTR_CAST(model->info); input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info)); @@ -537,7 +539,9 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) metadata->model.version[3]); info->model_id = model->model_id; info->device_id = dev->data->dev_id; - info->batch_size = model->batch_size; + info->io_layout = RTE_ML_IO_LAYOUT_PACKED; + info->min_batches = model->batch_size; + info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size; info->nb_inputs = metadata->model.num_input; info->input_info = input; info->nb_outputs = metadata->model.num_output; diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index e3faab81ba..1d72fb52a6 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -471,9 +471,9 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req req->jd.hdr.sp_flags = 0x0; req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result); req->jd.model_run.input_ddr_addr = - PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr)); + PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr)); req->jd.model_run.output_ddr_addr = - PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr)); + PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr)); req->jd.model_run.num_batches = op->nb_batches; } @@ -856,7 +856,11 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint static int cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) { + struct rte_ml_model_info *info; struct cn10k_ml_model *model; + struct rte_ml_buff_seg seg[2]; + struct rte_ml_buff_seg *inp; + struct rte_ml_buff_seg *out; struct rte_ml_op op; char str[RTE_MEMZONE_NAMESIZE]; @@ -864,12 +868,22 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) uint64_t isize = 0; uint64_t osize = 0; int ret = 0; + uint32_t i; model = dev->data->models[model_id]; + info = (struct rte_ml_model_info *)model->info; + inp = &seg[0]; + out = &seg[1]; /* Create input and output buffers. 
*/ - rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL); - rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL); + for (i = 0; i < info->nb_inputs; i++) + isize += info->input_info[i].size; + + for (i = 0; i < info->nb_outputs; i++) + osize += info->output_info[i].size; + + isize = model->batch_size * isize; + osize = model->batch_size * osize; snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id); mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE); @@ -877,17 +891,22 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) return -ENOMEM; memset(mz->addr, 0, isize + osize); + seg[0].addr = mz->addr; + seg[0].iova_addr = mz->iova; + seg[0].length = isize; + seg[0].next = NULL; + + seg[1].addr = PLT_PTR_ADD(mz->addr, isize); + seg[1].iova_addr = mz->iova + isize; + seg[1].length = osize; + seg[1].next = NULL; + op.model_id = model_id; op.nb_batches = model->batch_size; op.mempool = NULL; - op.input.addr = mz->addr; - op.input.length = isize; - op.input.next = NULL; - - op.output.addr = PLT_PTR_ADD(op.input.addr, isize); - op.output.length = osize; - op.output.next = NULL; + op.input = &inp; + op.output = &out; memset(model->req, 0, sizeof(struct cn10k_ml_req)); ret = cn10k_ml_inference_sync(dev, &op); @@ -919,8 +938,9 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info) else if (strcmp(mldev->fw.poll_mem, "ddr") == 0) dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP; + dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT; dev_info->max_segments = ML_CN10K_MAX_SEGMENTS; - dev_info->min_align_size = ML_CN10K_ALIGN_SIZE; + dev_info->align_size = ML_CN10K_ALIGN_SIZE; return 0; } @@ -2139,15 +2159,14 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t } static int -cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer) +cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer) { struct cn10k_ml_model *model; uint8_t model_input_type; uint8_t *lcl_dbuffer; uint8_t *lcl_qbuffer; uint8_t input_type; - uint32_t batch_id; float qscale; uint32_t i; uint32_t j; @@ -2160,11 +2179,9 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batc return -EINVAL; } - lcl_dbuffer = dbuffer; - lcl_qbuffer = qbuffer; - batch_id = 0; + lcl_dbuffer = dbuffer[0]->addr; + lcl_qbuffer = qbuffer[0]->addr; -next_batch: for (i = 0; i < model->metadata.model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { input_type = model->metadata.input1[i].input_type; @@ -2218,23 +2235,18 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batc lcl_qbuffer += model->addr.input[i].sz_q; } - batch_id++; - if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size)) - goto next_batch; - return 0; } static int -cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *qbuffer, void *dbuffer) +cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer) { struct cn10k_ml_model *model; uint8_t model_output_type; uint8_t *lcl_qbuffer; uint8_t *lcl_dbuffer; uint8_t output_type; - uint32_t batch_id; float dscale; uint32_t i; uint32_t j; @@ -2247,11 +2259,9 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba return -EINVAL; } - 
lcl_dbuffer = dbuffer; - lcl_qbuffer = qbuffer; - batch_id = 0; + lcl_dbuffer = dbuffer[0]->addr; + lcl_qbuffer = qbuffer[0]->addr; -next_batch: for (i = 0; i < model->metadata.model.num_output; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { output_type = model->metadata.output1[i].output_type; @@ -2306,10 +2316,6 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba lcl_dbuffer += model->addr.output[i].sz_d; } - batch_id++; - if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size)) - goto next_batch; - return 0; } diff --git a/lib/mldev/meson.build b/lib/mldev/meson.build index 5769b0640a..0079ccd205 100644 --- a/lib/mldev/meson.build +++ b/lib/mldev/meson.build @@ -35,7 +35,7 @@ driver_sdk_headers += files( 'mldev_utils.h', ) -deps += ['mempool'] +deps += ['mempool', 'mbuf'] if get_option('buildtype').contains('debug') cflags += [ '-DRTE_LIBRTE_ML_DEV_DEBUG' ] diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 0d8ccd3212..9a48ed3e94 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -730,8 +730,8 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches } int -rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer) +rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer) { struct rte_ml_dev *dev; @@ -754,12 +754,12 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void return -EINVAL; } - return (*dev->dev_ops->io_quantize)(dev, model_id, nb_batches, dbuffer, qbuffer); + return (*dev->dev_ops->io_quantize)(dev, model_id, dbuffer, qbuffer); } int -rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *qbuffer, - void *dbuffer) +rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer) { struct rte_ml_dev *dev; @@ -782,7 +782,7 @@ rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, voi return -EINVAL; } - return (*dev->dev_ops->io_dequantize)(dev, model_id, nb_batches, qbuffer, dbuffer); + return (*dev->dev_ops->io_dequantize)(dev, model_id, qbuffer, dbuffer); } /** Initialise rte_ml_op mempool element */ diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index 6204df0930..316c6fd018 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -228,12 +228,14 @@ struct rte_ml_dev_info { /**< Maximum allowed number of descriptors for queue pair by the device. * @see struct rte_ml_dev_qp_conf::nb_desc */ + uint16_t max_io; + /**< Maximum number of inputs/outputs supported per model. */ uint16_t max_segments; /**< Maximum number of scatter-gather entries supported by the device. * @see struct rte_ml_buff_seg struct rte_ml_buff_seg::next */ - uint16_t min_align_size; - /**< Minimum alignment size of IO buffers used by the device. */ + uint16_t align_size; + /**< Alignment size of IO buffers used by the device. */ }; /** @@ -429,10 +431,28 @@ struct rte_ml_op { /**< Reserved for future use. */ struct rte_mempool *mempool; /**< Pool from which operation is allocated. */ - struct rte_ml_buff_seg input; - /**< Input buffer to hold the inference data. */ - struct rte_ml_buff_seg output; - /**< Output buffer to hold the inference output by the driver. */ + struct rte_ml_buff_seg **input; + /**< Array of buffer segments to hold the inference input data. 
+ * + * When the model supports IO layout RTE_ML_IO_LAYOUT_PACKED, size of + * the array is 1. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_SPLIT, size of + * the array is rte_ml_model_info::nb_inputs. + * + * @see struct rte_ml_model_info::io_layout + */ + struct rte_ml_buff_seg **output; + /**< Array of buffer segments to hold the inference output data. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_PACKED, size of + * the array is 1. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_SPLIT, size of + * the array is rte_ml_model_info::nb_outputs. + * + * @see struct rte_ml_model_info::io_layout + */ union { uint64_t user_u64; /**< User data as uint64_t.*/ @@ -863,7 +883,37 @@ enum rte_ml_io_type { /**< 16-bit brain floating point number. */ }; -/** Input and output data information structure +/** ML I/O buffer layout */ +enum rte_ml_io_layout { + RTE_ML_IO_LAYOUT_PACKED, + /**< All inputs for the model should be packed in a single buffer with + * no padding between individual inputs. The buffer is expected to + * be aligned to rte_ml_dev_info::align_size. + * + * When I/O segmentation is supported by the device, the packed + * data can be split into multiple segments. In this case, each + * segment is expected to be aligned to rte_ml_dev_info::align_size. + * + * Same applies to output. + * + * @see struct rte_ml_dev_info::max_segments + */ + RTE_ML_IO_LAYOUT_SPLIT + /**< Each input for the model should be stored as separate buffers + * and each input should be aligned to rte_ml_dev_info::align_size. + * + * When I/O segmentation is supported, each input can be split into + * multiple segments. In this case, each segment is expected to be + * aligned to rte_ml_dev_info::align_size. + * + * Same applies to output. + * + * @see struct rte_ml_dev_info::max_segments + */ +}; + +/** + * Input and output data information structure * * Specifies the type and shape of input and output data. */ @@ -873,7 +923,7 @@ struct rte_ml_io_info { uint32_t nb_dims; /**< Number of dimensions in shape */ uint32_t *shape; - /**< Shape of the tensor */ + /**< Shape of the tensor for rte_ml_model_info::min_batches of the model. */ enum rte_ml_io_type type; /**< Type of data * @see enum rte_ml_io_type */ @@ -894,8 +944,16 @@ struct rte_ml_model_info { /**< Model ID */ uint16_t device_id; /**< Device ID */ - uint16_t batch_size; - /**< Maximum number of batches that the model can process simultaneously */ + enum rte_ml_io_layout io_layout; + /**< I/O buffer layout for the model */ + uint16_t min_batches; + /**< Minimum number of batches that the model can process + * in one inference request + */ + uint16_t max_batches; + /**< Maximum number of batches that the model can process + * in one inference request + */ uint32_t nb_inputs; /**< Number of inputs */ const struct rte_ml_io_info *input_info; @@ -1021,8 +1079,6 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches * The identifier of the device.
* @param[in] model_id * Identifier for the model - * @param[in] nb_batches - * Number of batches in the dequantized input buffer * @param[in] dbuffer * Address of dequantized input data * @param[in] qbuffer @@ -1034,8 +1090,8 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches */ __rte_experimental int -rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer); +rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer); /** * Dequantize output data. @@ -1047,8 +1103,6 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void * The identifier of the device. * @param[in] model_id * Identifier for the model - * @param[in] nb_batches - * Number of batches in the dequantized output buffer * @param[in] qbuffer * Address of quantized output data * @param[in] dbuffer @@ -1060,8 +1114,8 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void */ __rte_experimental int -rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *qbuffer, - void *dbuffer); +rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer); /* ML op pool operations */ diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 78b8b7633d..8530b07316 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -523,8 +523,6 @@ typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model * ML device pointer. * @param model_id * Model ID to use. - * @param nb_batches - * Number of batches. * @param dbuffer * Pointer t de-quantized data buffer. * @param qbuffer @@ -534,8 +532,9 @@ typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model * - 0 on success. * - <0, error on failure. */ -typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *dbuffer, void *qbuffer); +typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, + struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer); /** * @internal @@ -546,8 +545,6 @@ typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, ui * ML device pointer. * @param model_id * Model ID to use. - * @param nb_batches - * Number of batches. * @param qbuffer * Pointer t de-quantized data buffer. * @param dbuffer @@ -557,8 +554,9 @@ typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, ui * - 0 on success. * - <0, error on failure. */ -typedef int (*mldev_io_dequantize_t)(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *qbuffer, void *dbuffer); +typedef int (*mldev_io_dequantize_t)(struct rte_ml_dev *dev, uint16_t model_id, + struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer); /** * @internal -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
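For illustration, the call sequence an application follows with the reworked quantize API looks roughly as below. This is a minimal sketch, assuming a model that reports RTE_ML_IO_LAYOUT_PACKED; the function name and the dbuf/qbuf buffers are placeholders, with sizes expected to come from the rte_ml_io_info fields. It is not part of the patch itself.

#include <rte_memory.h>
#include <rte_mldev.h>

/* Hypothetical application buffers: dbuf holds the dequantized (e.g. float)
 * input, qbuf receives the quantized data expected by the model. For a
 * packed-layout model a single segment per side is sufficient.
 */
static int
quantize_packed(int16_t dev_id, uint16_t model_id,
		void *dbuf, uint32_t dsize, void *qbuf, uint32_t qsize)
{
	struct rte_ml_buff_seg dseg, qseg;
	struct rte_ml_buff_seg *d_segs[1], *q_segs[1];

	dseg.addr = dbuf;
	dseg.iova_addr = rte_mem_virt2iova(dbuf);
	dseg.length = dsize;
	dseg.next = NULL;
	d_segs[0] = &dseg;

	qseg.addr = qbuf;
	qseg.iova_addr = rte_mem_virt2iova(qbuf);
	qseg.length = qsize;
	qseg.next = NULL;
	q_segs[0] = &qseg;

	return rte_ml_io_quantize(dev_id, model_id, d_segs, q_segs);
}

For split layout, the same pattern extends to one rte_ml_buff_seg per model input, as the test application changes above show.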
* [PATCH v3 3/4] mldev: drop input and output size get APIs 2023-09-27 18:11 ` [PATCH v3 0/4] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-09-27 18:11 ` [PATCH v3 1/4] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi 2023-09-27 18:11 ` [PATCH v3 2/4] mldev: introduce support for IO layout Srikanth Yalavarthi @ 2023-09-27 18:11 ` Srikanth Yalavarthi 2023-09-27 18:11 ` [PATCH v3 4/4] mldev: update release notes for 23.11 Srikanth Yalavarthi 3 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-27 18:11 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Drop support and use of ML input and output size get functions, rte_ml_io_input_size_get and rte_ml_io_output_size_get. These functions are not required, as the model buffer size can be computed from the fields of updated rte_ml_io_info structure. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- drivers/ml/cnxk/cn10k_ml_ops.c | 50 ---------------------------- lib/mldev/rte_mldev.c | 38 --------------------- lib/mldev/rte_mldev.h | 60 ---------------------------------- lib/mldev/rte_mldev_core.h | 54 ------------------------------ lib/mldev/version.map | 2 -- 5 files changed, 204 deletions(-) diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index 1d72fb52a6..4abf4ae0d3 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -2110,54 +2110,6 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu return 0; } -static int -cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize) -{ - struct cn10k_ml_model *model; - - model = dev->data->models[model_id]; - - if (model == NULL) { - plt_err("Invalid model_id = %u", model_id); - return -EINVAL; - } - - if (input_qsize != NULL) - *input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - if (input_dsize != NULL) - *input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - return 0; -} - -static int -cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize) -{ - struct cn10k_ml_model *model; - - model = dev->data->models[model_id]; - - if (model == NULL) { - plt_err("Invalid model_id = %u", model_id); - return -EINVAL; - } - - if (output_qsize != NULL) - *output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - if (output_dsize != NULL) - *output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - return 0; -} - static int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer) @@ -2636,8 +2588,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = { .model_params_update = cn10k_ml_model_params_update, /* I/O ops */ - .io_input_size_get = cn10k_ml_io_input_size_get, - .io_output_size_get = cn10k_ml_io_output_size_get, .io_quantize = cn10k_ml_io_quantize, .io_dequantize = cn10k_ml_io_dequantize, }; diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 9a48ed3e94..cc5f2e0cc6 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -691,44 +691,6 @@ rte_ml_model_params_update(int16_t dev_id, uint16_t model_id, void *buffer) return 
(*dev->dev_ops->model_params_update)(dev, model_id, buffer); } -int -rte_ml_io_input_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize) -{ - struct rte_ml_dev *dev; - - if (!rte_ml_dev_is_valid_dev(dev_id)) { - RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); - return -EINVAL; - } - - dev = rte_ml_dev_pmd_get_dev(dev_id); - if (*dev->dev_ops->io_input_size_get == NULL) - return -ENOTSUP; - - return (*dev->dev_ops->io_input_size_get)(dev, model_id, nb_batches, input_qsize, - input_dsize); -} - -int -rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize) -{ - struct rte_ml_dev *dev; - - if (!rte_ml_dev_is_valid_dev(dev_id)) { - RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); - return -EINVAL; - } - - dev = rte_ml_dev_pmd_get_dev(dev_id); - if (*dev->dev_ops->io_output_size_get == NULL) - return -ENOTSUP; - - return (*dev->dev_ops->io_output_size_get)(dev, model_id, nb_batches, output_qsize, - output_dsize); -} - int rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer) diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index 316c6fd018..63b2670bb0 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -1008,66 +1008,6 @@ rte_ml_model_params_update(int16_t dev_id, uint16_t model_id, void *buffer); /* IO operations */ -/** - * Get size of quantized and dequantized input buffers. - * - * Calculate the size of buffers required for quantized and dequantized input data. - * This API would return the buffer sizes for the number of batches provided and would - * consider the alignment requirements as per the PMD. Input sizes computed by this API can - * be used by the application to allocate buffers. - * - * @param[in] dev_id - * The identifier of the device. - * @param[in] model_id - * Identifier for the model created - * @param[in] nb_batches - * Number of batches of input to be processed in a single inference job - * @param[out] input_qsize - * Quantized input size pointer. - * NULL value is allowed, in which case input_qsize is not calculated by the driver. - * @param[out] input_dsize - * Dequantized input size pointer. - * NULL value is allowed, in which case input_dsize is not calculated by the driver. - * - * @return - * - Returns 0 on success - * - Returns negative value on failure - */ -__rte_experimental -int -rte_ml_io_input_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize); - -/** - * Get size of quantized and dequantized output buffers. - * - * Calculate the size of buffers required for quantized and dequantized output data. - * This API would return the buffer sizes for the number of batches provided and would consider - * the alignment requirements as per the PMD. Output sizes computed by this API can be used by the - * application to allocate buffers. - * - * @param[in] dev_id - * The identifier of the device. - * @param[in] model_id - * Identifier for the model created - * @param[in] nb_batches - * Number of batches of input to be processed in a single inference job - * @param[out] output_qsize - * Quantized output size pointer. - * NULL value is allowed, in which case output_qsize is not calculated by the driver. - * @param[out] output_dsize - * Dequantized output size pointer. - * NULL value is allowed, in which case output_dsize is not calculated by the driver. 
- * - * @return - * - Returns 0 on success - * - Returns negative value on failure - */ -__rte_experimental -int -rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize); - /** * Quantize input data. * diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 8530b07316..2279b1dcec 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -466,54 +466,6 @@ typedef int (*mldev_model_info_get_t)(struct rte_ml_dev *dev, uint16_t model_id, */ typedef int (*mldev_model_params_update_t)(struct rte_ml_dev *dev, uint16_t model_id, void *buffer); -/** - * @internal - * - * Get size of input buffers. - * - * @param dev - * ML device pointer. - * @param model_id - * Model ID to use. - * @param nb_batches - * Number of batches. - * @param input_qsize - * Size of quantized input. - * @param input_dsize - * Size of dequantized input. - * - * @return - * - 0 on success. - * - <0, error on failure. - */ -typedef int (*mldev_io_input_size_get_t)(struct rte_ml_dev *dev, uint16_t model_id, - uint32_t nb_batches, uint64_t *input_qsize, - uint64_t *input_dsize); - -/** - * @internal - * - * Get size of output buffers. - * - * @param dev - * ML device pointer. - * @param model_id - * Model ID to use. - * @param nb_batches - * Number of batches. - * @param output_qsize - * Size of quantized output. - * @param output_dsize - * Size of dequantized output. - * - * @return - * - 0 on success. - * - <0, error on failure. - */ -typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model_id, - uint32_t nb_batches, uint64_t *output_qsize, - uint64_t *output_dsize); - /** * @internal * @@ -627,12 +579,6 @@ struct rte_ml_dev_ops { /** Update model params. */ mldev_model_params_update_t model_params_update; - /** Get input buffer size. */ - mldev_io_input_size_get_t io_input_size_get; - - /** Get output buffer size. */ - mldev_io_output_size_get_t io_output_size_get; - /** Quantize data */ mldev_io_quantize_t io_quantize; diff --git a/lib/mldev/version.map b/lib/mldev/version.map index 40ff27f4b9..99841db6aa 100644 --- a/lib/mldev/version.map +++ b/lib/mldev/version.map @@ -23,8 +23,6 @@ EXPERIMENTAL { rte_ml_dev_xstats_reset; rte_ml_enqueue_burst; rte_ml_io_dequantize; - rte_ml_io_input_size_get; - rte_ml_io_output_size_get; rte_ml_io_quantize; rte_ml_model_info_get; rte_ml_model_load; -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
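With the size-get APIs gone, an application derives buffer sizes directly from the model and device info, as the commit message notes. A minimal sketch of the input side, assuming minfo and dinfo were already filled via rte_ml_model_info_get() and rte_ml_dev_info_get(); the function name is a placeholder:

#include <rte_common.h>
#include <rte_mldev.h>

/* Sketch of a replacement for the dropped rte_ml_io_input_size_get():
 * the quantized input buffer size is computed from rte_ml_io_info fields.
 */
static uint64_t
model_input_qsize(const struct rte_ml_model_info *minfo,
		  const struct rte_ml_dev_info *dinfo)
{
	uint64_t qsize = 0;
	uint32_t i;

	for (i = 0; i < minfo->nb_inputs; i++) {
		if (minfo->io_layout == RTE_ML_IO_LAYOUT_PACKED)
			qsize += minfo->input_info[i].size;
		else
			/* split layout: each input starts at an aligned offset */
			qsize += RTE_ALIGN_CEIL(minfo->input_info[i].size,
						dinfo->align_size);
	}

	return qsize;
}

The dequantized size follows the same loop, using nb_elements times the size of the dequantized element type instead of the quantized size.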
* [PATCH v3 4/4] mldev: update release notes for 23.11 2023-09-27 18:11 ` [PATCH v3 0/4] Spec changes to support multi I/O models Srikanth Yalavarthi ` (2 preceding siblings ...) 2023-09-27 18:11 ` [PATCH v3 3/4] mldev: drop input and output size get APIs Srikanth Yalavarthi @ 2023-09-27 18:11 ` Srikanth Yalavarthi 2023-09-29 3:39 ` Jerin Jacob 3 siblings, 1 reply; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-09-27 18:11 UTC (permalink / raw) Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar Updated the 23.11 release notes for the mldev spec changes. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- doc/guides/rel_notes/release_23_11.rst | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst index 9746809a66..ca31ac5985 100644 --- a/doc/guides/rel_notes/release_23_11.rst +++ b/doc/guides/rel_notes/release_23_11.rst @@ -41,6 +41,11 @@ DPDK Release 23.11 New Features ------------ + * **Added support for models with multiple I/O in mldev library.** + + Added support in the mldev library for models with multiple inputs and outputs. + + .. This section should contain new features added in this release. Sample format: @@ -97,6 +102,8 @@ Removed Items * kni: Removed the Kernel Network Interface (KNI) library and driver. +* mldev: Removed APIs ``rte_ml_io_input_size_get`` and ``rte_ml_io_output_size_get``. + API Changes ----------- @@ -119,6 +126,14 @@ API Changes except ``rte_thread_setname()`` and ``rte_ctrl_thread_create()`` which are replaced with ``rte_thread_set_name()`` and ``rte_thread_create_control()``. +* mldev: Updated the mldev API to support models with multiple inputs and outputs. + Updated the structure ``rte_ml_model_info`` to support input and output with + arbitrary shapes. Introduced support for ``rte_ml_io_layout``. Two layout types, + split and packed, are supported by the specification, enabling finer control + when handling models with multiple inputs and outputs. Updated ``rte_ml_op``, + ``rte_ml_io_quantize`` and ``rte_ml_io_dequantize`` to support an array of + ``rte_ml_buff_seg`` for inputs and outputs, and removed the use of the batches argument. + ABI Changes ----------- -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
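To make the quantize/dequantize change summarized in these notes concrete, a sketch of the dequantize side for a split-layout model follows. All names are placeholders; MAX_OUTPUTS is an assumption of the sketch, and a real application would size the arrays from rte_ml_dev_info::max_io:

#include <rte_common.h>
#include <rte_memory.h>
#include <rte_mldev.h>

#define MAX_OUTPUTS 8 /* assumption for this sketch only */

/* One quantized segment per model output, dequantized into a packed
 * float buffer. qbuf/dbuf and the info structures are hypothetical
 * application state from rte_ml_model_info_get()/rte_ml_dev_info_get().
 */
static int
dequantize_split(int16_t dev_id, uint16_t model_id,
		 const struct rte_ml_model_info *minfo,
		 const struct rte_ml_dev_info *dinfo,
		 uint8_t *qbuf, uint8_t *dbuf)
{
	struct rte_ml_buff_seg qseg[MAX_OUTPUTS], dseg[MAX_OUTPUTS];
	struct rte_ml_buff_seg *q_segs[MAX_OUTPUTS], *d_segs[MAX_OUTPUTS];
	uint64_t qoff = 0, doff = 0, sz;
	uint32_t i;

	for (i = 0; i < minfo->nb_outputs; i++) {
		/* split layout: each quantized output starts at an aligned offset */
		sz = RTE_ALIGN_CEIL(minfo->output_info[i].size, dinfo->align_size);
		qseg[i].addr = qbuf + qoff;
		qseg[i].iova_addr = rte_mem_virt2iova(qbuf + qoff);
		qseg[i].length = sz;
		qseg[i].next = NULL;
		q_segs[i] = &qseg[i];
		qoff += sz;

		sz = minfo->output_info[i].nb_elements * sizeof(float);
		dseg[i].addr = dbuf + doff;
		dseg[i].iova_addr = rte_mem_virt2iova(dbuf + doff);
		dseg[i].length = sz;
		dseg[i].next = NULL;
		d_segs[i] = &dseg[i];
		doff += sz;
	}

	return rte_ml_io_dequantize(dev_id, model_id, q_segs, d_segs);
}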
* Re: [PATCH v3 4/4] mldev: update release notes for 23.11 2023-09-27 18:11 ` [PATCH v3 4/4] mldev: update release notes for 23.11 Srikanth Yalavarthi @ 2023-09-29 3:39 ` Jerin Jacob 2023-10-02 9:59 ` [EXT] " Srikanth Yalavarthi 0 siblings, 1 reply; 26+ messages in thread From: Jerin Jacob @ 2023-09-29 3:39 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar On Thu, Sep 28, 2023 at 3:01 PM Srikanth Yalavarthi <syalavarthi@marvell.com> wrote: > > Updated 23.11 release notes for mldev spec. > > Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> Squash the doc changes to relevant patches where respective source code change has been made. ^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [EXT] Re: [PATCH v3 4/4] mldev: update release notes for 23.11 2023-09-29 3:39 ` Jerin Jacob @ 2023-10-02 9:59 ` Srikanth Yalavarthi 0 siblings, 0 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-10-02 9:59 UTC (permalink / raw) To: Jerin Jacob Cc: dev, Shivah Shankar Shankar Narayan Rao, Anup Prabhu, Prince Takkar, Srikanth Yalavarthi > -----Original Message----- > From: Jerin Jacob <jerinjacobk@gmail.com> > Sent: 29 September 2023 09:10 > To: Srikanth Yalavarthi <syalavarthi@marvell.com> > Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao > <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; > Prince Takkar <ptakkar@marvell.com>; Srikanth Yalavarthi > <syalavarthi@marvell.com> > Subject: [EXT] Re: [PATCH v3 4/4] mldev: update release notes for 23.11 > > External Email > > ---------------------------------------------------------------------- > On Thu, Sep 28, 2023 at 3:01 PM Srikanth Yalavarthi > <syalavarthi@marvell.com> wrote: > > > > Updated 23.11 release notes for mldev spec. > > > > Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> > > > Squash the doc changes to relevant patches where respective source code > change has been made. Updated in v4. ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v4 0/3] Spec changes to support multi I/O models 2023-08-30 15:52 [PATCH v1 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi ` (4 preceding siblings ...) 2023-09-27 18:11 ` [PATCH v3 0/4] Spec changes to support multi I/O models Srikanth Yalavarthi @ 2023-10-02 9:58 ` Srikanth Yalavarthi 2023-10-02 9:58 ` [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi ` (3 more replies) 5 siblings, 4 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-10-02 9:58 UTC (permalink / raw) Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar This series implements changes to mldev spec to extend support for ML models with multiple inputs and outputs. Changes include introduction of I/O layout to support packed and split buffers for model input and output. Extended the rte_ml_model_info structure to support multiple inputs and outputs. Updated rte_ml_op and quantize / dequantize APIs to support an array of input and output ML buffer segments. Support for batches option is dropped from test application. v4: - Squashed release notes v3: - Added release notes for 23.11 v2: - Minor fixes - Cleanup of application help v1: - Initial changes Srikanth Yalavarthi (3): mldev: add support for arbitrary shape dimensions mldev: introduce support for IO layout mldev: drop input and output size get APIs app/test-mldev/ml_options.c | 16 - app/test-mldev/ml_options.h | 2 - app/test-mldev/test_inference_common.c | 420 +++++++++++++++++-------- app/test-mldev/test_inference_common.h | 6 + app/test-mldev/test_model_common.c | 6 - app/test-mldev/test_model_common.h | 1 - doc/guides/rel_notes/release_23_11.rst | 15 + doc/guides/tools/testmldev.rst | 6 - drivers/ml/cnxk/cn10k_ml_dev.h | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 84 +++-- drivers/ml/cnxk/cn10k_ml_model.h | 12 + drivers/ml/cnxk/cn10k_ml_ops.c | 135 +++----- lib/mldev/meson.build | 2 +- lib/mldev/mldev_utils.c | 30 -- lib/mldev/mldev_utils.h | 16 - lib/mldev/rte_mldev.c | 50 +-- lib/mldev/rte_mldev.h | 201 +++++------- lib/mldev/rte_mldev_core.h | 68 +--- lib/mldev/version.map | 3 - 19 files changed, 521 insertions(+), 555 deletions(-) -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
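As a quick orientation to the extended info structures this series introduces, the sketch below dumps the per-model I/O details; dev_id and model_id are placeholders, and error handling is elided:

#include <inttypes.h>
#include <stdio.h>

#include <rte_mldev.h>

/* Sketch: read the io_layout, batch limits and per-input details added
 * to rte_ml_model_info by this series.
 */
static void
dump_model_io_info(int16_t dev_id, uint16_t model_id)
{
	struct rte_ml_model_info info;
	uint32_t i;

	rte_ml_model_info_get(dev_id, model_id, &info);

	printf("layout = %s, batches = [%u, %u]\n",
	       info.io_layout == RTE_ML_IO_LAYOUT_PACKED ? "packed" : "split",
	       info.min_batches, info.max_batches);

	for (i = 0; i < info.nb_inputs; i++)
		printf("input %u: %s, %" PRIu64 " elements, %" PRIu64 " bytes\n", i,
		       info.input_info[i].name, info.input_info[i].nb_elements,
		       info.input_info[i].size);
}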
* [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions 2023-10-02 9:58 ` [PATCH v4 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi @ 2023-10-02 9:58 ` Srikanth Yalavarthi 2023-10-04 14:42 ` Anup Prabhu 2023-10-05 9:12 ` Shivah Shankar Shankar Narayan Rao 2023-10-02 9:58 ` [PATCH v4 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi ` (2 subsequent siblings) 3 siblings, 2 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-10-02 9:58 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Updated rte_ml_io_info to support shape of arbitrary number of dimensions. Dropped use of rte_ml_io_shape and rte_ml_io_format. Introduced new fields nb_elements and size in rte_ml_io_info. Updated drivers and app/mldev to support the changes. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- app/test-mldev/test_inference_common.c | 97 +++++--------------------- doc/guides/rel_notes/release_23_11.rst | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 78 +++++++++++++-------- drivers/ml/cnxk/cn10k_ml_model.h | 12 ++++ drivers/ml/cnxk/cn10k_ml_ops.c | 11 +-- lib/mldev/mldev_utils.c | 30 -------- lib/mldev/mldev_utils.h | 16 ----- lib/mldev/rte_mldev.h | 59 ++++------------ lib/mldev/version.map | 1 - 9 files changed, 97 insertions(+), 210 deletions(-) diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index 05b221401b..b40519b5e3 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -3,6 +3,7 @@ */ #include <errno.h> +#include <math.h> #include <stdio.h> #include <unistd.h> @@ -18,11 +19,6 @@ #include "ml_common.h" #include "test_inference_common.h" -#define ML_TEST_READ_TYPE(buffer, type) (*((type *)buffer)) - -#define ML_TEST_CHECK_OUTPUT(output, reference, tolerance) \ - (((float)output - (float)reference) <= (((float)reference * tolerance) / 100.0)) - #define ML_OPEN_WRITE_GET_ERR(name, buffer, size, err) \ do { \ FILE *fp = fopen(name, "w+"); \ @@ -763,9 +759,9 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) { struct test_inference *t = ml_test_priv((struct ml_test *)test); struct ml_model *model; - uint32_t nb_elements; - uint8_t *reference; - uint8_t *output; + float *reference; + float *output; + float deviation; bool match; uint32_t i; uint32_t j; @@ -777,89 +773,30 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) match = (rte_hash_crc(model->output, model->out_dsize, 0) == rte_hash_crc(model->reference, model->out_dsize, 0)); } else { - output = model->output; - reference = model->reference; + output = (float *)model->output; + reference = (float *)model->reference; i = 0; next_output: - nb_elements = - model->info.output_info[i].shape.w * model->info.output_info[i].shape.x * - model->info.output_info[i].shape.y * model->info.output_info[i].shape.z; j = 0; next_element: match = false; - switch (model->info.output_info[i].dtype) { - case RTE_ML_IO_TYPE_INT8: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int8_t), - ML_TEST_READ_TYPE(reference, int8_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int8_t); - reference += sizeof(int8_t); - break; - case RTE_ML_IO_TYPE_UINT8: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint8_t), - ML_TEST_READ_TYPE(reference, uint8_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(float); - reference += sizeof(float); - break; - case RTE_ML_IO_TYPE_INT16: - if 
(ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int16_t), - ML_TEST_READ_TYPE(reference, int16_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int16_t); - reference += sizeof(int16_t); - break; - case RTE_ML_IO_TYPE_UINT16: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint16_t), - ML_TEST_READ_TYPE(reference, uint16_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(uint16_t); - reference += sizeof(uint16_t); - break; - case RTE_ML_IO_TYPE_INT32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, int32_t), - ML_TEST_READ_TYPE(reference, int32_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(int32_t); - reference += sizeof(int32_t); - break; - case RTE_ML_IO_TYPE_UINT32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, uint32_t), - ML_TEST_READ_TYPE(reference, uint32_t), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(uint32_t); - reference += sizeof(uint32_t); - break; - case RTE_ML_IO_TYPE_FP32: - if (ML_TEST_CHECK_OUTPUT(ML_TEST_READ_TYPE(output, float), - ML_TEST_READ_TYPE(reference, float), - t->cmn.opt->tolerance)) - match = true; - - output += sizeof(float); - reference += sizeof(float); - break; - default: /* other types, fp8, fp16, bfloat16 */ + deviation = + (*reference == 0 ? 0 : 100 * fabs(*output - *reference) / fabs(*reference)); + if (deviation <= t->cmn.opt->tolerance) match = true; - } + else + ml_err("id = %d, element = %d, output = %f, reference = %f, deviation = %f %%\n", + i, j, *output, *reference, deviation); + + output++; + reference++; if (!match) goto done; + j++; - if (j < nb_elements) + if (j < model->info.output_info[i].nb_elements) goto next_element; i++; diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst index 9746809a66..e553554b3a 100644 --- a/doc/guides/rel_notes/release_23_11.rst +++ b/doc/guides/rel_notes/release_23_11.rst @@ -119,6 +119,9 @@ API Changes except ``rte_thread_setname()`` and ``rte_ctrl_thread_create()`` which are replaced with ``rte_thread_set_name()`` and ``rte_thread_create_control()``. +* mldev: Updated the mldev API to support models with multiple inputs and outputs. + Updated the structure ``rte_ml_model_info`` to support input and output with + arbitrary shapes.
ABI Changes ----------- diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index 92c47d39ba..26df8d9ff9 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -366,6 +366,12 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->total_input_sz_q = 0; for (i = 0; i < metadata->model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { + addr->input[i].nb_dims = 4; + addr->input[i].shape[0] = metadata->input1[i].shape.w; + addr->input[i].shape[1] = metadata->input1[i].shape.x; + addr->input[i].shape[2] = metadata->input1[i].shape.y; + addr->input[i].shape[3] = metadata->input1[i].shape.z; + addr->input[i].nb_elements = metadata->input1[i].shape.w * metadata->input1[i].shape.x * metadata->input1[i].shape.y * metadata->input1[i].shape.z; @@ -386,6 +392,13 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->input[i].sz_q); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + + addr->input[i].nb_dims = 4; + addr->input[i].shape[0] = metadata->input2[j].shape.w; + addr->input[i].shape[1] = metadata->input2[j].shape.x; + addr->input[i].shape[2] = metadata->input2[j].shape.y; + addr->input[i].shape[3] = metadata->input2[j].shape.z; + addr->input[i].nb_elements = metadata->input2[j].shape.w * metadata->input2[j].shape.x * metadata->input2[j].shape.y * metadata->input2[j].shape.z; @@ -412,6 +425,8 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ addr->total_output_sz_d = 0; for (i = 0; i < metadata->model.num_output; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { + addr->output[i].nb_dims = 1; + addr->output[i].shape[0] = metadata->output1[i].size; addr->output[i].nb_elements = metadata->output1[i].size; addr->output[i].sz_d = addr->output[i].nb_elements * @@ -426,6 +441,9 @@ cn10k_ml_model_addr_update(struct cn10k_ml_model *model, uint8_t *buffer, uint8_ model->model_id, i, addr->output[i].sz_d, addr->output[i].sz_q); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + + addr->output[i].nb_dims = 1; + addr->output[i].shape[0] = metadata->output2[j].size; addr->output[i].nb_elements = metadata->output2[j].size; addr->output[i].sz_d = addr->output[i].nb_elements * @@ -498,6 +516,7 @@ void cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) { struct cn10k_ml_model_metadata *metadata; + struct cn10k_ml_model_addr *addr; struct rte_ml_model_info *info; struct rte_ml_io_info *output; struct rte_ml_io_info *input; @@ -508,6 +527,7 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) info = PLT_PTR_CAST(model->info); input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info)); output = PLT_PTR_ADD(input, metadata->model.num_input * sizeof(struct rte_ml_io_info)); + addr = &model->addr; /* Set model info */ memset(info, 0, sizeof(struct rte_ml_model_info)); @@ -529,24 +549,25 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { rte_memcpy(input[i].name, metadata->input1[i].input_name, MRVL_ML_INPUT_NAME_LEN); - input[i].dtype = metadata->input1[i].input_type; - input[i].qtype = metadata->input1[i].model_input_type; - input[i].shape.format = metadata->input1[i].shape.format; - input[i].shape.w = metadata->input1[i].shape.w; - input[i].shape.x = metadata->input1[i].shape.x; - input[i].shape.y = metadata->input1[i].shape.y; - input[i].shape.z = metadata->input1[i].shape.z; + input[i].nb_dims = addr->input[i].nb_dims; + input[i].shape = 
addr->input[i].shape; + input[i].type = metadata->input1[i].model_input_type; + input[i].nb_elements = addr->input[i].nb_elements; + input[i].size = + addr->input[i].nb_elements * + rte_ml_io_type_size_get(metadata->input1[i].model_input_type); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + rte_memcpy(input[i].name, metadata->input2[j].input_name, MRVL_ML_INPUT_NAME_LEN); - input[i].dtype = metadata->input2[j].input_type; - input[i].qtype = metadata->input2[j].model_input_type; - input[i].shape.format = metadata->input2[j].shape.format; - input[i].shape.w = metadata->input2[j].shape.w; - input[i].shape.x = metadata->input2[j].shape.x; - input[i].shape.y = metadata->input2[j].shape.y; - input[i].shape.z = metadata->input2[j].shape.z; + input[i].nb_dims = addr->input[i].nb_dims; + input[i].shape = addr->input[i].shape; + input[i].type = metadata->input2[j].model_input_type; + input[i].nb_elements = addr->input[i].nb_elements; + input[i].size = + addr->input[i].nb_elements * + rte_ml_io_type_size_get(metadata->input2[j].model_input_type); } } @@ -555,24 +576,25 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { rte_memcpy(output[i].name, metadata->output1[i].output_name, MRVL_ML_OUTPUT_NAME_LEN); - output[i].dtype = metadata->output1[i].output_type; - output[i].qtype = metadata->output1[i].model_output_type; - output[i].shape.format = RTE_ML_IO_FORMAT_1D; - output[i].shape.w = metadata->output1[i].size; - output[i].shape.x = 1; - output[i].shape.y = 1; - output[i].shape.z = 1; + output[i].nb_dims = addr->output[i].nb_dims; + output[i].shape = addr->output[i].shape; + output[i].type = metadata->output1[i].model_output_type; + output[i].nb_elements = addr->output[i].nb_elements; + output[i].size = + addr->output[i].nb_elements * + rte_ml_io_type_size_get(metadata->output1[i].model_output_type); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + rte_memcpy(output[i].name, metadata->output2[j].output_name, MRVL_ML_OUTPUT_NAME_LEN); - output[i].dtype = metadata->output2[j].output_type; - output[i].qtype = metadata->output2[j].model_output_type; - output[i].shape.format = RTE_ML_IO_FORMAT_1D; - output[i].shape.w = metadata->output2[j].size; - output[i].shape.x = 1; - output[i].shape.y = 1; - output[i].shape.z = 1; + output[i].nb_dims = addr->output[i].nb_dims; + output[i].shape = addr->output[i].shape; + output[i].type = metadata->output2[j].model_output_type; + output[i].nb_elements = addr->output[i].nb_elements; + output[i].size = + addr->output[i].nb_elements * + rte_ml_io_type_size_get(metadata->output2[j].model_output_type); } } } diff --git a/drivers/ml/cnxk/cn10k_ml_model.h b/drivers/ml/cnxk/cn10k_ml_model.h index 1f689363fc..4cc0744891 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.h +++ b/drivers/ml/cnxk/cn10k_ml_model.h @@ -409,6 +409,12 @@ struct cn10k_ml_model_addr { /* Input address and size */ struct { + /* Number of dimensions in shape */ + uint32_t nb_dims; + + /* Shape of input */ + uint32_t shape[4]; + /* Number of elements */ uint32_t nb_elements; @@ -421,6 +427,12 @@ struct cn10k_ml_model_addr { /* Output address and size */ struct { + /* Number of dimensions in shape */ + uint32_t nb_dims; + + /* Shape of input */ + uint32_t shape[4]; + /* Number of elements */ uint32_t nb_elements; diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index 656467d891..e3faab81ba 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -321,8 +321,8 @@ cn10k_ml_model_print(struct 
rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "\n"); print_line(fp, LINE_LEN); - fprintf(fp, "%8s %16s %12s %18s %12s %14s\n", "input", "input_name", "input_type", - "model_input_type", "quantize", "format"); + fprintf(fp, "%8s %16s %12s %18s %12s\n", "input", "input_name", "input_type", + "model_input_type", "quantize"); print_line(fp, LINE_LEN); for (i = 0; i < model->metadata.model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { @@ -335,12 +335,10 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "%*s ", 18, str); fprintf(fp, "%*s", 12, (model->metadata.input1[i].quantize == 1 ? "Yes" : "No")); - rte_ml_io_format_to_str(model->metadata.input1[i].shape.format, str, - STR_LEN); - fprintf(fp, "%*s", 16, str); fprintf(fp, "\n"); } else { j = i - MRVL_ML_NUM_INPUT_OUTPUT_1; + fprintf(fp, "%8u ", i); fprintf(fp, "%*s ", 16, model->metadata.input2[j].input_name); rte_ml_io_type_to_str(model->metadata.input2[j].input_type, str, STR_LEN); @@ -350,9 +348,6 @@ cn10k_ml_model_print(struct rte_ml_dev *dev, uint16_t model_id, FILE *fp) fprintf(fp, "%*s ", 18, str); fprintf(fp, "%*s", 12, (model->metadata.input2[j].quantize == 1 ? "Yes" : "No")); - rte_ml_io_format_to_str(model->metadata.input2[j].shape.format, str, - STR_LEN); - fprintf(fp, "%*s", 16, str); fprintf(fp, "\n"); } } diff --git a/lib/mldev/mldev_utils.c b/lib/mldev/mldev_utils.c index d2442b123b..ccd2c39ca8 100644 --- a/lib/mldev/mldev_utils.c +++ b/lib/mldev/mldev_utils.c @@ -86,33 +86,3 @@ rte_ml_io_type_to_str(enum rte_ml_io_type type, char *str, int len) rte_strlcpy(str, "invalid", len); } } - -void -rte_ml_io_format_to_str(enum rte_ml_io_format format, char *str, int len) -{ - switch (format) { - case RTE_ML_IO_FORMAT_NCHW: - rte_strlcpy(str, "NCHW", len); - break; - case RTE_ML_IO_FORMAT_NHWC: - rte_strlcpy(str, "NHWC", len); - break; - case RTE_ML_IO_FORMAT_CHWN: - rte_strlcpy(str, "CHWN", len); - break; - case RTE_ML_IO_FORMAT_3D: - rte_strlcpy(str, "3D", len); - break; - case RTE_ML_IO_FORMAT_2D: - rte_strlcpy(str, "Matrix", len); - break; - case RTE_ML_IO_FORMAT_1D: - rte_strlcpy(str, "Vector", len); - break; - case RTE_ML_IO_FORMAT_SCALAR: - rte_strlcpy(str, "Scalar", len); - break; - default: - rte_strlcpy(str, "invalid", len); - } -} diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h index 5bc8020453..220afb42f0 100644 --- a/lib/mldev/mldev_utils.h +++ b/lib/mldev/mldev_utils.h @@ -52,22 +52,6 @@ __rte_internal void rte_ml_io_type_to_str(enum rte_ml_io_type type, char *str, int len); -/** - * @internal - * - * Get the name of an ML IO format. - * - * @param[in] type - * Enumeration of ML IO format. - * @param[in] str - * Address of character array. - * @param[in] len - * Length of character array. - */ -__rte_internal -void -rte_ml_io_format_to_str(enum rte_ml_io_format format, char *str, int len); - /** * @internal * diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index fc3525c1ab..6204df0930 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -863,47 +863,6 @@ enum rte_ml_io_type { /**< 16-bit brain floating point number. */ }; -/** - * Input and output format. This is used to represent the encoding type of multi-dimensional - * used by ML models. 
- */ -enum rte_ml_io_format { - RTE_ML_IO_FORMAT_NCHW = 1, - /**< Batch size (N) x channels (C) x height (H) x width (W) */ - RTE_ML_IO_FORMAT_NHWC, - /**< Batch size (N) x height (H) x width (W) x channels (C) */ - RTE_ML_IO_FORMAT_CHWN, - /**< Channels (C) x height (H) x width (W) x batch size (N) */ - RTE_ML_IO_FORMAT_3D, - /**< Format to represent a 3 dimensional data */ - RTE_ML_IO_FORMAT_2D, - /**< Format to represent matrix data */ - RTE_ML_IO_FORMAT_1D, - /**< Format to represent vector data */ - RTE_ML_IO_FORMAT_SCALAR, - /**< Format to represent scalar data */ -}; - -/** - * Input and output shape. This structure represents the encoding format and dimensions - * of the tensor or vector. - * - * The data can be a 4D / 3D tensor, matrix, vector or a scalar. Number of dimensions used - * for the data would depend on the format. Unused dimensions to be set to 1. - */ -struct rte_ml_io_shape { - enum rte_ml_io_format format; - /**< Format of the data */ - uint32_t w; - /**< First dimension */ - uint32_t x; - /**< Second dimension */ - uint32_t y; - /**< Third dimension */ - uint32_t z; - /**< Fourth dimension */ -}; - /** Input and output data information structure * * Specifies the type and shape of input and output data. @@ -911,12 +870,18 @@ struct rte_ml_io_shape { struct rte_ml_io_info { char name[RTE_ML_STR_MAX]; /**< Name of data */ - struct rte_ml_io_shape shape; - /**< Shape of data */ - enum rte_ml_io_type qtype; - /**< Type of quantized data */ - enum rte_ml_io_type dtype; - /**< Type of de-quantized data */ + uint32_t nb_dims; + /**< Number of dimensions in shape */ + uint32_t *shape; + /**< Shape of the tensor */ + enum rte_ml_io_type type; + /**< Type of data + * @see enum rte_ml_io_type + */ + uint64_t nb_elements; + /**< Number of elements in tensor */ + uint64_t size; + /**< Size of tensor in bytes */ }; /** Model information structure */ diff --git a/lib/mldev/version.map b/lib/mldev/version.map index 0706b565be..40ff27f4b9 100644 --- a/lib/mldev/version.map +++ b/lib/mldev/version.map @@ -51,7 +51,6 @@ INTERNAL { rte_ml_io_type_size_get; rte_ml_io_type_to_str; - rte_ml_io_format_to_str; rte_ml_io_float32_to_int8; rte_ml_io_int8_to_float32; rte_ml_io_float32_to_uint8; -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
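The relationship between the new rte_ml_io_info fields is simple: nb_elements is the product of the shape dimensions, and size is nb_elements times the byte width of the element type. A small sketch, with io assumed to point into the arrays exposed via rte_ml_model_info::input_info / output_info:

#include <rte_mldev.h>

/* Sketch: element count from the arbitrary-dimension shape array. */
static uint64_t
io_nb_elements(const struct rte_ml_io_info *io)
{
	uint64_t nb = 1;
	uint32_t d;

	for (d = 0; d < io->nb_dims; d++)
		nb *= io->shape[d];

	return nb; /* expected to match io->nb_elements */
}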
* RE: [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions 2023-10-02 9:58 ` [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi @ 2023-10-04 14:42 ` Anup Prabhu 2023-10-05 9:12 ` Shivah Shankar Shankar Narayan Rao 1 sibling, 0 replies; 26+ messages in thread From: Anup Prabhu @ 2023-10-04 14:42 UTC (permalink / raw) To: Srikanth Yalavarthi, Srikanth Yalavarthi Cc: dev, Shivah Shankar Shankar Narayan Rao, Prince Takkar > -----Original Message----- > From: Srikanth Yalavarthi <syalavarthi@marvell.com> > Sent: Monday, October 2, 2023 3:29 PM > To: Srikanth Yalavarthi <syalavarthi@marvell.com> > Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao > <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>; > Prince Takkar <ptakkar@marvell.com> > Subject: [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions > > Updated rte_ml_io_info to support shape of arbitrary number of > dimensions. Dropped use of rte_ml_io_shape and rte_ml_io_format. > Introduced new fields nb_elements and size in rte_ml_io_info. > > Updated drivers and app/mldev to support the changes. > > Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> Acked-by: Anup Prabhu <aprabhu@marvell.com> ^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions
  2023-10-02  9:58 ` [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi
  2023-10-04 14:42 ` Anup Prabhu
@ 2023-10-05  9:12 ` Shivah Shankar Shankar Narayan Rao
  1 sibling, 0 replies; 26+ messages in thread
From: Shivah Shankar Shankar Narayan Rao @ 2023-10-05 9:12 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi; +Cc: dev, Anup Prabhu, Prince Takkar

> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Monday, October 2, 2023 3:29 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>
> Subject: [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions
>
> Updated rte_ml_io_info to support shape of arbitrary number of
> dimensions. Dropped use of rte_ml_io_shape and rte_ml_io_format.
> Introduced new fields nb_elements and size in rte_ml_io_info.
>
> Updated drivers and app/mldev to support the changes.
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>

Acked-by: Shivah Shankar S <sshankarnara@marvell.com>

^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH v4 2/3] mldev: introduce support for IO layout 2023-10-02 9:58 ` [PATCH v4 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-10-02 9:58 ` [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi @ 2023-10-02 9:58 ` Srikanth Yalavarthi 2023-10-05 9:10 ` Shivah Shankar Shankar Narayan Rao 2023-10-02 9:58 ` [PATCH v4 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi 2023-10-11 14:45 ` [PATCH v4 0/3] Spec changes to support multi I/O models Thomas Monjalon 3 siblings, 1 reply; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-10-02 9:58 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Introduce IO layout in ML device specification. IO layout defines the expected arrangement of model input and output buffers in the memory. Packed and Split layout support is added in the specification. Updated rte_ml_op to support array of rte_ml_buff_seg pointers to support packed and split I/O layouts. Updated ML quantize and dequantize APIs to support rte_ml_buff_seg pointer arrays. Replaced batch_size with min_batches and max_batches in rte_ml_model_info. Implement support for model IO layout in ml/cnxk driver. Updated the ML test application to support IO layout and dropped support for '--batches' in test application. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- app/test-mldev/ml_options.c | 16 -- app/test-mldev/ml_options.h | 2 - app/test-mldev/test_inference_common.c | 327 +++++++++++++++++++++---- app/test-mldev/test_inference_common.h | 6 + app/test-mldev/test_model_common.c | 6 - app/test-mldev/test_model_common.h | 1 - doc/guides/rel_notes/release_23_11.rst | 10 + doc/guides/tools/testmldev.rst | 6 - drivers/ml/cnxk/cn10k_ml_dev.h | 3 + drivers/ml/cnxk/cn10k_ml_model.c | 6 +- drivers/ml/cnxk/cn10k_ml_ops.c | 74 +++--- lib/mldev/meson.build | 2 +- lib/mldev/rte_mldev.c | 12 +- lib/mldev/rte_mldev.h | 90 +++++-- lib/mldev/rte_mldev_core.h | 14 +- 15 files changed, 428 insertions(+), 147 deletions(-) diff --git a/app/test-mldev/ml_options.c b/app/test-mldev/ml_options.c index d068b30df5..eeaffec399 100644 --- a/app/test-mldev/ml_options.c +++ b/app/test-mldev/ml_options.c @@ -28,7 +28,6 @@ ml_options_default(struct ml_options *opt) opt->burst_size = 1; opt->queue_pairs = 1; opt->queue_size = 1; - opt->batches = 0; opt->tolerance = 0.0; opt->stats = false; opt->debug = false; @@ -213,18 +212,6 @@ ml_parse_queue_size(struct ml_options *opt, const char *arg) return ret; } -static int -ml_parse_batches(struct ml_options *opt, const char *arg) -{ - int ret; - - ret = parser_read_uint16(&opt->batches, arg); - if (ret != 0) - ml_err("Invalid option, batches = %s\n", arg); - - return ret; -} - static int ml_parse_tolerance(struct ml_options *opt, const char *arg) { @@ -255,7 +242,6 @@ ml_dump_test_options(const char *testname) "\t\t--burst_size : inference burst size\n" "\t\t--queue_pairs : number of queue pairs to create\n" "\t\t--queue_size : size of queue-pair\n" - "\t\t--batches : number of batches of input\n" "\t\t--tolerance : maximum tolerance (%%) for output validation\n" "\t\t--stats : enable reporting device and model statistics\n"); printf("\n"); @@ -287,7 +273,6 @@ static struct option lgopts[] = { {ML_BURST_SIZE, 1, 0, 0}, {ML_QUEUE_PAIRS, 1, 0, 0}, {ML_QUEUE_SIZE, 1, 0, 0}, - {ML_BATCHES, 1, 0, 0}, {ML_TOLERANCE, 1, 0, 0}, {ML_STATS, 0, 0, 0}, {ML_DEBUG, 0, 0, 0}, @@ -309,7 +294,6 @@ ml_opts_parse_long(int opt_idx, struct ml_options *opt) {ML_BURST_SIZE, 
ml_parse_burst_size}, {ML_QUEUE_PAIRS, ml_parse_queue_pairs}, {ML_QUEUE_SIZE, ml_parse_queue_size}, - {ML_BATCHES, ml_parse_batches}, {ML_TOLERANCE, ml_parse_tolerance}, }; diff --git a/app/test-mldev/ml_options.h b/app/test-mldev/ml_options.h index 622a4c05fc..90e22adeac 100644 --- a/app/test-mldev/ml_options.h +++ b/app/test-mldev/ml_options.h @@ -21,7 +21,6 @@ #define ML_BURST_SIZE ("burst_size") #define ML_QUEUE_PAIRS ("queue_pairs") #define ML_QUEUE_SIZE ("queue_size") -#define ML_BATCHES ("batches") #define ML_TOLERANCE ("tolerance") #define ML_STATS ("stats") #define ML_DEBUG ("debug") @@ -44,7 +43,6 @@ struct ml_options { uint16_t burst_size; uint16_t queue_pairs; uint16_t queue_size; - uint16_t batches; float tolerance; bool stats; bool debug; diff --git a/app/test-mldev/test_inference_common.c b/app/test-mldev/test_inference_common.c index b40519b5e3..846f71abb1 100644 --- a/app/test-mldev/test_inference_common.c +++ b/app/test-mldev/test_inference_common.c @@ -47,7 +47,10 @@ ml_enqueue_single(void *arg) uint64_t start_cycle; uint32_t burst_enq; uint32_t lcore_id; + uint64_t offset; + uint64_t bufsz; uint16_t fid; + uint32_t i; int ret; lcore_id = rte_lcore_id(); @@ -66,24 +69,64 @@ ml_enqueue_single(void *arg) if (ret != 0) goto next_model; -retry: +retry_req: ret = rte_mempool_get(t->model[fid].io_pool, (void **)&req); if (ret != 0) - goto retry; + goto retry_req; + +retry_inp_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)req->inp_buf_segs, + t->model[fid].info.nb_inputs); + if (ret != 0) + goto retry_inp_segs; + +retry_out_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)req->out_buf_segs, + t->model[fid].info.nb_outputs); + if (ret != 0) + goto retry_out_segs; op->model_id = t->model[fid].id; - op->nb_batches = t->model[fid].nb_batches; + op->nb_batches = t->model[fid].info.min_batches; op->mempool = t->op_pool; + op->input = req->inp_buf_segs; + op->output = req->out_buf_segs; + op->user_ptr = req; - op->input.addr = req->input; - op->input.length = t->model[fid].inp_qsize; - op->input.next = NULL; + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + op->input[0]->addr = req->input; + op->input[0]->iova_addr = rte_mem_virt2iova(req->input); + op->input[0]->length = t->model[fid].inp_qsize; + op->input[0]->next = NULL; + + op->output[0]->addr = req->output; + op->output[0]->iova_addr = rte_mem_virt2iova(req->output); + op->output[0]->length = t->model[fid].out_qsize; + op->output[0]->next = NULL; + } else { + offset = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + op->input[i]->addr = req->input + offset; + op->input[i]->iova_addr = rte_mem_virt2iova(req->input + offset); + op->input[i]->length = bufsz; + op->input[i]->next = NULL; + offset += bufsz; + } - op->output.addr = req->output; - op->output.length = t->model[fid].out_qsize; - op->output.next = NULL; + offset = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + op->output[i]->addr = req->output + offset; + op->output[i]->iova_addr = rte_mem_virt2iova(req->output + offset); + op->output[i]->length = bufsz; + op->output[i]->next = NULL; + offset += bufsz; + } + } - op->user_ptr = req; req->niters++; req->fid = fid; @@ -143,6 +186,10 @@ ml_dequeue_single(void *arg) } req = (struct ml_request *)op->user_ptr; rte_mempool_put(t->model[req->fid].io_pool, req); + 
rte_mempool_put_bulk(t->buf_seg_pool, (void **)op->input, + t->model[req->fid].info.nb_inputs); + rte_mempool_put_bulk(t->buf_seg_pool, (void **)op->output, + t->model[req->fid].info.nb_outputs); rte_mempool_put(t->op_pool, op); } @@ -164,9 +211,12 @@ ml_enqueue_burst(void *arg) uint16_t burst_enq; uint32_t lcore_id; uint16_t pending; + uint64_t offset; + uint64_t bufsz; uint16_t idx; uint16_t fid; uint16_t i; + uint16_t j; int ret; lcore_id = rte_lcore_id(); @@ -186,25 +236,70 @@ ml_enqueue_burst(void *arg) if (ret != 0) goto next_model; -retry: +retry_reqs: ret = rte_mempool_get_bulk(t->model[fid].io_pool, (void **)args->reqs, ops_count); if (ret != 0) - goto retry; + goto retry_reqs; for (i = 0; i < ops_count; i++) { +retry_inp_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)args->reqs[i]->inp_buf_segs, + t->model[fid].info.nb_inputs); + if (ret != 0) + goto retry_inp_segs; + +retry_out_segs: + ret = rte_mempool_get_bulk(t->buf_seg_pool, (void **)args->reqs[i]->out_buf_segs, + t->model[fid].info.nb_outputs); + if (ret != 0) + goto retry_out_segs; + args->enq_ops[i]->model_id = t->model[fid].id; - args->enq_ops[i]->nb_batches = t->model[fid].nb_batches; + args->enq_ops[i]->nb_batches = t->model[fid].info.min_batches; args->enq_ops[i]->mempool = t->op_pool; + args->enq_ops[i]->input = args->reqs[i]->inp_buf_segs; + args->enq_ops[i]->output = args->reqs[i]->out_buf_segs; + args->enq_ops[i]->user_ptr = args->reqs[i]; - args->enq_ops[i]->input.addr = args->reqs[i]->input; - args->enq_ops[i]->input.length = t->model[fid].inp_qsize; - args->enq_ops[i]->input.next = NULL; + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + args->enq_ops[i]->input[0]->addr = args->reqs[i]->input; + args->enq_ops[i]->input[0]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->input); + args->enq_ops[i]->input[0]->length = t->model[fid].inp_qsize; + args->enq_ops[i]->input[0]->next = NULL; + + args->enq_ops[i]->output[0]->addr = args->reqs[i]->output; + args->enq_ops[i]->output[0]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->output); + args->enq_ops[i]->output[0]->length = t->model[fid].out_qsize; + args->enq_ops[i]->output[0]->next = NULL; + } else { + offset = 0; + for (j = 0; j < t->model[fid].info.nb_inputs; j++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + + args->enq_ops[i]->input[j]->addr = args->reqs[i]->input + offset; + args->enq_ops[i]->input[j]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->input + offset); + args->enq_ops[i]->input[j]->length = t->model[fid].inp_qsize; + args->enq_ops[i]->input[j]->next = NULL; + offset += bufsz; + } - args->enq_ops[i]->output.addr = args->reqs[i]->output; - args->enq_ops[i]->output.length = t->model[fid].out_qsize; - args->enq_ops[i]->output.next = NULL; + offset = 0; + for (j = 0; j < t->model[fid].info.nb_outputs; j++) { + bufsz = RTE_ALIGN_CEIL(t->model[fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + args->enq_ops[i]->output[j]->addr = args->reqs[i]->output + offset; + args->enq_ops[i]->output[j]->iova_addr = + rte_mem_virt2iova(args->reqs[i]->output + offset); + args->enq_ops[i]->output[j]->length = t->model[fid].out_qsize; + args->enq_ops[i]->output[j]->next = NULL; + offset += bufsz; + } + } - args->enq_ops[i]->user_ptr = args->reqs[i]; args->reqs[i]->niters++; args->reqs[i]->fid = fid; } @@ -275,8 +370,15 @@ ml_dequeue_burst(void *arg) t->error_count[lcore_id]++; } req = (struct ml_request *)args->deq_ops[i]->user_ptr; - if (req != NULL) + if (req != NULL) { 
rte_mempool_put(t->model[req->fid].io_pool, req); + rte_mempool_put_bulk(t->buf_seg_pool, + (void **)args->deq_ops[i]->input, + t->model[req->fid].info.nb_inputs); + rte_mempool_put_bulk(t->buf_seg_pool, + (void **)args->deq_ops[i]->output, + t->model[req->fid].info.nb_outputs); + } } rte_mempool_put_bulk(t->op_pool, (void *)args->deq_ops, burst_deq); } @@ -315,6 +417,12 @@ test_inference_cap_check(struct ml_options *opt) return false; } + if (dev_info.max_io < ML_TEST_MAX_IO_SIZE) { + ml_err("Insufficient capabilities: Max I/O, count = %u > (max limit = %u)", + ML_TEST_MAX_IO_SIZE, dev_info.max_io); + return false; + } + return true; } @@ -403,11 +511,6 @@ test_inference_opt_dump(struct ml_options *opt) ml_dump("tolerance", "%-7.3f", opt->tolerance); ml_dump("stats", "%s", (opt->stats ? "true" : "false")); - if (opt->batches == 0) - ml_dump("batches", "%u (default batch size)", opt->batches); - else - ml_dump("batches", "%u", opt->batches); - ml_dump_begin("filelist"); for (i = 0; i < opt->nb_filelist; i++) { ml_dump_list("model", i, opt->filelist[i].model); @@ -492,10 +595,18 @@ void test_inference_destroy(struct ml_test *test, struct ml_options *opt) { struct test_inference *t; + uint32_t lcore_id; RTE_SET_USED(opt); t = ml_test_priv(test); + + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + rte_free(t->args[lcore_id].enq_ops); + rte_free(t->args[lcore_id].deq_ops); + rte_free(t->args[lcore_id].reqs); + } + rte_free(t); } @@ -572,19 +683,62 @@ ml_request_initialize(struct rte_mempool *mp, void *opaque, void *obj, unsigned { struct test_inference *t = ml_test_priv((struct ml_test *)opaque); struct ml_request *req = (struct ml_request *)obj; + struct rte_ml_buff_seg dbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg qbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *q_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *d_segs[ML_TEST_MAX_IO_SIZE]; + uint64_t offset; + uint64_t bufsz; + uint32_t i; RTE_SET_USED(mp); RTE_SET_USED(obj_idx); req->input = (uint8_t *)obj + - RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.min_align_size); - req->output = req->input + - RTE_ALIGN_CEIL(t->model[t->fid].inp_qsize, t->cmn.dev_info.min_align_size); + RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.align_size); + req->output = + req->input + RTE_ALIGN_CEIL(t->model[t->fid].inp_qsize, t->cmn.dev_info.align_size); req->niters = 0; + if (t->model[t->fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + dbuff_seg[0].addr = t->model[t->fid].input; + dbuff_seg[0].iova_addr = rte_mem_virt2iova(t->model[t->fid].input); + dbuff_seg[0].length = t->model[t->fid].inp_dsize; + dbuff_seg[0].next = NULL; + d_segs[0] = &dbuff_seg[0]; + + qbuff_seg[0].addr = req->input; + qbuff_seg[0].iova_addr = rte_mem_virt2iova(req->input); + qbuff_seg[0].length = t->model[t->fid].inp_qsize; + qbuff_seg[0].next = NULL; + q_segs[0] = &qbuff_seg[0]; + } else { + offset = 0; + for (i = 0; i < t->model[t->fid].info.nb_inputs; i++) { + bufsz = t->model[t->fid].info.input_info[i].nb_elements * sizeof(float); + dbuff_seg[i].addr = t->model[t->fid].input + offset; + dbuff_seg[i].iova_addr = rte_mem_virt2iova(t->model[t->fid].input + offset); + dbuff_seg[i].length = bufsz; + dbuff_seg[i].next = NULL; + d_segs[i] = &dbuff_seg[i]; + offset += bufsz; + } + + offset = 0; + for (i = 0; i < t->model[t->fid].info.nb_inputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[t->fid].info.input_info[i].size, + t->cmn.dev_info.align_size); + qbuff_seg[i].addr = req->input + offset; + qbuff_seg[i].iova_addr = 
rte_mem_virt2iova(req->input + offset); + qbuff_seg[i].length = bufsz; + qbuff_seg[i].next = NULL; + q_segs[i] = &qbuff_seg[i]; + offset += bufsz; + } + } + /* quantize data */ - rte_ml_io_quantize(t->cmn.opt->dev_id, t->model[t->fid].id, t->model[t->fid].nb_batches, - t->model[t->fid].input, req->input); + rte_ml_io_quantize(t->cmn.opt->dev_id, t->model[t->fid].id, d_segs, q_segs); } int @@ -599,24 +753,39 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t uint32_t buff_size; uint32_t mz_size; size_t fsize; + uint32_t i; int ret; /* get input buffer size */ - ret = rte_ml_io_input_size_get(opt->dev_id, t->model[fid].id, t->model[fid].nb_batches, - &t->model[fid].inp_qsize, &t->model[fid].inp_dsize); - if (ret != 0) { - ml_err("Failed to get input size, model : %s\n", opt->filelist[fid].model); - return ret; + t->model[fid].inp_qsize = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) { + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) + t->model[fid].inp_qsize += t->model[fid].info.input_info[i].size; + else + t->model[fid].inp_qsize += RTE_ALIGN_CEIL( + t->model[fid].info.input_info[i].size, t->cmn.dev_info.align_size); } /* get output buffer size */ - ret = rte_ml_io_output_size_get(opt->dev_id, t->model[fid].id, t->model[fid].nb_batches, - &t->model[fid].out_qsize, &t->model[fid].out_dsize); - if (ret != 0) { - ml_err("Failed to get input size, model : %s\n", opt->filelist[fid].model); - return ret; + t->model[fid].out_qsize = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) { + if (t->model[fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) + t->model[fid].out_qsize += t->model[fid].info.output_info[i].size; + else + t->model[fid].out_qsize += RTE_ALIGN_CEIL( + t->model[fid].info.output_info[i].size, t->cmn.dev_info.align_size); } + t->model[fid].inp_dsize = 0; + for (i = 0; i < t->model[fid].info.nb_inputs; i++) + t->model[fid].inp_dsize += + t->model[fid].info.input_info[i].nb_elements * sizeof(float); + + t->model[fid].out_dsize = 0; + for (i = 0; i < t->model[fid].info.nb_outputs; i++) + t->model[fid].out_dsize += + t->model[fid].info.output_info[i].nb_elements * sizeof(float); + /* allocate buffer for user data */ mz_size = t->model[fid].inp_dsize + t->model[fid].out_dsize; if (strcmp(opt->filelist[fid].reference, "\0") != 0) @@ -675,9 +844,9 @@ ml_inference_iomem_setup(struct ml_test *test, struct ml_options *opt, uint16_t /* create mempool for quantized input and output buffers. ml_request_initialize is * used as a callback for object creation. */ - buff_size = RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.min_align_size) + - RTE_ALIGN_CEIL(t->model[fid].inp_qsize, t->cmn.dev_info.min_align_size) + - RTE_ALIGN_CEIL(t->model[fid].out_qsize, t->cmn.dev_info.min_align_size); + buff_size = RTE_ALIGN_CEIL(sizeof(struct ml_request), t->cmn.dev_info.align_size) + + RTE_ALIGN_CEIL(t->model[fid].inp_qsize, t->cmn.dev_info.align_size) + + RTE_ALIGN_CEIL(t->model[fid].out_qsize, t->cmn.dev_info.align_size); nb_buffers = RTE_MIN((uint64_t)ML_TEST_MAX_POOL_SIZE, opt->repetitions); t->fid = fid; @@ -740,6 +909,18 @@ ml_inference_mem_setup(struct ml_test *test, struct ml_options *opt) return -ENOMEM; } + /* create buf_segs pool of with element of uint8_t. external buffers are attached to the + * buf_segs while queuing inference requests. 
+ */ + t->buf_seg_pool = rte_mempool_create("ml_test_mbuf_pool", ML_TEST_MAX_POOL_SIZE * 2, + sizeof(struct rte_ml_buff_seg), 0, 0, NULL, NULL, NULL, + NULL, opt->socket_id, 0); + if (t->buf_seg_pool == NULL) { + ml_err("Failed to create buf_segs pool : %s\n", "ml_test_mbuf_pool"); + rte_ml_op_pool_free(t->op_pool); + return -ENOMEM; + } + return 0; } @@ -752,6 +933,9 @@ ml_inference_mem_destroy(struct ml_test *test, struct ml_options *opt) /* release op pool */ rte_mempool_free(t->op_pool); + + /* release buf_segs pool */ + rte_mempool_free(t->buf_seg_pool); } static bool @@ -781,8 +965,10 @@ ml_inference_validation(struct ml_test *test, struct ml_request *req) j = 0; next_element: match = false; - deviation = - (*reference == 0 ? 0 : 100 * fabs(*output - *reference) / fabs(*reference)); + if ((*reference == 0) && (*output == 0)) + deviation = 0; + else + deviation = 100 * fabs(*output - *reference) / fabs(*reference); if (deviation <= t->cmn.opt->tolerance) match = true; else @@ -817,14 +1003,59 @@ ml_request_finish(struct rte_mempool *mp, void *opaque, void *obj, unsigned int bool error = false; char *dump_path; + struct rte_ml_buff_seg qbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg dbuff_seg[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *q_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *d_segs[ML_TEST_MAX_IO_SIZE]; + uint64_t offset; + uint64_t bufsz; + uint32_t i; + RTE_SET_USED(mp); if (req->niters == 0) return; t->nb_used++; - rte_ml_io_dequantize(t->cmn.opt->dev_id, model->id, t->model[req->fid].nb_batches, - req->output, model->output); + + if (t->model[req->fid].info.io_layout == RTE_ML_IO_LAYOUT_PACKED) { + qbuff_seg[0].addr = req->output; + qbuff_seg[0].iova_addr = rte_mem_virt2iova(req->output); + qbuff_seg[0].length = t->model[req->fid].out_qsize; + qbuff_seg[0].next = NULL; + q_segs[0] = &qbuff_seg[0]; + + dbuff_seg[0].addr = model->output; + dbuff_seg[0].iova_addr = rte_mem_virt2iova(model->output); + dbuff_seg[0].length = t->model[req->fid].out_dsize; + dbuff_seg[0].next = NULL; + d_segs[0] = &dbuff_seg[0]; + } else { + offset = 0; + for (i = 0; i < t->model[req->fid].info.nb_outputs; i++) { + bufsz = RTE_ALIGN_CEIL(t->model[req->fid].info.output_info[i].size, + t->cmn.dev_info.align_size); + qbuff_seg[i].addr = req->output + offset; + qbuff_seg[i].iova_addr = rte_mem_virt2iova(req->output + offset); + qbuff_seg[i].length = bufsz; + qbuff_seg[i].next = NULL; + q_segs[i] = &qbuff_seg[i]; + offset += bufsz; + } + + offset = 0; + for (i = 0; i < t->model[req->fid].info.nb_outputs; i++) { + bufsz = t->model[req->fid].info.output_info[i].nb_elements * sizeof(float); + dbuff_seg[i].addr = model->output + offset; + dbuff_seg[i].iova_addr = rte_mem_virt2iova(model->output + offset); + dbuff_seg[i].length = bufsz; + dbuff_seg[i].next = NULL; + d_segs[i] = &dbuff_seg[i]; + offset += bufsz; + } + } + + rte_ml_io_dequantize(t->cmn.opt->dev_id, model->id, q_segs, d_segs); if (model->reference == NULL) goto dump_output_pass; diff --git a/app/test-mldev/test_inference_common.h b/app/test-mldev/test_inference_common.h index 8f27af25e4..3f4ba3219b 100644 --- a/app/test-mldev/test_inference_common.h +++ b/app/test-mldev/test_inference_common.h @@ -11,11 +11,16 @@ #include "test_model_common.h" +#define ML_TEST_MAX_IO_SIZE 32 + struct ml_request { uint8_t *input; uint8_t *output; uint16_t fid; uint64_t niters; + + struct rte_ml_buff_seg *inp_buf_segs[ML_TEST_MAX_IO_SIZE]; + struct rte_ml_buff_seg *out_buf_segs[ML_TEST_MAX_IO_SIZE]; }; struct ml_core_args { @@ -38,6 +43,7 @@ 
struct test_inference { /* test specific data */ struct ml_model model[ML_TEST_MAX_MODELS]; + struct rte_mempool *buf_seg_pool; struct rte_mempool *op_pool; uint64_t nb_used; diff --git a/app/test-mldev/test_model_common.c b/app/test-mldev/test_model_common.c index 8dbb0ff89f..c517a50611 100644 --- a/app/test-mldev/test_model_common.c +++ b/app/test-mldev/test_model_common.c @@ -50,12 +50,6 @@ ml_model_load(struct ml_test *test, struct ml_options *opt, struct ml_model *mod return ret; } - /* Update number of batches */ - if (opt->batches == 0) - model->nb_batches = model->info.batch_size; - else - model->nb_batches = opt->batches; - model->state = MODEL_LOADED; return 0; diff --git a/app/test-mldev/test_model_common.h b/app/test-mldev/test_model_common.h index c1021ef1b6..a207e54ab7 100644 --- a/app/test-mldev/test_model_common.h +++ b/app/test-mldev/test_model_common.h @@ -31,7 +31,6 @@ struct ml_model { uint8_t *reference; struct rte_mempool *io_pool; - uint32_t nb_batches; }; int ml_model_load(struct ml_test *test, struct ml_options *opt, struct ml_model *model, diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst index e553554b3a..8562bac77c 100644 --- a/doc/guides/rel_notes/release_23_11.rst +++ b/doc/guides/rel_notes/release_23_11.rst @@ -41,6 +41,11 @@ DPDK Release 23.11 New Features ------------ + * **Added support for models with multiple I/O in mldev library.** + + Added support in mldev library for models with multiple inputs and outputs. + + .. This section should contain new features added in this release. Sample format: @@ -122,6 +127,11 @@ API Changes * mldev: Updated mldev API to support models with multiple inputs and outputs Updated the structure ``rte_ml_model_info`` to support input and output with arbitrary shapes. + Added support for ``rte_ml_io_layout``. Two layout types split and packed are + supported by the specification, which enables higher control in handling models + with multiple inputs and outputs. Updated ``rte_ml_op``, ``rte_ml_io_quantize`` + and ``rte_ml_io_dequantize`` to support an array of ``rte_ml_buff_seg`` for + inputs and outputs and removed use of batches argument. ABI Changes ----------- diff --git a/doc/guides/tools/testmldev.rst b/doc/guides/tools/testmldev.rst index 741abd722e..9b1565a457 100644 --- a/doc/guides/tools/testmldev.rst +++ b/doc/guides/tools/testmldev.rst @@ -106,11 +106,6 @@ The following are the command-line options supported by the test application. Queue size would translate into ``rte_ml_dev_qp_conf::nb_desc`` field during queue-pair creation. Default value is ``1``. -``--batches <n>`` - Set the number batches in the input file provided for inference run. - When not specified, the test would assume the number of batches - is the batch size of the model. - ``--tolerance <n>`` Set the tolerance value in percentage to be used for output validation. Default value is ``0``. 
@@ -282,7 +277,6 @@ Supported command line options for inference tests are following:: --burst_size --queue_pairs --queue_size - --batches --tolerance --stats diff --git a/drivers/ml/cnxk/cn10k_ml_dev.h b/drivers/ml/cnxk/cn10k_ml_dev.h index 6ca0b0bb6e..c73bf7d001 100644 --- a/drivers/ml/cnxk/cn10k_ml_dev.h +++ b/drivers/ml/cnxk/cn10k_ml_dev.h @@ -30,6 +30,9 @@ /* Maximum number of descriptors per queue-pair */ #define ML_CN10K_MAX_DESC_PER_QP 1024 +/* Maximum number of inputs / outputs per model */ +#define ML_CN10K_MAX_INPUT_OUTPUT 32 + /* Maximum number of segments for IO data */ #define ML_CN10K_MAX_SEGMENTS 1 diff --git a/drivers/ml/cnxk/cn10k_ml_model.c b/drivers/ml/cnxk/cn10k_ml_model.c index 26df8d9ff9..e0b750cd8e 100644 --- a/drivers/ml/cnxk/cn10k_ml_model.c +++ b/drivers/ml/cnxk/cn10k_ml_model.c @@ -520,9 +520,11 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) struct rte_ml_model_info *info; struct rte_ml_io_info *output; struct rte_ml_io_info *input; + struct cn10k_ml_dev *mldev; uint8_t i; uint8_t j; + mldev = dev->data->dev_private; metadata = &model->metadata; info = PLT_PTR_CAST(model->info); input = PLT_PTR_ADD(info, sizeof(struct rte_ml_model_info)); @@ -537,7 +539,9 @@ cn10k_ml_model_info_set(struct rte_ml_dev *dev, struct cn10k_ml_model *model) metadata->model.version[3]); info->model_id = model->model_id; info->device_id = dev->data->dev_id; - info->batch_size = model->batch_size; + info->io_layout = RTE_ML_IO_LAYOUT_PACKED; + info->min_batches = model->batch_size; + info->max_batches = mldev->fw.req->jd.fw_load.cap.s.max_num_batches / model->batch_size; info->nb_inputs = metadata->model.num_input; info->input_info = input; info->nb_outputs = metadata->model.num_output; diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index e3faab81ba..1d72fb52a6 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -471,9 +471,9 @@ cn10k_ml_prep_fp_job_descriptor(struct rte_ml_dev *dev, struct cn10k_ml_req *req req->jd.hdr.sp_flags = 0x0; req->jd.hdr.result = roc_ml_addr_ap2mlip(&mldev->roc, &req->result); req->jd.model_run.input_ddr_addr = - PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input.addr)); + PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->input[0]->addr)); req->jd.model_run.output_ddr_addr = - PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output.addr)); + PLT_U64_CAST(roc_ml_addr_ap2mlip(&mldev->roc, op->output[0]->addr)); req->jd.model_run.num_batches = op->nb_batches; } @@ -856,7 +856,11 @@ cn10k_ml_model_xstats_reset(struct rte_ml_dev *dev, int32_t model_id, const uint static int cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) { + struct rte_ml_model_info *info; struct cn10k_ml_model *model; + struct rte_ml_buff_seg seg[2]; + struct rte_ml_buff_seg *inp; + struct rte_ml_buff_seg *out; struct rte_ml_op op; char str[RTE_MEMZONE_NAMESIZE]; @@ -864,12 +868,22 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) uint64_t isize = 0; uint64_t osize = 0; int ret = 0; + uint32_t i; model = dev->data->models[model_id]; + info = (struct rte_ml_model_info *)model->info; + inp = &seg[0]; + out = &seg[1]; /* Create input and output buffers. 
*/ - rte_ml_io_input_size_get(dev->data->dev_id, model_id, model->batch_size, &isize, NULL); - rte_ml_io_output_size_get(dev->data->dev_id, model_id, model->batch_size, &osize, NULL); + for (i = 0; i < info->nb_inputs; i++) + isize += info->input_info[i].size; + + for (i = 0; i < info->nb_outputs; i++) + osize += info->output_info[i].size; + + isize = model->batch_size * isize; + osize = model->batch_size * osize; snprintf(str, RTE_MEMZONE_NAMESIZE, "%s_%u", "ml_dummy_io", model_id); mz = plt_memzone_reserve_aligned(str, isize + osize, 0, ML_CN10K_ALIGN_SIZE); @@ -877,17 +891,22 @@ cn10k_ml_cache_model_data(struct rte_ml_dev *dev, uint16_t model_id) return -ENOMEM; memset(mz->addr, 0, isize + osize); + seg[0].addr = mz->addr; + seg[0].iova_addr = mz->iova; + seg[0].length = isize; + seg[0].next = NULL; + + seg[1].addr = PLT_PTR_ADD(mz->addr, isize); + seg[1].iova_addr = mz->iova + isize; + seg[1].length = osize; + seg[1].next = NULL; + op.model_id = model_id; op.nb_batches = model->batch_size; op.mempool = NULL; - op.input.addr = mz->addr; - op.input.length = isize; - op.input.next = NULL; - - op.output.addr = PLT_PTR_ADD(op.input.addr, isize); - op.output.length = osize; - op.output.next = NULL; + op.input = &inp; + op.output = &out; memset(model->req, 0, sizeof(struct cn10k_ml_req)); ret = cn10k_ml_inference_sync(dev, &op); @@ -919,8 +938,9 @@ cn10k_ml_dev_info_get(struct rte_ml_dev *dev, struct rte_ml_dev_info *dev_info) else if (strcmp(mldev->fw.poll_mem, "ddr") == 0) dev_info->max_desc = ML_CN10K_MAX_DESC_PER_QP; + dev_info->max_io = ML_CN10K_MAX_INPUT_OUTPUT; dev_info->max_segments = ML_CN10K_MAX_SEGMENTS; - dev_info->min_align_size = ML_CN10K_ALIGN_SIZE; + dev_info->align_size = ML_CN10K_ALIGN_SIZE; return 0; } @@ -2139,15 +2159,14 @@ cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t } static int -cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer) +cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer) { struct cn10k_ml_model *model; uint8_t model_input_type; uint8_t *lcl_dbuffer; uint8_t *lcl_qbuffer; uint8_t input_type; - uint32_t batch_id; float qscale; uint32_t i; uint32_t j; @@ -2160,11 +2179,9 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batc return -EINVAL; } - lcl_dbuffer = dbuffer; - lcl_qbuffer = qbuffer; - batch_id = 0; + lcl_dbuffer = dbuffer[0]->addr; + lcl_qbuffer = qbuffer[0]->addr; -next_batch: for (i = 0; i < model->metadata.model.num_input; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { input_type = model->metadata.input1[i].input_type; @@ -2218,23 +2235,18 @@ cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batc lcl_qbuffer += model->addr.input[i].sz_q; } - batch_id++; - if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size)) - goto next_batch; - return 0; } static int -cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *qbuffer, void *dbuffer) +cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer) { struct cn10k_ml_model *model; uint8_t model_output_type; uint8_t *lcl_qbuffer; uint8_t *lcl_dbuffer; uint8_t output_type; - uint32_t batch_id; float dscale; uint32_t i; uint32_t j; @@ -2247,11 +2259,9 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba return -EINVAL; } - 
lcl_dbuffer = dbuffer; - lcl_qbuffer = qbuffer; - batch_id = 0; + lcl_dbuffer = dbuffer[0]->addr; + lcl_qbuffer = qbuffer[0]->addr; -next_batch: for (i = 0; i < model->metadata.model.num_output; i++) { if (i < MRVL_ML_NUM_INPUT_OUTPUT_1) { output_type = model->metadata.output1[i].output_type; @@ -2306,10 +2316,6 @@ cn10k_ml_io_dequantize(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_ba lcl_dbuffer += model->addr.output[i].sz_d; } - batch_id++; - if (batch_id < PLT_DIV_CEIL(nb_batches, model->batch_size)) - goto next_batch; - return 0; } diff --git a/lib/mldev/meson.build b/lib/mldev/meson.build index 5769b0640a..0079ccd205 100644 --- a/lib/mldev/meson.build +++ b/lib/mldev/meson.build @@ -35,7 +35,7 @@ driver_sdk_headers += files( 'mldev_utils.h', ) -deps += ['mempool'] +deps += ['mempool', 'mbuf'] if get_option('buildtype').contains('debug') cflags += [ '-DRTE_LIBRTE_ML_DEV_DEBUG' ] diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 0d8ccd3212..9a48ed3e94 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -730,8 +730,8 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches } int -rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer) +rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer) { struct rte_ml_dev *dev; @@ -754,12 +754,12 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void return -EINVAL; } - return (*dev->dev_ops->io_quantize)(dev, model_id, nb_batches, dbuffer, qbuffer); + return (*dev->dev_ops->io_quantize)(dev, model_id, dbuffer, qbuffer); } int -rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *qbuffer, - void *dbuffer) +rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer) { struct rte_ml_dev *dev; @@ -782,7 +782,7 @@ rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, voi return -EINVAL; } - return (*dev->dev_ops->io_dequantize)(dev, model_id, nb_batches, qbuffer, dbuffer); + return (*dev->dev_ops->io_dequantize)(dev, model_id, qbuffer, dbuffer); } /** Initialise rte_ml_op mempool element */ diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index 6204df0930..316c6fd018 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -228,12 +228,14 @@ struct rte_ml_dev_info { /**< Maximum allowed number of descriptors for queue pair by the device. * @see struct rte_ml_dev_qp_conf::nb_desc */ + uint16_t max_io; + /**< Maximum number of inputs/outputs supported per model. */ uint16_t max_segments; /**< Maximum number of scatter-gather entries supported by the device. * @see struct rte_ml_buff_seg struct rte_ml_buff_seg::next */ - uint16_t min_align_size; - /**< Minimum alignment size of IO buffers used by the device. */ + uint16_t align_size; + /**< Alignment size of IO buffers used by the device. */ }; /** @@ -429,10 +431,28 @@ struct rte_ml_op { /**< Reserved for future use. */ struct rte_mempool *mempool; /**< Pool from which operation is allocated. */ - struct rte_ml_buff_seg input; - /**< Input buffer to hold the inference data. */ - struct rte_ml_buff_seg output; - /**< Output buffer to hold the inference output by the driver. */ + struct rte_ml_buff_seg **input; + /**< Array of buffer segments to hold the inference input data. 
+ * + * When the model supports IO layout RTE_ML_IO_LAYOUT_PACKED, size of + * the array is 1. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_SPLIT, size of + * the array is rte_ml_model_info::nb_inputs. + * + * @see struct rte_ml_dev_info::io_layout + */ + struct rte_ml_buff_seg **output; + /**< Array of buffer segments to hold the inference output data. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_PACKED, size of + * the array is 1. + * + * When the model supports IO layout RTE_ML_IO_LAYOUT_SPLIT, size of + * the array is rte_ml_model_info::nb_outputs. + * + * @see struct rte_ml_dev_info::io_layout + */ union { uint64_t user_u64; /**< User data as uint64_t.*/ @@ -863,7 +883,37 @@ enum rte_ml_io_type { /**< 16-bit brain floating point number. */ }; -/** Input and output data information structure +/** ML I/O buffer layout */ +enum rte_ml_io_layout { + RTE_ML_IO_LAYOUT_PACKED, + /**< All inputs for the model should packed in a single buffer with + * no padding between individual inputs. The buffer is expected to + * be aligned to rte_ml_dev_info::align_size. + * + * When I/O segmentation is supported by the device, the packed + * data can be split into multiple segments. In this case, each + * segment is expected to be aligned to rte_ml_dev_info::align_size + * + * Same applies to output. + * + * @see struct rte_ml_dev_info::max_segments + */ + RTE_ML_IO_LAYOUT_SPLIT + /**< Each input for the model should be stored as separate buffers + * and each input should be aligned to rte_ml_dev_info::align_size. + * + * When I/O segmentation is supported, each input can be split into + * multiple segments. In this case, each segment is expected to be + * aligned to rte_ml_dev_info::align_size + * + * Same applies to output. + * + * @see struct rte_ml_dev_info::max_segments + */ +}; + +/** + * Input and output data information structure * * Specifies the type and shape of input and output data. */ @@ -873,7 +923,7 @@ struct rte_ml_io_info { uint32_t nb_dims; /**< Number of dimensions in shape */ uint32_t *shape; - /**< Shape of the tensor */ + /**< Shape of the tensor for rte_ml_model_info::min_batches of the model. */ enum rte_ml_io_type type; /**< Type of data * @see enum rte_ml_io_type @@ -894,8 +944,16 @@ struct rte_ml_model_info { /**< Model ID */ uint16_t device_id; /**< Device ID */ - uint16_t batch_size; - /**< Maximum number of batches that the model can process simultaneously */ + enum rte_ml_io_layout io_layout; + /**< I/O buffer layout for the model */ + uint16_t min_batches; + /**< Minimum number of batches that the model can process + * in one inference request + */ + uint16_t max_batches; + /**< Maximum number of batches that the model can process + * in one inference request + */ uint32_t nb_inputs; /**< Number of inputs */ const struct rte_ml_io_info *input_info; @@ -1021,8 +1079,6 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches * The identifier of the device. 
* @param[in] model_id * Identifier for the model - * @param[in] nb_batches - * Number of batches in the dequantized input buffer * @param[in] dbuffer * Address of dequantized input data * @param[in] qbuffer @@ -1034,8 +1090,8 @@ rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches */ __rte_experimental int -rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *dbuffer, - void *qbuffer); +rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer); /** * Dequantize output data. @@ -1047,8 +1103,6 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void * The identifier of the device. * @param[in] model_id * Identifier for the model - * @param[in] nb_batches - * Number of batches in the dequantized output buffer * @param[in] qbuffer * Address of quantized output data * @param[in] dbuffer @@ -1060,8 +1114,8 @@ rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void */ __rte_experimental int -rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, uint16_t nb_batches, void *qbuffer, - void *dbuffer); +rte_ml_io_dequantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer); /* ML op pool operations */ diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 78b8b7633d..8530b07316 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -523,8 +523,6 @@ typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model * ML device pointer. * @param model_id * Model ID to use. - * @param nb_batches - * Number of batches. * @param dbuffer * Pointer t de-quantized data buffer. * @param qbuffer @@ -534,8 +532,9 @@ typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model * - 0 on success. * - <0, error on failure. */ -typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *dbuffer, void *qbuffer); +typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, + struct rte_ml_buff_seg **dbuffer, + struct rte_ml_buff_seg **qbuffer); /** * @internal @@ -546,8 +545,6 @@ typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, ui * ML device pointer. * @param model_id * Model ID to use. - * @param nb_batches - * Number of batches. * @param qbuffer * Pointer t de-quantized data buffer. * @param dbuffer @@ -557,8 +554,9 @@ typedef int (*mldev_io_quantize_t)(struct rte_ml_dev *dev, uint16_t model_id, ui * - 0 on success. * - <0, error on failure. */ -typedef int (*mldev_io_dequantize_t)(struct rte_ml_dev *dev, uint16_t model_id, uint16_t nb_batches, - void *qbuffer, void *dbuffer); +typedef int (*mldev_io_dequantize_t)(struct rte_ml_dev *dev, uint16_t model_id, + struct rte_ml_buff_seg **qbuffer, + struct rte_ml_buff_seg **dbuffer); /** * @internal -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
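To summarize the enqueue flow introduced by this patch: for a split-layout model, one rte_ml_buff_seg per model input is attached to the op, each region rounded up to the device alignment. A condensed sketch along the lines of the test_inference_common.c changes above; buffer allocation and error handling are omitted, and all names except the mldev structures and macros are illustrative.

#include <rte_common.h>
#include <rte_memory.h>
#include <rte_mldev.h>

/* Attach one aligned region of base_in per model input to the op,
 * mirroring the RTE_ML_IO_LAYOUT_SPLIT branch of ml_enqueue_single().
 * segs and seg_ptrs are caller-provided arrays of info->nb_inputs
 * entries; base_in is a contiguous buffer large enough for all inputs.
 */
static void
op_attach_split_inputs(struct rte_ml_op *op, const struct rte_ml_model_info *info,
                       const struct rte_ml_dev_info *dev_info,
                       struct rte_ml_buff_seg *segs, struct rte_ml_buff_seg **seg_ptrs,
                       uint8_t *base_in)
{
    uint64_t offset = 0;
    uint32_t i;

    for (i = 0; i < info->nb_inputs; i++) {
        uint64_t sz = RTE_ALIGN_CEIL(info->input_info[i].size,
                                     dev_info->align_size);

        segs[i].addr = base_in + offset;
        segs[i].iova_addr = rte_mem_virt2iova(base_in + offset);
        segs[i].length = sz;
        segs[i].next = NULL;
        seg_ptrs[i] = &segs[i];
        offset += sz;
    }

    op->input = seg_ptrs; /* one segment per model input */
    op->nb_batches = info->min_batches;
    /* op->output is filled the same way from info->output_info[] */
}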
* RE: [PATCH v4 2/3] mldev: introduce support for IO layout
  2023-10-02  9:58 ` [PATCH v4 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi
@ 2023-10-05  9:10 ` Shivah Shankar Shankar Narayan Rao
  0 siblings, 0 replies; 26+ messages in thread
From: Shivah Shankar Shankar Narayan Rao @ 2023-10-05 9:10 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi; +Cc: dev, Anup Prabhu, Prince Takkar

> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Monday, October 2, 2023 3:29 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>
> Subject: [PATCH v4 2/3] mldev: introduce support for IO layout
>
> Introduce IO layout in ML device specification. IO layout defines the
> expected arrangement of model input and output buffers in the memory.
> Packed and Split layout support is added in the specification.
>
> Updated rte_ml_op to support array of rte_ml_buff_seg pointers to support
> packed and split I/O layouts. Updated ML quantize and dequantize APIs to
> support rte_ml_buff_seg pointer arrays. Replaced batch_size with
> min_batches and max_batches in rte_ml_model_info.
>
> Implement support for model IO layout in ml/cnxk driver.
> Updated the ML test application to support IO layout and dropped support
> for '--batches' in test application.
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>

Acked-by: Shivah Shankar S <sshankarnara@marvell.com>

^ permalink raw reply [flat|nested] 26+ messages in thread
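For completeness, the reworked quantize call for a packed-layout model reduces to a single segment on each side. A minimal sketch under that assumption; the function and the dbuf/qbuf buffers and their sizes are illustrative, only the rte_ml_io_quantize() signature comes from the patch.

#include <rte_memory.h>
#include <rte_mldev.h>

/* Wrap one dequantized source and one quantized destination buffer in
 * single-element segment arrays, as the reworked API expects.
 */
static int
quantize_packed(int16_t dev_id, uint16_t model_id,
                void *dbuf, uint32_t dsize, void *qbuf, uint32_t qsize)
{
    struct rte_ml_buff_seg d_seg = {
        .addr = dbuf,
        .iova_addr = rte_mem_virt2iova(dbuf),
        .length = dsize,
        .next = NULL,
    };
    struct rte_ml_buff_seg q_seg = {
        .addr = qbuf,
        .iova_addr = rte_mem_virt2iova(qbuf),
        .length = qsize,
        .next = NULL,
    };
    struct rte_ml_buff_seg *d_segs[] = { &d_seg };
    struct rte_ml_buff_seg *q_segs[] = { &q_seg };

    return rte_ml_io_quantize(dev_id, model_id, d_segs, q_segs);
}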
* [PATCH v4 3/3] mldev: drop input and output size get APIs 2023-10-02 9:58 ` [PATCH v4 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi 2023-10-02 9:58 ` [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi 2023-10-02 9:58 ` [PATCH v4 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi @ 2023-10-02 9:58 ` Srikanth Yalavarthi 2023-10-03 6:12 ` Anup Prabhu 2023-10-05 9:06 ` Shivah Shankar Shankar Narayan Rao 2023-10-11 14:45 ` [PATCH v4 0/3] Spec changes to support multi I/O models Thomas Monjalon 3 siblings, 2 replies; 26+ messages in thread From: Srikanth Yalavarthi @ 2023-10-02 9:58 UTC (permalink / raw) To: Srikanth Yalavarthi; +Cc: dev, sshankarnara, aprabhu, ptakkar Drop support and use of ML input and output size get functions, rte_ml_io_input_size_get and rte_ml_io_output_size_get. These functions are not required, as the model buffer size can be computed from the fields of updated rte_ml_io_info structure. Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com> --- doc/guides/rel_notes/release_23_11.rst | 2 + drivers/ml/cnxk/cn10k_ml_ops.c | 50 --------------------- lib/mldev/rte_mldev.c | 38 ---------------- lib/mldev/rte_mldev.h | 60 -------------------------- lib/mldev/rte_mldev_core.h | 54 ----------------------- lib/mldev/version.map | 2 - 6 files changed, 2 insertions(+), 204 deletions(-) diff --git a/doc/guides/rel_notes/release_23_11.rst b/doc/guides/rel_notes/release_23_11.rst index 8562bac77c..efd9b89bd7 100644 --- a/doc/guides/rel_notes/release_23_11.rst +++ b/doc/guides/rel_notes/release_23_11.rst @@ -102,6 +102,8 @@ Removed Items * kni: Removed the Kernel Network Interface (KNI) library and driver. +* mldev: Removed APIs ``rte_ml_io_input_size_get`` and ``rte_ml_io_output_size_get``. 
+ API Changes ----------- diff --git a/drivers/ml/cnxk/cn10k_ml_ops.c b/drivers/ml/cnxk/cn10k_ml_ops.c index 1d72fb52a6..4abf4ae0d3 100644 --- a/drivers/ml/cnxk/cn10k_ml_ops.c +++ b/drivers/ml/cnxk/cn10k_ml_ops.c @@ -2110,54 +2110,6 @@ cn10k_ml_model_params_update(struct rte_ml_dev *dev, uint16_t model_id, void *bu return 0; } -static int -cn10k_ml_io_input_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize) -{ - struct cn10k_ml_model *model; - - model = dev->data->models[model_id]; - - if (model == NULL) { - plt_err("Invalid model_id = %u", model_id); - return -EINVAL; - } - - if (input_qsize != NULL) - *input_qsize = PLT_U64_CAST(model->addr.total_input_sz_q * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - if (input_dsize != NULL) - *input_dsize = PLT_U64_CAST(model->addr.total_input_sz_d * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - return 0; -} - -static int -cn10k_ml_io_output_size_get(struct rte_ml_dev *dev, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize) -{ - struct cn10k_ml_model *model; - - model = dev->data->models[model_id]; - - if (model == NULL) { - plt_err("Invalid model_id = %u", model_id); - return -EINVAL; - } - - if (output_qsize != NULL) - *output_qsize = PLT_U64_CAST(model->addr.total_output_sz_q * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - if (output_dsize != NULL) - *output_dsize = PLT_U64_CAST(model->addr.total_output_sz_d * - PLT_DIV_CEIL(nb_batches, model->batch_size)); - - return 0; -} - static int cn10k_ml_io_quantize(struct rte_ml_dev *dev, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer) @@ -2636,8 +2588,6 @@ struct rte_ml_dev_ops cn10k_ml_ops = { .model_params_update = cn10k_ml_model_params_update, /* I/O ops */ - .io_input_size_get = cn10k_ml_io_input_size_get, - .io_output_size_get = cn10k_ml_io_output_size_get, .io_quantize = cn10k_ml_io_quantize, .io_dequantize = cn10k_ml_io_dequantize, }; diff --git a/lib/mldev/rte_mldev.c b/lib/mldev/rte_mldev.c index 9a48ed3e94..cc5f2e0cc6 100644 --- a/lib/mldev/rte_mldev.c +++ b/lib/mldev/rte_mldev.c @@ -691,44 +691,6 @@ rte_ml_model_params_update(int16_t dev_id, uint16_t model_id, void *buffer) return (*dev->dev_ops->model_params_update)(dev, model_id, buffer); } -int -rte_ml_io_input_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize) -{ - struct rte_ml_dev *dev; - - if (!rte_ml_dev_is_valid_dev(dev_id)) { - RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); - return -EINVAL; - } - - dev = rte_ml_dev_pmd_get_dev(dev_id); - if (*dev->dev_ops->io_input_size_get == NULL) - return -ENOTSUP; - - return (*dev->dev_ops->io_input_size_get)(dev, model_id, nb_batches, input_qsize, - input_dsize); -} - -int -rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize) -{ - struct rte_ml_dev *dev; - - if (!rte_ml_dev_is_valid_dev(dev_id)) { - RTE_MLDEV_LOG(ERR, "Invalid dev_id = %d\n", dev_id); - return -EINVAL; - } - - dev = rte_ml_dev_pmd_get_dev(dev_id); - if (*dev->dev_ops->io_output_size_get == NULL) - return -ENOTSUP; - - return (*dev->dev_ops->io_output_size_get)(dev, model_id, nb_batches, output_qsize, - output_dsize); -} - int rte_ml_io_quantize(int16_t dev_id, uint16_t model_id, struct rte_ml_buff_seg **dbuffer, struct rte_ml_buff_seg **qbuffer) diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h index 
316c6fd018..63b2670bb0 100644 --- a/lib/mldev/rte_mldev.h +++ b/lib/mldev/rte_mldev.h @@ -1008,66 +1008,6 @@ rte_ml_model_params_update(int16_t dev_id, uint16_t model_id, void *buffer); /* IO operations */ -/** - * Get size of quantized and dequantized input buffers. - * - * Calculate the size of buffers required for quantized and dequantized input data. - * This API would return the buffer sizes for the number of batches provided and would - * consider the alignment requirements as per the PMD. Input sizes computed by this API can - * be used by the application to allocate buffers. - * - * @param[in] dev_id - * The identifier of the device. - * @param[in] model_id - * Identifier for the model created - * @param[in] nb_batches - * Number of batches of input to be processed in a single inference job - * @param[out] input_qsize - * Quantized input size pointer. - * NULL value is allowed, in which case input_qsize is not calculated by the driver. - * @param[out] input_dsize - * Dequantized input size pointer. - * NULL value is allowed, in which case input_dsize is not calculated by the driver. - * - * @return - * - Returns 0 on success - * - Returns negative value on failure - */ -__rte_experimental -int -rte_ml_io_input_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *input_qsize, uint64_t *input_dsize); - -/** - * Get size of quantized and dequantized output buffers. - * - * Calculate the size of buffers required for quantized and dequantized output data. - * This API would return the buffer sizes for the number of batches provided and would consider - * the alignment requirements as per the PMD. Output sizes computed by this API can be used by the - * application to allocate buffers. - * - * @param[in] dev_id - * The identifier of the device. - * @param[in] model_id - * Identifier for the model created - * @param[in] nb_batches - * Number of batches of input to be processed in a single inference job - * @param[out] output_qsize - * Quantized output size pointer. - * NULL value is allowed, in which case output_qsize is not calculated by the driver. - * @param[out] output_dsize - * Dequantized output size pointer. - * NULL value is allowed, in which case output_dsize is not calculated by the driver. - * - * @return - * - Returns 0 on success - * - Returns negative value on failure - */ -__rte_experimental -int -rte_ml_io_output_size_get(int16_t dev_id, uint16_t model_id, uint32_t nb_batches, - uint64_t *output_qsize, uint64_t *output_dsize); - /** * Quantize input data. * diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h index 8530b07316..2279b1dcec 100644 --- a/lib/mldev/rte_mldev_core.h +++ b/lib/mldev/rte_mldev_core.h @@ -466,54 +466,6 @@ typedef int (*mldev_model_info_get_t)(struct rte_ml_dev *dev, uint16_t model_id, */ typedef int (*mldev_model_params_update_t)(struct rte_ml_dev *dev, uint16_t model_id, void *buffer); -/** - * @internal - * - * Get size of input buffers. - * - * @param dev - * ML device pointer. - * @param model_id - * Model ID to use. - * @param nb_batches - * Number of batches. - * @param input_qsize - * Size of quantized input. - * @param input_dsize - * Size of dequantized input. - * - * @return - * - 0 on success. - * - <0, error on failure. - */ -typedef int (*mldev_io_input_size_get_t)(struct rte_ml_dev *dev, uint16_t model_id, - uint32_t nb_batches, uint64_t *input_qsize, - uint64_t *input_dsize); - -/** - * @internal - * - * Get size of output buffers. - * - * @param dev - * ML device pointer. 
- * @param model_id - * Model ID to use. - * @param nb_batches - * Number of batches. - * @param output_qsize - * Size of quantized output. - * @param output_dsize - * Size of dequantized output. - * - * @return - * - 0 on success. - * - <0, error on failure. - */ -typedef int (*mldev_io_output_size_get_t)(struct rte_ml_dev *dev, uint16_t model_id, - uint32_t nb_batches, uint64_t *output_qsize, - uint64_t *output_dsize); - /** * @internal * @@ -627,12 +579,6 @@ struct rte_ml_dev_ops { /** Update model params. */ mldev_model_params_update_t model_params_update; - /** Get input buffer size. */ - mldev_io_input_size_get_t io_input_size_get; - - /** Get output buffer size. */ - mldev_io_output_size_get_t io_output_size_get; - /** Quantize data */ mldev_io_quantize_t io_quantize; diff --git a/lib/mldev/version.map b/lib/mldev/version.map index 40ff27f4b9..99841db6aa 100644 --- a/lib/mldev/version.map +++ b/lib/mldev/version.map @@ -23,8 +23,6 @@ EXPERIMENTAL { rte_ml_dev_xstats_reset; rte_ml_enqueue_burst; rte_ml_io_dequantize; - rte_ml_io_input_size_get; - rte_ml_io_output_size_get; rte_ml_io_quantize; rte_ml_model_info_get; rte_ml_model_load; -- 2.41.0 ^ permalink raw reply [flat|nested] 26+ messages in thread
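With the two getters gone, the sizes they used to return become a straightforward fold over rte_ml_model_info. A minimal sketch for the quantized input size, not part of the patch; it follows the computation the test application now performs in ml_inference_iomem_setup() above, and the helper name is illustrative.

#include <rte_common.h>
#include <rte_mldev.h>

/* Sum per-input quantized sizes; for split layout each input is padded
 * to the device alignment, matching the test application's logic.
 */
static uint64_t
model_input_qsize(const struct rte_ml_model_info *info,
                  const struct rte_ml_dev_info *dev_info)
{
    uint64_t qsize = 0;
    uint32_t i;

    for (i = 0; i < info->nb_inputs; i++) {
        if (info->io_layout == RTE_ML_IO_LAYOUT_PACKED)
            qsize += info->input_info[i].size;
        else
            qsize += RTE_ALIGN_CEIL(info->input_info[i].size,
                                    dev_info->align_size);
    }

    return qsize;
}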
* RE: [PATCH v4 3/3] mldev: drop input and output size get APIs
  2023-10-02  9:58 ` [PATCH v4 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi
@ 2023-10-03  6:12 ` Anup Prabhu
  2023-10-05  9:06 ` Shivah Shankar Shankar Narayan Rao
  1 sibling, 0 replies; 26+ messages in thread
From: Anup Prabhu @ 2023-10-03 6:12 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi
  Cc: dev, Shivah Shankar Shankar Narayan Rao, Prince Takkar

> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Monday, October 2, 2023 3:29 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>
> Subject: [PATCH v4 3/3] mldev: drop input and output size get APIs
>
> Drop support and use of ML input and output size get functions,
> rte_ml_io_input_size_get and rte_ml_io_output_size_get.
>
> These functions are not required, as the model buffer size can be computed
> from the fields of updated rte_ml_io_info structure.
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>

Acked-by: Anup Prabhu <aprabhu@marvell.com>

^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: [PATCH v4 3/3] mldev: drop input and output size get APIs
  2023-10-02  9:58 ` [PATCH v4 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi
  2023-10-03  6:12 ` Anup Prabhu
@ 2023-10-05  9:06 ` Shivah Shankar Shankar Narayan Rao
  1 sibling, 0 replies; 26+ messages in thread
From: Shivah Shankar Shankar Narayan Rao @ 2023-10-05 9:06 UTC (permalink / raw)
  To: Srikanth Yalavarthi, Srikanth Yalavarthi; +Cc: dev, Anup Prabhu, Prince Takkar

> -----Original Message-----
> From: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Sent: Monday, October 2, 2023 3:29 PM
> To: Srikanth Yalavarthi <syalavarthi@marvell.com>
> Cc: dev@dpdk.org; Shivah Shankar Shankar Narayan Rao
> <sshankarnara@marvell.com>; Anup Prabhu <aprabhu@marvell.com>;
> Prince Takkar <ptakkar@marvell.com>
> Subject: [PATCH v4 3/3] mldev: drop input and output size get APIs
>
> Drop support and use of ML input and output size get functions,
> rte_ml_io_input_size_get and rte_ml_io_output_size_get.
>
> These functions are not required, as the model buffer size can be computed
> from the fields of updated rte_ml_io_info structure.
>
> Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>

Acked-by: Shivah Shankar S <sshankarnara@marvell.com>

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v4 0/3] Spec changes to support multi I/O models
  2023-10-02  9:58 ` [PATCH v4 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi
                   ` (2 preceding siblings ...)
  2023-10-02  9:58 ` [PATCH v4 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi
@ 2023-10-11 14:45 ` Thomas Monjalon
  3 siblings, 0 replies; 26+ messages in thread
From: Thomas Monjalon @ 2023-10-11 14:45 UTC (permalink / raw)
  To: Srikanth Yalavarthi; +Cc: dev, syalavarthi, sshankarnara, aprabhu, ptakkar

02/10/2023 11:58, Srikanth Yalavarthi:
> This series implements changes to mldev spec to extend support
> for ML models with multiple inputs and outputs. Changes include
> introduction of I/O layout to support packed and split buffers
> for model input and output. Extended the rte_ml_model_info
> structure to support multiple inputs and outputs.
>
> Updated rte_ml_op and quantize / dequantize APIs to support an
> array of input and output ML buffer segments.
>
> Support for batches option is dropped from test application.

Applied, thanks.

^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2023-10-11 14:45 UTC | newest]

Thread overview: 26+ messages
2023-08-30 15:52 [PATCH v1 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi
2023-08-30 15:53 ` [PATCH v1 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi
2023-08-30 15:53 ` [PATCH v1 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi
2023-08-30 15:53 ` [PATCH v1 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi
2023-09-20  7:19 ` [PATCH v2 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi
2023-09-20  7:19 ` [PATCH v2 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi
2023-09-20  7:19 ` [PATCH v2 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi
2023-09-20  7:19 ` [PATCH v2 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi
2023-10-03  6:10 ` Anup Prabhu
2023-09-27 18:11 ` [PATCH v3 0/4] Spec changes to support multi I/O models Srikanth Yalavarthi
2023-09-27 18:11 ` [PATCH v3 1/4] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi
2023-09-27 18:11 ` [PATCH v3 2/4] mldev: introduce support for IO layout Srikanth Yalavarthi
2023-09-27 18:11 ` [PATCH v3 3/4] mldev: drop input and output size get APIs Srikanth Yalavarthi
2023-09-27 18:11 ` [PATCH v3 4/4] mldev: update release notes for 23.11 Srikanth Yalavarthi
2023-09-29  3:39 ` Jerin Jacob
2023-10-02  9:59 ` [EXT] " Srikanth Yalavarthi
2023-10-02  9:58 ` [PATCH v4 0/3] Spec changes to support multi I/O models Srikanth Yalavarthi
2023-10-02  9:58 ` [PATCH v4 1/3] mldev: add support for arbitrary shape dimensions Srikanth Yalavarthi
2023-10-04 14:42 ` Anup Prabhu
2023-10-05  9:12 ` Shivah Shankar Shankar Narayan Rao
2023-10-02  9:58 ` [PATCH v4 2/3] mldev: introduce support for IO layout Srikanth Yalavarthi
2023-10-05  9:10 ` Shivah Shankar Shankar Narayan Rao
2023-10-02  9:58 ` [PATCH v4 3/3] mldev: drop input and output size get APIs Srikanth Yalavarthi
2023-10-03  6:12 ` Anup Prabhu
2023-10-05  9:06 ` Shivah Shankar Shankar Narayan Rao
2023-10-11 14:45 ` [PATCH v4 0/3] Spec changes to support multi I/O models Thomas Monjalon