* [PATCH 0/3] add support for additional data types
@ 2024-01-07 15:28 Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 1/3] mldev: add conversion routines for 32-bit integers Srikanth Yalavarthi
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Srikanth Yalavarthi @ 2024-01-07 15:28 UTC (permalink / raw)
Cc: dev, aprabhu, syalavarthi, sshankarnara, ptakkar
Added support for 64-bit integer data types for inference input and
output. Extended support for quantization of 32-bit and 64-bit integer
data types.
Srikanth Yalavarthi (3):
mldev: add conversion routines for 32-bit integers
mldev: add support for 64-integer data type
ml/cnxk: add support for additional integer types
drivers/ml/cnxk/cnxk_ml_io.c | 24 ++
drivers/ml/cnxk/mvtvm_ml_model.c | 4 +
lib/mldev/mldev_utils.c | 4 +
lib/mldev/mldev_utils.h | 184 ++++++++++
lib/mldev/mldev_utils_neon.c | 566 +++++++++++++++++++++++++++++++
lib/mldev/mldev_utils_scalar.c | 196 +++++++++++
lib/mldev/rte_mldev.h | 4 +
lib/mldev/version.map | 8 +
8 files changed, 990 insertions(+)
--
2.42.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/3] mldev: add conversion routines for 32-bit integers
2024-01-07 15:28 [PATCH 0/3] add support for additional data types Srikanth Yalavarthi
@ 2024-01-07 15:28 ` Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 2/3] mldev: add support for 64-integer data type Srikanth Yalavarthi
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Srikanth Yalavarthi @ 2024-01-07 15:28 UTC (permalink / raw)
To: Srikanth Yalavarthi, Ruifeng Wang; +Cc: dev, aprabhu, sshankarnara, ptakkar
Added routines to convert data from 32-bit integer type to
float32_t and vice-versa.
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
lib/mldev/mldev_utils.h | 92 +++++++++++++
lib/mldev/mldev_utils_neon.c | 242 +++++++++++++++++++++++++++++++++
lib/mldev/mldev_utils_scalar.c | 98 +++++++++++++
lib/mldev/version.map | 4 +
4 files changed, 436 insertions(+)
diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h
index 220afb42f0d..1d041531b43 100644
--- a/lib/mldev/mldev_utils.h
+++ b/lib/mldev/mldev_utils.h
@@ -236,6 +236,98 @@ __rte_internal
int
rte_ml_io_uint16_to_float32(float scale, uint64_t nb_elements, void *input, void *output);
+/**
+ * @internal
+ *
+ * Convert a buffer containing numbers in single precision floating format (float32) to signed
+ * 32-bit integer format (INT32).
+ *
+ * @param[in] scale
+ * Scale factor for conversion.
+ * @param[in] nb_elements
+ * Number of elements in the buffer.
+ * @param[in] input
+ * Input buffer containing float32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ * @param[out] output
+ * Output buffer to store INT32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ *
+ * @return
+ * - 0, Success.
+ * - < 0, Error code on failure.
+ */
+__rte_internal
+int
+rte_ml_io_float32_to_int32(float scale, uint64_t nb_elements, void *input, void *output);
+
+/**
+ * @internal
+ *
+ * Convert a buffer containing numbers in signed 32-bit integer format (INT32) to single precision
+ * floating format (float32).
+ *
+ * @param[in] scale
+ * Scale factor for conversion.
+ * @param[in] nb_elements
+ * Number of elements in the buffer.
+ * @param[in] input
+ * Input buffer containing INT32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ * @param[out] output
+ * Output buffer to store float32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ *
+ * @return
+ * - 0, Success.
+ * - < 0, Error code on failure.
+ */
+__rte_internal
+int
+rte_ml_io_int32_to_float32(float scale, uint64_t nb_elements, void *input, void *output);
+
+/**
+ * @internal
+ *
+ * Convert a buffer containing numbers in single precision floating format (float32) to unsigned
+ * 32-bit integer format (UINT32).
+ *
+ * @param[in] scale
+ * Scale factor for conversion.
+ * @param[in] nb_elements
+ * Number of elements in the buffer.
+ * @param[in] input
+ * Input buffer containing float32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ * @param[out] output
+ * Output buffer to store UINT32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ *
+ * @return
+ * - 0, Success.
+ * - < 0, Error code on failure.
+ */
+__rte_internal
+int
+rte_ml_io_float32_to_uint32(float scale, uint64_t nb_elements, void *input, void *output);
+
+/**
+ * @internal
+ *
+ * Convert a buffer containing numbers in unsigned 32-bit integer format (UINT32) to single
+ * precision floating format (float32).
+ *
+ * @param[in] scale
+ * Scale factor for conversion.
+ * @param[in] nb_elements
+ * Number of elements in the buffer.
+ * @param[in] input
+ * Input buffer containing UINT32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ * @param[out] output
+ * Output buffer to store float32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ *
+ * @return
+ * - 0, Success.
+ * - < 0, Error code on failure.
+ */
+__rte_internal
+int
+rte_ml_io_uint32_to_float32(float scale, uint64_t nb_elements, void *input, void *output);
+
/**
* @internal
*
diff --git a/lib/mldev/mldev_utils_neon.c b/lib/mldev/mldev_utils_neon.c
index c7baec012b8..250fa43fa73 100644
--- a/lib/mldev/mldev_utils_neon.c
+++ b/lib/mldev/mldev_utils_neon.c
@@ -600,6 +600,248 @@ rte_ml_io_uint16_to_float32(float scale, uint64_t nb_elements, void *input, void
return 0;
}
+static inline void
+__float32_to_int32_neon_s32x4(float scale, float *input, int32_t *output)
+{
+ float32x4_t f32x4;
+ int32x4_t s32x4;
+
+ /* load 4 x float elements */
+ f32x4 = vld1q_f32(input);
+
+ /* scale */
+ f32x4 = vmulq_n_f32(f32x4, scale);
+
+ /* convert to int32x4_t using round to nearest with ties away rounding mode */
+ s32x4 = vcvtaq_s32_f32(f32x4);
+
+ /* store 4 elements */
+ vst1q_s32(output, s32x4);
+}
+
+static inline void
+__float32_to_int32_neon_s32x1(float scale, float *input, int32_t *output)
+{
+ /* scale and convert, round to nearest with ties away rounding mode */
+ *output = vcvtas_s32_f32(scale * (*input));
+}
+
+int
+rte_ml_io_float32_to_int32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ float *input_buffer;
+ int32_t *output_buffer;
+ uint64_t nb_iterations;
+ uint32_t vlen;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (float *)input;
+ output_buffer = (int32_t *)output;
+ vlen = 2 * sizeof(float) / sizeof(int32_t);
+ nb_iterations = nb_elements / vlen;
+
+ /* convert vlen elements in each iteration */
+ for (i = 0; i < nb_iterations; i++) {
+ __float32_to_int32_neon_s32x4(scale, input_buffer, output_buffer);
+ input_buffer += vlen;
+ output_buffer += vlen;
+ }
+
+ /* convert leftover elements */
+ i = i * vlen;
+ for (; i < nb_elements; i++) {
+ __float32_to_int32_neon_s32x1(scale, input_buffer, output_buffer);
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+static inline void
+__int32_to_float32_neon_f32x4(float scale, int32_t *input, float *output)
+{
+ float32x4_t f32x4;
+ int32x4_t s32x4;
+
+ /* load 4 x int32_t elements */
+ s32x4 = vld1q_s32(input);
+
+ /* convert int32_t to float */
+ f32x4 = vcvtq_f32_s32(s32x4);
+
+ /* scale */
+ f32x4 = vmulq_n_f32(f32x4, scale);
+
+ /* store float32x4_t */
+ vst1q_f32(output, f32x4);
+}
+
+static inline void
+__int32_to_float32_neon_f32x1(float scale, int32_t *input, float *output)
+{
+ *output = scale * vcvts_f32_s32(*input);
+}
+
+int
+rte_ml_io_int32_to_float32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ int32_t *input_buffer;
+ float *output_buffer;
+ uint64_t nb_iterations;
+ uint32_t vlen;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (int32_t *)input;
+ output_buffer = (float *)output;
+ vlen = 2 * sizeof(float) / sizeof(int32_t);
+ nb_iterations = nb_elements / vlen;
+
+ /* convert vlen elements in each iteration */
+ for (i = 0; i < nb_iterations; i++) {
+ __int32_to_float32_neon_f32x4(scale, input_buffer, output_buffer);
+ input_buffer += vlen;
+ output_buffer += vlen;
+ }
+
+ /* convert leftover elements */
+ i = i * vlen;
+ for (; i < nb_elements; i++) {
+ __int32_to_float32_neon_f32x1(scale, input_buffer, output_buffer);
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+static inline void
+__float32_to_uint32_neon_u32x4(float scale, float *input, uint32_t *output)
+{
+ float32x4_t f32x4;
+ uint32x4_t u32x4;
+
+ /* load 4 float elements */
+ f32x4 = vld1q_f32(input);
+
+ /* scale */
+ f32x4 = vmulq_n_f32(f32x4, scale);
+
+ /* convert using round to nearest with ties to away rounding mode */
+ u32x4 = vcvtaq_u32_f32(f32x4);
+
+ /* store 4 elements */
+ vst1q_u32(output, u32x4);
+}
+
+static inline void
+__float32_to_uint32_neon_u32x1(float scale, float *input, uint32_t *output)
+{
+ /* scale and convert, round to nearest with ties away rounding mode */
+ *output = vcvtas_u32_f32(scale * (*input));
+}
+
+int
+rte_ml_io_float32_to_uint32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ float *input_buffer;
+ uint32_t *output_buffer;
+ uint64_t nb_iterations;
+ uint64_t vlen;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (float *)input;
+ output_buffer = (uint32_t *)output;
+ vlen = 2 * sizeof(float) / sizeof(uint32_t);
+ nb_iterations = nb_elements / vlen;
+
+ /* convert vlen elements in each iteration */
+ for (i = 0; i < nb_iterations; i++) {
+ __float32_to_uint32_neon_u32x4(scale, input_buffer, output_buffer);
+ input_buffer += vlen;
+ output_buffer += vlen;
+ }
+
+ /* convert leftover elements */
+ i = i * vlen;
+ for (; i < nb_elements; i++) {
+ __float32_to_uint32_neon_u32x1(scale, input_buffer, output_buffer);
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+static inline void
+__uint32_to_float32_neon_f32x4(float scale, uint32_t *input, float *output)
+{
+ float32x4_t f32x4;
+ uint32x4_t u32x4;
+
+ /* load 4 x uint32_t elements */
+ u32x4 = vld1q_u32(input);
+
+ /* convert uint32_t to float */
+ f32x4 = vcvtq_f32_u32(u32x4);
+
+ /* scale */
+ f32x4 = vmulq_n_f32(f32x4, scale);
+
+ /* store float32x4_t */
+ vst1q_f32(output, f32x4);
+}
+
+static inline void
+__uint32_to_float32_neon_f32x1(float scale, uint32_t *input, float *output)
+{
+ *output = scale * vcvts_f32_u32(*input);
+}
+
+int
+rte_ml_io_uint32_to_float32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ uint32_t *input_buffer;
+ float *output_buffer;
+ uint64_t nb_iterations;
+ uint32_t vlen;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (uint32_t *)input;
+ output_buffer = (float *)output;
+ vlen = 2 * sizeof(float) / sizeof(uint32_t);
+ nb_iterations = nb_elements / vlen;
+
+ /* convert vlen elements in each iteration */
+ for (i = 0; i < nb_iterations; i++) {
+ __uint32_to_float32_neon_f32x4(scale, input_buffer, output_buffer);
+ input_buffer += vlen;
+ output_buffer += vlen;
+ }
+
+ /* convert leftover elements */
+ i = i * vlen;
+ for (; i < nb_elements; i++) {
+ __uint32_to_float32_neon_f32x1(scale, input_buffer, output_buffer);
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
static inline void
__float32_to_float16_neon_f16x4(float32_t *input, float16_t *output)
{
diff --git a/lib/mldev/mldev_utils_scalar.c b/lib/mldev/mldev_utils_scalar.c
index 4d6cb880240..af1a3a103b2 100644
--- a/lib/mldev/mldev_utils_scalar.c
+++ b/lib/mldev/mldev_utils_scalar.c
@@ -229,6 +229,104 @@ rte_ml_io_uint16_to_float32(float scale, uint64_t nb_elements, void *input, void
return 0;
}
+int
+rte_ml_io_float32_to_int32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ float *input_buffer;
+ int32_t *output_buffer;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (float *)input;
+ output_buffer = (int32_t *)output;
+
+ for (i = 0; i < nb_elements; i++) {
+ *output_buffer = (int32_t)round((*input_buffer) * scale);
+
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+int
+rte_ml_io_int32_to_float32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ int32_t *input_buffer;
+ float *output_buffer;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (int32_t *)input;
+ output_buffer = (float *)output;
+
+ for (i = 0; i < nb_elements; i++) {
+ *output_buffer = scale * (float)(*input_buffer);
+
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+int
+rte_ml_io_float32_to_uint32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ float *input_buffer;
+ uint32_t *output_buffer;
+ int32_t i32;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (float *)input;
+ output_buffer = (uint32_t *)output;
+
+ for (i = 0; i < nb_elements; i++) {
+ i32 = (int32_t)round((*input_buffer) * scale);
+
+ if (i32 < 0)
+ i32 = 0;
+
+ *output_buffer = (uint32_t)i32;
+
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+int
+rte_ml_io_uint32_to_float32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ uint32_t *input_buffer;
+ float *output_buffer;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (uint32_t *)input;
+ output_buffer = (float *)output;
+
+ for (i = 0; i < nb_elements; i++) {
+ *output_buffer = scale * (float)(*input_buffer);
+
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
/* Convert a single precision floating point number (float32) into a half precision
* floating point number (float16) using round to nearest rounding mode.
*/
diff --git a/lib/mldev/version.map b/lib/mldev/version.map
index 99841db6aa9..2e8f1555225 100644
--- a/lib/mldev/version.map
+++ b/lib/mldev/version.map
@@ -57,6 +57,10 @@ INTERNAL {
rte_ml_io_int16_to_float32;
rte_ml_io_float32_to_uint16;
rte_ml_io_uint16_to_float32;
+ rte_ml_io_float32_to_int32;
+ rte_ml_io_int32_to_float32;
+ rte_ml_io_float32_to_uint32;
+ rte_ml_io_uint32_to_float32;
rte_ml_io_float32_to_float16;
rte_ml_io_float16_to_float32;
rte_ml_io_float32_to_bfloat16;
--
2.42.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/3] mldev: add support for 64-integer data type
2024-01-07 15:28 [PATCH 0/3] add support for additional data types Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 1/3] mldev: add conversion routines for 32-bit integers Srikanth Yalavarthi
@ 2024-01-07 15:28 ` Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 3/3] ml/cnxk: add support for additional integer types Srikanth Yalavarthi
2024-02-18 22:14 ` [PATCH 0/3] add support for additional data types Thomas Monjalon
3 siblings, 0 replies; 5+ messages in thread
From: Srikanth Yalavarthi @ 2024-01-07 15:28 UTC (permalink / raw)
To: Srikanth Yalavarthi, Ruifeng Wang; +Cc: dev, aprabhu, sshankarnara, ptakkar
Added support in mldev spec for 64-bit integer types. Added
routines to convert data from 64-bit integer type to float32_t
and vice-versa.
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
lib/mldev/mldev_utils.c | 4 +
lib/mldev/mldev_utils.h | 92 ++++++++++
lib/mldev/mldev_utils_neon.c | 324 +++++++++++++++++++++++++++++++++
lib/mldev/mldev_utils_scalar.c | 98 ++++++++++
lib/mldev/rte_mldev.h | 4 +
lib/mldev/version.map | 4 +
6 files changed, 526 insertions(+)
diff --git a/lib/mldev/mldev_utils.c b/lib/mldev/mldev_utils.c
index ccd2c39ca89..13ac615e9fc 100644
--- a/lib/mldev/mldev_utils.c
+++ b/lib/mldev/mldev_utils.c
@@ -32,6 +32,10 @@ rte_ml_io_type_size_get(enum rte_ml_io_type type)
return sizeof(int32_t);
case RTE_ML_IO_TYPE_UINT32:
return sizeof(uint32_t);
+ case RTE_ML_IO_TYPE_INT64:
+ return sizeof(int64_t);
+ case RTE_ML_IO_TYPE_UINT64:
+ return sizeof(uint64_t);
case RTE_ML_IO_TYPE_FP8:
return sizeof(uint8_t);
case RTE_ML_IO_TYPE_FP16:
diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h
index 1d041531b43..6daae6d0a1c 100644
--- a/lib/mldev/mldev_utils.h
+++ b/lib/mldev/mldev_utils.h
@@ -328,6 +328,98 @@ __rte_internal
int
rte_ml_io_uint32_to_float32(float scale, uint64_t nb_elements, void *input, void *output);
+/**
+ * @internal
+ *
+ * Convert a buffer containing numbers in single precision floating format (float32) to signed
+ * 64-bit integer format (INT64).
+ *
+ * @param[in] scale
+ * Scale factor for conversion.
+ * @param[in] nb_elements
+ * Number of elements in the buffer.
+ * @param[in] input
+ * Input buffer containing float32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ * @param[out] output
+ * Output buffer to store INT64 numbers. Size of buffer is equal to (nb_elements * 8) bytes.
+ *
+ * @return
+ * - 0, Success.
+ * - < 0, Error code on failure.
+ */
+__rte_internal
+int
+rte_ml_io_float32_to_int64(float scale, uint64_t nb_elements, void *input, void *output);
+
+/**
+ * @internal
+ *
+ * Convert a buffer containing numbers in signed 64-bit integer format (INT64) to single precision
+ * floating format (float32).
+ *
+ * @param[in] scale
+ * Scale factor for conversion.
+ * @param[in] nb_elements
+ * Number of elements in the buffer.
+ * @param[in] input
+ * Input buffer containing INT64 numbers. Size of buffer is equal to (nb_elements * 8) bytes.
+ * @param[out] output
+ * Output buffer to store float32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ *
+ * @return
+ * - 0, Success.
+ * - < 0, Error code on failure.
+ */
+__rte_internal
+int
+rte_ml_io_int64_to_float32(float scale, uint64_t nb_elements, void *input, void *output);
+
+/**
+ * @internal
+ *
+ * Convert a buffer containing numbers in single precision floating format (float32) to unsigned
+ * 64-bit integer format (UINT64).
+ *
+ * @param[in] scale
+ * Scale factor for conversion.
+ * @param[in] nb_elements
+ * Number of elements in the buffer.
+ * @param[in] input
+ * Input buffer containing float32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ * @param[out] output
+ * Output buffer to store UINT64 numbers. Size of buffer is equal to (nb_elements * 8) bytes.
+ *
+ * @return
+ * - 0, Success.
+ * - < 0, Error code on failure.
+ */
+__rte_internal
+int
+rte_ml_io_float32_to_uint64(float scale, uint64_t nb_elements, void *input, void *output);
+
+/**
+ * @internal
+ *
+ * Convert a buffer containing numbers in unsigned 64-bit integer format (UINT64) to single
+ * precision floating format (float32).
+ *
+ * @param[in] scale
+ * Scale factor for conversion.
+ * @param[in] nb_elements
+ * Number of elements in the buffer.
+ * @param[in] input
+ * Input buffer containing UINT64 numbers. Size of buffer is equal to (nb_elements * 8) bytes.
+ * @param[out] output
+ * Output buffer to store float32 numbers. Size of buffer is equal to (nb_elements * 4) bytes.
+ *
+ * @return
+ * - 0, Success.
+ * - < 0, Error code on failure.
+ */
+__rte_internal
+int
+rte_ml_io_uint64_to_float32(float scale, uint64_t nb_elements, void *input, void *output);
+
/**
* @internal
*
diff --git a/lib/mldev/mldev_utils_neon.c b/lib/mldev/mldev_utils_neon.c
index 250fa43fa73..4cde2ebabd3 100644
--- a/lib/mldev/mldev_utils_neon.c
+++ b/lib/mldev/mldev_utils_neon.c
@@ -842,6 +842,330 @@ rte_ml_io_uint32_to_float32(float scale, uint64_t nb_elements, void *input, void
return 0;
}
+static inline void
+__float32_to_int64_neon_s64x2(float scale, float *input, int64_t *output)
+{
+ float32x2_t f32x2;
+ float64x2_t f64x2;
+ int64x2_t s64x2;
+
+ /* load 2 x float elements */
+ f32x2 = vld1_f32(input);
+
+ /* scale */
+ f32x2 = vmul_n_f32(f32x2, scale);
+
+ /* convert to float64x2_t */
+ f64x2 = vcvt_f64_f32(f32x2);
+
+ /* convert to int64x2_t */
+ s64x2 = vcvtaq_s64_f64(f64x2);
+
+ /* store 2 elements */
+ vst1q_s64(output, s64x2);
+}
+
+static inline void
+__float32_to_int64_neon_s64x1(float scale, float *input, int64_t *output)
+{
+ float32x2_t f32x2;
+ float64x2_t f64x2;
+ int64x2_t s64x2;
+
+ /* load 1 x float element */
+ f32x2 = vdup_n_f32(*input);
+
+ /* scale */
+ f32x2 = vmul_n_f32(f32x2, scale);
+
+ /* convert to float64x2_t */
+ f64x2 = vcvt_f64_f32(f32x2);
+
+ /* convert to int64x2_t */
+ s64x2 = vcvtaq_s64_f64(f64x2);
+
+ /* store lane 0 of int64x2_t */
+ vst1q_lane_s64(output, s64x2, 0);
+}
+
+int
+rte_ml_io_float32_to_int64(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ float *input_buffer;
+ int64_t *output_buffer;
+ uint64_t nb_iterations;
+ uint32_t vlen;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (float *)input;
+ output_buffer = (int64_t *)output;
+ vlen = 4 * sizeof(float) / sizeof(int64_t);
+ nb_iterations = nb_elements / vlen;
+
+ /* convert vlen elements in each iteration */
+ for (i = 0; i < nb_iterations; i++) {
+ __float32_to_int64_neon_s64x2(scale, input_buffer, output_buffer);
+ input_buffer += vlen;
+ output_buffer += vlen;
+ }
+
+ /* convert leftover elements */
+ i = i * vlen;
+ for (; i < nb_elements; i++) {
+ __float32_to_int64_neon_s64x1(scale, input_buffer, output_buffer);
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+static inline void
+__int64_to_float32_neon_f32x2(float scale, int64_t *input, float *output)
+{
+ int64x2_t s64x2;
+ float64x2_t f64x2;
+ float32x2_t f32x2;
+
+ /* load 2 x int64_t elements */
+ s64x2 = vld1q_s64(input);
+
+ /* convert int64x2_t to float64x2_t */
+ f64x2 = vcvtq_f64_s64(s64x2);
+
+ /* convert float64x2_t to float32x2_t */
+ f32x2 = vcvt_f32_f64(f64x2);
+
+ /* scale */
+ f32x2 = vmul_n_f32(f32x2, scale);
+
+ /* store float32x2_t */
+ vst1_f32(output, f32x2);
+}
+
+static inline void
+__int64_to_float32_neon_f32x1(float scale, int64_t *input, float *output)
+{
+ int64x2_t s64x2;
+ float64x2_t f64x2;
+ float32x2_t f32x2;
+
+ /* load 2 x int64_t elements */
+ s64x2 = vld1q_lane_s64(input, vdupq_n_s64(0), 0);
+
+ /* convert int64x2_t to float64x2_t */
+ f64x2 = vcvtq_f64_s64(s64x2);
+
+ /* convert float64x2_t to float32x2_t */
+ f32x2 = vcvt_f32_f64(f64x2);
+
+ /* scale */
+ f32x2 = vmul_n_f32(f32x2, scale);
+
+ /* store float32x2_t */
+ vst1_lane_f32(output, f32x2, 0);
+}
+
+int
+rte_ml_io_int64_to_float32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ int64_t *input_buffer;
+ float *output_buffer;
+ uint64_t nb_iterations;
+ uint32_t vlen;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (int64_t *)input;
+ output_buffer = (float *)output;
+ vlen = 4 * sizeof(float) / sizeof(int64_t);
+ nb_iterations = nb_elements / vlen;
+
+ /* convert vlen elements in each iteration */
+ for (i = 0; i < nb_iterations; i++) {
+ __int64_to_float32_neon_f32x2(scale, input_buffer, output_buffer);
+ input_buffer += vlen;
+ output_buffer += vlen;
+ }
+
+ /* convert leftover elements */
+ i = i * vlen;
+ for (; i < nb_elements; i++) {
+ __int64_to_float32_neon_f32x1(scale, input_buffer, output_buffer);
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+static inline void
+__float32_to_uint64_neon_u64x2(float scale, float *input, uint64_t *output)
+{
+ float32x2_t f32x2;
+ float64x2_t f64x2;
+ uint64x2_t u64x2;
+
+ /* load 2 x float elements */
+ f32x2 = vld1_f32(input);
+
+ /* scale */
+ f32x2 = vmul_n_f32(f32x2, scale);
+
+ /* convert to float64x2_t */
+ f64x2 = vcvt_f64_f32(f32x2);
+
+ /* convert to int64x2_t */
+ u64x2 = vcvtaq_u64_f64(f64x2);
+
+ /* store 2 elements */
+ vst1q_u64(output, u64x2);
+}
+
+static inline void
+__float32_to_uint64_neon_u64x1(float scale, float *input, uint64_t *output)
+{
+ float32x2_t f32x2;
+ float64x2_t f64x2;
+ uint64x2_t u64x2;
+
+ /* load 1 x float element */
+ f32x2 = vld1_lane_f32(input, vdup_n_f32(0), 0);
+
+ /* scale */
+ f32x2 = vmul_n_f32(f32x2, scale);
+
+ /* convert to float64x2_t */
+ f64x2 = vcvt_f64_f32(f32x2);
+
+ /* convert to int64x2_t */
+ u64x2 = vcvtaq_u64_f64(f64x2);
+
+ /* store 2 elements */
+ vst1q_lane_u64(output, u64x2, 0);
+}
+
+int
+rte_ml_io_float32_to_uint64(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ float *input_buffer;
+ uint64_t *output_buffer;
+ uint64_t nb_iterations;
+ uint32_t vlen;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (float *)input;
+ output_buffer = (uint64_t *)output;
+ vlen = 4 * sizeof(float) / sizeof(uint64_t);
+ nb_iterations = nb_elements / vlen;
+
+ /* convert vlen elements in each iteration */
+ for (i = 0; i < nb_iterations; i++) {
+ __float32_to_uint64_neon_u64x2(scale, input_buffer, output_buffer);
+ input_buffer += vlen;
+ output_buffer += vlen;
+ }
+
+ /* convert leftover elements */
+ i = i * vlen;
+ for (; i < nb_elements; i++) {
+ __float32_to_uint64_neon_u64x1(scale, input_buffer, output_buffer);
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+static inline void
+__uint64_to_float32_neon_f32x2(float scale, uint64_t *input, float *output)
+{
+ uint64x2_t u64x2;
+ float64x2_t f64x2;
+ float32x2_t f32x2;
+
+ /* load 2 x int64_t elements */
+ u64x2 = vld1q_u64(input);
+
+ /* convert int64x2_t to float64x2_t */
+ f64x2 = vcvtq_f64_u64(u64x2);
+
+ /* convert float64x2_t to float32x2_t */
+ f32x2 = vcvt_f32_f64(f64x2);
+
+ /* scale */
+ f32x2 = vmul_n_f32(f32x2, scale);
+
+ /* store float32x2_t */
+ vst1_f32(output, f32x2);
+}
+
+static inline void
+__uint64_to_float32_neon_f32x1(float scale, uint64_t *input, float *output)
+{
+ uint64x2_t u64x2;
+ float64x2_t f64x2;
+ float32x2_t f32x2;
+
+ /* load 2 x int64_t elements */
+ u64x2 = vld1q_lane_u64(input, vdupq_n_u64(0), 0);
+
+ /* convert int64x2_t to float64x2_t */
+ f64x2 = vcvtq_f64_u64(u64x2);
+
+ /* convert float64x2_t to float32x2_t */
+ f32x2 = vcvt_f32_f64(f64x2);
+
+ /* scale */
+ f32x2 = vmul_n_f32(f32x2, scale);
+
+ /* store float32x2_t */
+ vst1_lane_f32(output, f32x2, 0);
+}
+
+int
+rte_ml_io_uint64_to_float32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ uint64_t *input_buffer;
+ float *output_buffer;
+ uint64_t nb_iterations;
+ uint32_t vlen;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (uint64_t *)input;
+ output_buffer = (float *)output;
+ vlen = 4 * sizeof(float) / sizeof(uint64_t);
+ nb_iterations = nb_elements / vlen;
+
+ /* convert vlen elements in each iteration */
+ for (i = 0; i < nb_iterations; i++) {
+ __uint64_to_float32_neon_f32x2(scale, input_buffer, output_buffer);
+ input_buffer += vlen;
+ output_buffer += vlen;
+ }
+
+ /* convert leftover elements */
+ i = i * vlen;
+ for (; i < nb_elements; i++) {
+ __uint64_to_float32_neon_f32x1(scale, input_buffer, output_buffer);
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
static inline void
__float32_to_float16_neon_f16x4(float32_t *input, float16_t *output)
{
diff --git a/lib/mldev/mldev_utils_scalar.c b/lib/mldev/mldev_utils_scalar.c
index af1a3a103b2..63a9900cc8c 100644
--- a/lib/mldev/mldev_utils_scalar.c
+++ b/lib/mldev/mldev_utils_scalar.c
@@ -327,6 +327,104 @@ rte_ml_io_uint32_to_float32(float scale, uint64_t nb_elements, void *input, void
return 0;
}
+int
+rte_ml_io_float32_to_int64(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ float *input_buffer;
+ int64_t *output_buffer;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (float *)input;
+ output_buffer = (int64_t *)output;
+
+ for (i = 0; i < nb_elements; i++) {
+ *output_buffer = (int64_t)round((*input_buffer) * scale);
+
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+int
+rte_ml_io_int64_to_float32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ int64_t *input_buffer;
+ float *output_buffer;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (int64_t *)input;
+ output_buffer = (float *)output;
+
+ for (i = 0; i < nb_elements; i++) {
+ *output_buffer = scale * (float)(*input_buffer);
+
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+int
+rte_ml_io_float32_to_uint64(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ float *input_buffer;
+ uint64_t *output_buffer;
+ int64_t i64;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (float *)input;
+ output_buffer = (uint64_t *)output;
+
+ for (i = 0; i < nb_elements; i++) {
+ i64 = (int64_t)round((*input_buffer) * scale);
+
+ if (i64 < 0)
+ i64 = 0;
+
+ *output_buffer = (uint64_t)i64;
+
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
+int
+rte_ml_io_uint64_to_float32(float scale, uint64_t nb_elements, void *input, void *output)
+{
+ uint64_t *input_buffer;
+ float *output_buffer;
+ uint64_t i;
+
+ if ((scale == 0) || (nb_elements == 0) || (input == NULL) || (output == NULL))
+ return -EINVAL;
+
+ input_buffer = (uint64_t *)input;
+ output_buffer = (float *)output;
+
+ for (i = 0; i < nb_elements; i++) {
+ *output_buffer = scale * (float)(*input_buffer);
+
+ input_buffer++;
+ output_buffer++;
+ }
+
+ return 0;
+}
+
/* Convert a single precision floating point number (float32) into a half precision
* floating point number (float16) using round to nearest rounding mode.
*/
diff --git a/lib/mldev/rte_mldev.h b/lib/mldev/rte_mldev.h
index 5cf6f0566f1..27e372fbcf1 100644
--- a/lib/mldev/rte_mldev.h
+++ b/lib/mldev/rte_mldev.h
@@ -874,6 +874,10 @@ enum rte_ml_io_type {
/**< 32-bit integer */
RTE_ML_IO_TYPE_UINT32,
/**< 32-bit unsigned integer */
+ RTE_ML_IO_TYPE_INT64,
+ /**< 32-bit integer */
+ RTE_ML_IO_TYPE_UINT64,
+ /**< 32-bit unsigned integer */
RTE_ML_IO_TYPE_FP8,
/**< 8-bit floating point number */
RTE_ML_IO_TYPE_FP16,
diff --git a/lib/mldev/version.map b/lib/mldev/version.map
index 2e8f1555225..1978695314e 100644
--- a/lib/mldev/version.map
+++ b/lib/mldev/version.map
@@ -61,6 +61,10 @@ INTERNAL {
rte_ml_io_int32_to_float32;
rte_ml_io_float32_to_uint32;
rte_ml_io_uint32_to_float32;
+ rte_ml_io_float32_to_int64;
+ rte_ml_io_int64_to_float32;
+ rte_ml_io_float32_to_uint64;
+ rte_ml_io_uint64_to_float32;
rte_ml_io_float32_to_float16;
rte_ml_io_float16_to_float32;
rte_ml_io_float32_to_bfloat16;
--
2.42.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 3/3] ml/cnxk: add support for additional integer types
2024-01-07 15:28 [PATCH 0/3] add support for additional data types Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 1/3] mldev: add conversion routines for 32-bit integers Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 2/3] mldev: add support for 64-integer data type Srikanth Yalavarthi
@ 2024-01-07 15:28 ` Srikanth Yalavarthi
2024-02-18 22:14 ` [PATCH 0/3] add support for additional data types Thomas Monjalon
3 siblings, 0 replies; 5+ messages in thread
From: Srikanth Yalavarthi @ 2024-01-07 15:28 UTC (permalink / raw)
To: Srikanth Yalavarthi; +Cc: dev, aprabhu, sshankarnara, ptakkar
Added support quantization and dequantization of 32-bit
and 64-bit integer types.
Signed-off-by: Srikanth Yalavarthi <syalavarthi@marvell.com>
---
drivers/ml/cnxk/cnxk_ml_io.c | 24 ++++++++++++++++++++++++
drivers/ml/cnxk/mvtvm_ml_model.c | 4 ++++
2 files changed, 28 insertions(+)
diff --git a/drivers/ml/cnxk/cnxk_ml_io.c b/drivers/ml/cnxk/cnxk_ml_io.c
index c78009ab0cd..4b0adc2ae47 100644
--- a/drivers/ml/cnxk/cnxk_ml_io.c
+++ b/drivers/ml/cnxk/cnxk_ml_io.c
@@ -40,6 +40,18 @@ cnxk_ml_io_quantize_single(struct cnxk_ml_io *input, uint8_t *dbuffer, uint8_t *
case RTE_ML_IO_TYPE_UINT16:
ret = rte_ml_io_float32_to_uint16(qscale, nb_elements, dbuffer, qbuffer);
break;
+ case RTE_ML_IO_TYPE_INT32:
+ ret = rte_ml_io_float32_to_int32(qscale, nb_elements, dbuffer, qbuffer);
+ break;
+ case RTE_ML_IO_TYPE_UINT32:
+ ret = rte_ml_io_float32_to_uint32(qscale, nb_elements, dbuffer, qbuffer);
+ break;
+ case RTE_ML_IO_TYPE_INT64:
+ ret = rte_ml_io_float32_to_int64(qscale, nb_elements, dbuffer, qbuffer);
+ break;
+ case RTE_ML_IO_TYPE_UINT64:
+ ret = rte_ml_io_float32_to_uint64(qscale, nb_elements, dbuffer, qbuffer);
+ break;
case RTE_ML_IO_TYPE_FP16:
ret = rte_ml_io_float32_to_float16(nb_elements, dbuffer, qbuffer);
break;
@@ -82,6 +94,18 @@ cnxk_ml_io_dequantize_single(struct cnxk_ml_io *output, uint8_t *qbuffer, uint8_
case RTE_ML_IO_TYPE_UINT16:
ret = rte_ml_io_uint16_to_float32(dscale, nb_elements, qbuffer, dbuffer);
break;
+ case RTE_ML_IO_TYPE_INT32:
+ ret = rte_ml_io_int32_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+ break;
+ case RTE_ML_IO_TYPE_UINT32:
+ ret = rte_ml_io_uint32_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+ break;
+ case RTE_ML_IO_TYPE_INT64:
+ ret = rte_ml_io_int64_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+ break;
+ case RTE_ML_IO_TYPE_UINT64:
+ ret = rte_ml_io_uint64_to_float32(dscale, nb_elements, qbuffer, dbuffer);
+ break;
case RTE_ML_IO_TYPE_FP16:
ret = rte_ml_io_float16_to_float32(nb_elements, qbuffer, dbuffer);
break;
diff --git a/drivers/ml/cnxk/mvtvm_ml_model.c b/drivers/ml/cnxk/mvtvm_ml_model.c
index 0dbe08e9889..e3234ae4422 100644
--- a/drivers/ml/cnxk/mvtvm_ml_model.c
+++ b/drivers/ml/cnxk/mvtvm_ml_model.c
@@ -150,6 +150,8 @@ mvtvm_ml_io_type_map(DLDataType dltype)
return RTE_ML_IO_TYPE_INT16;
else if (dltype.bits == 32)
return RTE_ML_IO_TYPE_INT32;
+ else if (dltype.bits == 64)
+ return RTE_ML_IO_TYPE_INT64;
break;
case kDLUInt:
if (dltype.bits == 8)
@@ -158,6 +160,8 @@ mvtvm_ml_io_type_map(DLDataType dltype)
return RTE_ML_IO_TYPE_UINT16;
else if (dltype.bits == 32)
return RTE_ML_IO_TYPE_UINT32;
+ else if (dltype.bits == 64)
+ return RTE_ML_IO_TYPE_UINT64;
break;
case kDLFloat:
if (dltype.bits == 8)
--
2.42.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/3] add support for additional data types
2024-01-07 15:28 [PATCH 0/3] add support for additional data types Srikanth Yalavarthi
` (2 preceding siblings ...)
2024-01-07 15:28 ` [PATCH 3/3] ml/cnxk: add support for additional integer types Srikanth Yalavarthi
@ 2024-02-18 22:14 ` Thomas Monjalon
3 siblings, 0 replies; 5+ messages in thread
From: Thomas Monjalon @ 2024-02-18 22:14 UTC (permalink / raw)
To: Srikanth Yalavarthi; +Cc: dev, aprabhu, syalavarthi, sshankarnara, ptakkar
07/01/2024 16:28, Srikanth Yalavarthi:
> Added support for 64-bit integer data types for inference input and
> output. Extended support for quantization of 32-bit and 64-bit integer
> data types.
>
> Srikanth Yalavarthi (3):
> mldev: add conversion routines for 32-bit integers
> mldev: add support for 64-integer data type
> ml/cnxk: add support for additional integer types
Applied, thanks.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-02-18 22:14 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-07 15:28 [PATCH 0/3] add support for additional data types Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 1/3] mldev: add conversion routines for 32-bit integers Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 2/3] mldev: add support for 64-integer data type Srikanth Yalavarthi
2024-01-07 15:28 ` [PATCH 3/3] ml/cnxk: add support for additional integer types Srikanth Yalavarthi
2024-02-18 22:14 ` [PATCH 0/3] add support for additional data types Thomas Monjalon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).