From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <nicolas.chautru@intel.com> Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id CB417548B for <dev@dpdk.org>; Tue, 14 May 2019 21:46:20 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 May 2019 12:46:19 -0700 X-ExtLoop1: 1 Received: from skx-5gnr-sd5.sc.intel.com ([172.25.69.194]) by orsmga003.jf.intel.com with ESMTP; 14 May 2019 12:46:19 -0700 From: Nicolas Chautru <nicolas.chautru@intel.com> To: Cc: dev@dpdk.org, Nicolas Chautru <nicolas.chautru@intel.com> Date: Tue, 14 May 2019 12:45:41 -0700 Message-Id: <1557863143-174842-4-git-send-email-nicolas.chautru@intel.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1557863143-174842-1-git-send-email-nicolas.chautru@intel.com> References: <1557533163-172544-1-git-send-email-nicolas.chautru@intel.com> <1557863143-174842-1-git-send-email-nicolas.chautru@intel.com> Subject: [dpdk-dev] [PATCH v2 3/5] baseband/turbo_sw: extension of turbosw for 5G FEC X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> X-List-Received-Date: Tue, 14 May 2019 19:46:21 -0000 Implementation still based on Intel SDK libraries Optimized for AVX512 instructions set Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- app/test-bbdev/test_bbdev_vector.c | 4 +- app/test-bbdev/test_bbdev_vector.h | 2 +- config/common_base | 3 +- doc/guides/bbdevs/turbo_sw.rst | 57 +- drivers/baseband/turbo_sw/Makefile | 15 +- drivers/baseband/turbo_sw/bbdev_turbo_software.c | 679 ++++++++++++++++++++++- mk/rte.app.mk | 8 +- 7 files changed, 731 insertions(+), 37 deletions(-) diff --git a/app/test-bbdev/test_bbdev_vector.c b/app/test-bbdev/test_bbdev_vector.c index e4f68e2..e149ced 100644 --- a/app/test-bbdev/test_bbdev_vector.c +++ b/app/test-bbdev/test_bbdev_vector.c @@ -298,9 +298,9 @@ op_data = vector->entries[type].segments; nb_ops = &vector->entries[type].nb_segments; - if (*nb_ops >= RTE_BBDEV_MAX_CODE_BLOCKS) { + if (*nb_ops >= RTE_BBDEV_TURBO_MAX_CODE_BLOCKS) { printf("Too many segments (code blocks defined): %u, max %d!\n", - *nb_ops, RTE_BBDEV_MAX_CODE_BLOCKS); + *nb_ops, RTE_BBDEV_TURBO_MAX_CODE_BLOCKS); return -1; } diff --git a/app/test-bbdev/test_bbdev_vector.h b/app/test-bbdev/test_bbdev_vector.h index 476aae1..c85e94d 100644 --- a/app/test-bbdev/test_bbdev_vector.h +++ b/app/test-bbdev/test_bbdev_vector.h @@ -46,7 +46,7 @@ struct op_data_buf { }; struct op_data_entries { - struct op_data_buf segments[RTE_BBDEV_MAX_CODE_BLOCKS]; + struct op_data_buf segments[RTE_BBDEV_TURBO_MAX_CODE_BLOCKS]; unsigned int nb_segments; }; diff --git a/config/common_base b/config/common_base index ae52028..78e41f2 100644 --- a/config/common_base +++ b/config/common_base @@ -523,7 +523,8 @@ CONFIG_RTE_PMD_PACKET_PREFETCH=y CONFIG_RTE_LIBRTE_BBDEV=y CONFIG_RTE_LIBRTE_BBDEV_DEBUG=n CONFIG_RTE_BBDEV_MAX_DEVS=128 -CONFIG_RTE_BBDEV_OFFLOAD_COST=y +CONFIG_RTE_BBDEV_OFFLOAD_COST=n +CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512=n # # Compile PMD for NULL bbdev device diff --git a/doc/guides/bbdevs/turbo_sw.rst b/doc/guides/bbdevs/turbo_sw.rst index 29f7ec9..e67ecc4 100644 --- a/doc/guides/bbdevs/turbo_sw.rst +++ b/doc/guides/bbdevs/turbo_sw.rst @@ -1,26 +1,26 @@ .. SPDX-License-Identifier: BSD-3-Clause Copyright(c) 2017 Intel Corporation -SW Turbo Poll Mode Driver +SW FEC Poll Mode Driver ========================= -The SW Turbo PMD (**baseband_turbo_sw**) provides a poll mode bbdev driver that utilizes -Intel optimized libraries for LTE Layer 1 workloads acceleration. This PMD -supports the functions: Turbo FEC, Rate Matching and CRC functions. +The SW FEC PMD (**baseband_turbo_sw**) provides a poll mode bbdev driver that utilizes +Intel optimized libraries for LTE and 5GNR Layer 1 workloads acceleration. This PMD +supports the functions: FEC, Rate Matching and CRC functions. Features -------- -SW Turbo PMD has support for the following capabilities: +SW FEC PMD has support for the following capabilities: -For the encode operation: +For the LTE encode operation: * ``RTE_BBDEV_TURBO_CRC_24A_ATTACH`` * ``RTE_BBDEV_TURBO_CRC_24B_ATTACH`` * ``RTE_BBDEV_TURBO_RATE_MATCH`` * ``RTE_BBDEV_TURBO_RV_INDEX_BYPASS`` -For the decode operation: +For the LTE decode operation: * ``RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE`` * ``RTE_BBDEV_TURBO_CRC_TYPE_24B`` @@ -29,11 +29,25 @@ For the decode operation: * ``RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP`` * ``RTE_BBDEV_TURBO_EARLY_TERMINATION`` +For the LDPC encode operation: + +* ``RTE_BBDEV_LDPC_RATE_MATCH`` +* ``RTE_BBDEV_LDPC_CRC_24A_ATTACH`` +* ``RTE_BBDEV_LDPC_CRC_24B_ATTACH`` + +For the LDPC decode operation: + +* ``RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK`` +* ``RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK`` +* ``RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP`` +* ``RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE`` +* ``RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE`` +* ``RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE`` Limitations ----------- -* In-place operations for Turbo encode and decode are not supported +* In-place operations for encode and decode are not supported Installation ------------ @@ -65,14 +79,14 @@ The following table maps DPDK versions with past FlexRAN SDK releases: 18.02 1.3.0 18.05 1.4.0 18.08 1.6.0 - 19.02 18.09 + 18.08-Patch 19.04 ===================== ============================ FlexRAN SDK Installation ~~~~~~~~~~~~~~~~~~~~~~~~ The following are pre-requisites for building FlexRAN SDK Libraries: - (a) An AVX2 supporting machine + (a) An AVX512 supporting machine (b) CentOS Linux release 7.2.1511 (Core) operating system (c) Intel ICC 18.0.1 20171018 compiler installed @@ -84,25 +98,25 @@ The following instructions should be followed in this exact order: source <path-to-icc-compiler-install-folder>/linux/bin/compilervars.sh intel64 -platform linux -#. Extract the ``605167-flexran-18-09-tar.gz`` package: +#. Extract the ``...-flexran-19-04-tar.gz`` package: .. code-block:: console - mkdir FlexRAN-18.09 - tar xvzf 605167-flexran-18-09-tar.gz -C FlexRAN-18.09/ + mkdir FlexRAN-19.04 + tar xvzf ...-flexran-19-04-tar.gz -C FlexRAN-19.04/ #. Run the SDK extractor script and accept the license: .. code-block:: console - cd <path-to-workspace>/FlexRAN-18.09/ - ./SDK-18.09.sh + cd <path-to-workspace>/FlexRAN-19.04/ + ./SDK-19.04.sh #. Generate makefiles based on system configuration: .. code-block:: console - cd <path-to-workspace>/FlexRAN-18.09/SDK-18.09/sdk/ + cd <path-to-workspace>/FlexRAN-19.04/SDK-19.04/sdk/ ./create-makefiles-linux.sh #. A build folder is generated in this form ``build-<ISA>-<CC>``, enter that @@ -110,7 +124,7 @@ The following instructions should be followed in this exact order: .. code-block:: console - cd build-avx2-icc/ + cd build-avx512-icc/ make && make install @@ -129,12 +143,11 @@ Example: .. code-block:: console - export FLEXRAN_SDK=<path-to-workspace>/FlexRAN-18.09/SDK-18.09/sdk/build-avx2-icc/install - export DIR_WIRELESS_SDK=<path-to-workspace>/FlexRAN-18.09/SDK-18.09/sdk/ - + export FLEXRAN_SDK=<path-to-workspace>/FlexRAN-19.04/SDK-19.04/sdk/build-avx2-icc/install + export DIR_WIRELESS_SDK=<path-to-workspace>/FlexRAN-19.04/SDK-19.04/sdk/ -* Set ``CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW=y`` in DPDK common configuration - file ``config/common_base``. +* Set ``CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW=y`` and ``CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512=y`` + in DPDK common configuration file ``config/common_base``. To use the PMD in an application, user must: diff --git a/drivers/baseband/turbo_sw/Makefile b/drivers/baseband/turbo_sw/Makefile index d364677..1a47dfb 100644 --- a/drivers/baseband/turbo_sw/Makefile +++ b/drivers/baseband/turbo_sw/Makefile @@ -26,12 +26,25 @@ CFLAGS += -I$(FLEXRAN_SDK)/lib_common CFLAGS += -I$(FLEXRAN_SDK)/lib_turbo CFLAGS += -I$(FLEXRAN_SDK)/lib_crc CFLAGS += -I$(FLEXRAN_SDK)/lib_rate_matching +ifeq ($(CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512),y) +CFLAGS += -I$(FLEXRAN_SDK)/lib_ldpc_encoder_5gnr +CFLAGS += -I$(FLEXRAN_SDK)/lib_ldpc_decoder_5gnr +CFLAGS += -I$(FLEXRAN_SDK)/lib_LDPC_ratematch_5gnr +CFLAGS += -I$(FLEXRAN_SDK)/lib_rate_dematching_5gnr +endif LDLIBS += -L$(FLEXRAN_SDK)/lib_turbo -lturbo LDLIBS += -L$(FLEXRAN_SDK)/lib_crc -lcrc LDLIBS += -L$(FLEXRAN_SDK)/lib_rate_matching -lrate_matching LDLIBS += -L$(FLEXRAN_SDK)/lib_common -lcommon -LDLIBS += -lstdc++ -lirc -limf -lipps +ifeq ($(CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512),y) +LDLIBS += -L$(FLEXRAN_SDK)/lib_ldpc_encoder_5gnr -lldpc_encoder_5gnr +LDLIBS += -L$(FLEXRAN_SDK)/lib_ldpc_decoder_5gnr -lldpc_decoder_5gnr +LDLIBS += -L$(FLEXRAN_SDK)/lib_LDPC_ratematch_5gnr -lLDPC_ratematch_5gnr +LDLIBS += -L$(FLEXRAN_SDK)/lib_rate_dematching_5gnr -lrate_dematching_5gnr +endif + +LDLIBS += -lstdc++ -lirc -limf -lipps -lsvml # library version LIBABIVER := 1 diff --git a/drivers/baseband/turbo_sw/bbdev_turbo_software.c b/drivers/baseband/turbo_sw/bbdev_turbo_software.c index c714d03..af5e731 100644 --- a/drivers/baseband/turbo_sw/bbdev_turbo_software.c +++ b/drivers/baseband/turbo_sw/bbdev_turbo_software.c @@ -14,9 +14,23 @@ #include <rte_bbdev.h> #include <rte_bbdev_pmd.h> +#include <rte_common.h> +#include <rte_hexdump.h> +#include <rte_log.h> + +#include <ipp.h> +#include <ipps.h> #include <phy_turbo.h> #include <phy_crc.h> #include <phy_rate_match.h> +#ifdef RTE_LIBRTE_PMD_BBDEV_AVX512 +#include <bit_reverse.h> +#include <phy_ldpc_encoder_5gnr.h> +#include <phy_ldpc_decoder_5gnr.h> +#include <phy_LDPC_ratematch_5gnr.h> +#include <phy_rate_dematching_5gnr.h> +#endif + #include <divide.h> #define DRIVER_NAME baseband_turbo_sw @@ -154,7 +168,8 @@ struct turbo_sw_queue { RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP | RTE_BBDEV_TURBO_EARLY_TERMINATION, .max_llr_modulus = 16, - .num_buffers_src = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, .num_buffers_hard_out = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, .num_buffers_soft_out = 0, @@ -168,10 +183,46 @@ struct turbo_sw_queue { RTE_BBDEV_TURBO_CRC_24A_ATTACH | RTE_BBDEV_TURBO_RATE_MATCH | RTE_BBDEV_TURBO_RV_INDEX_BYPASS, - .num_buffers_src = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, - .num_buffers_dst = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + } + }, +#ifdef RTE_LIBRTE_PMD_BBDEV_AVX512 + { + .type = RTE_BBDEV_OP_LDPC_ENC, + .cap.ldpc_enc = { + .capability_flags = + RTE_BBDEV_LDPC_RATE_MATCH | + RTE_BBDEV_LDPC_CRC_24A_ATTACH | + RTE_BBDEV_LDPC_CRC_24B_ATTACH, + .num_buffers_src = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, } }, + { + .type = RTE_BBDEV_OP_LDPC_DEC, + .cap.ldpc_dec = { + .capability_flags = + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK | + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK | + RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP | + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE | + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE | + RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE, + .llr_size = 8, + .llr_decimals = 2, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_hard_out = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_soft_out = 0, + } + }, +#endif RTE_BBDEV_END_OF_CAPABILITIES_LIST() }; @@ -268,7 +319,7 @@ struct turbo_sw_queue { return -ENAMETOOLONG; } q->enc_in = rte_zmalloc_socket(name, - (RTE_BBDEV_TURBO_MAX_CB_SIZE >> 3) * sizeof(*q->enc_in), + (RTE_BBDEV_LDPC_MAX_CB_SIZE >> 3) * sizeof(*q->enc_in), RTE_CACHE_LINE_SIZE, queue_conf->socket); if (q->enc_in == NULL) { rte_bbdev_log(ERR, @@ -276,7 +327,7 @@ struct turbo_sw_queue { goto free_q; } - /* Allocate memory for Aplha Gamma temp buffer. */ + /* Allocate memory for Alpha Gamma temp buffer. */ ret = snprintf(name, RTE_RING_NAMESIZE, RTE_STR(DRIVER_NAME)"_ag%u:%u", dev->data->dev_id, q_id); if ((ret < 0) || (ret >= (int)RTE_RING_NAMESIZE)) { @@ -410,6 +461,7 @@ struct turbo_sw_queue { .queue_release = q_release }; +#ifdef RTE_LIBRTE_BBDEV_DEBUG /* Checks if the encoder input buffer is correct. * Returns 0 if it's valid, -1 otherwise. */ @@ -437,7 +489,9 @@ struct turbo_sw_queue { return 0; } +#endif +#ifdef RTE_LIBRTE_BBDEV_DEBUG /* Checks if the decoder input buffer is correct. * Returns 0 if it's valid, -1 otherwise. */ @@ -464,15 +518,20 @@ struct turbo_sw_queue { return 0; } +#endif static inline void process_enc_cb(struct turbo_sw_queue *q, struct rte_bbdev_enc_op *op, uint8_t r, uint8_t c, uint16_t k, uint16_t ncb, uint32_t e, struct rte_mbuf *m_in, struct rte_mbuf *m_out_head, - struct rte_mbuf *m_out, uint16_t in_offset, uint16_t out_offset, + struct rte_mbuf *m_out, uint16_t in_offset, uint16_t out_offset, uint16_t in_length, struct rte_bbdev_stats *q_stats) { +#ifdef RTE_LIBRTE_BBDEV_DEBUG int ret; +#else + RTE_SET_USED(in_length); +#endif int16_t k_idx; uint16_t m; uint8_t *in, *out0, *out1, *out2, *tmp_out, *rm_out; @@ -496,11 +555,14 @@ struct turbo_sw_queue { /* CRC24A (for TB) */ if ((enc->op_flags & RTE_BBDEV_TURBO_CRC_24A_ATTACH) && (enc->code_block_mode == 1)) { +#ifdef RTE_LIBRTE_BBDEV_DEBUG ret = is_enc_input_valid(k - 24, k_idx, in_length); if (ret != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; return; } +#endif + crc_req.data = in; crc_req.len = k - 24; /* Check if there is a room for CRC bits if not use @@ -529,11 +591,14 @@ struct turbo_sw_queue { #endif } else if (enc->op_flags & RTE_BBDEV_TURBO_CRC_24B_ATTACH) { /* CRC24B */ +#ifdef RTE_LIBRTE_BBDEV_DEBUG ret = is_enc_input_valid(k - 24, k_idx, in_length); if (ret != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; return; } +#endif + crc_req.data = in; crc_req.len = k - 24; /* Check if there is a room for CRC bits if this is the last @@ -560,13 +625,16 @@ struct turbo_sw_queue { #ifdef RTE_BBDEV_OFFLOAD_COST q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; #endif - } else { + } +#ifdef RTE_LIBRTE_BBDEV_DEBUG + else { ret = is_enc_input_valid(k, k_idx, in_length); if (ret != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; return; } } +#endif /* Turbo encoder */ @@ -726,6 +794,143 @@ struct turbo_sw_queue { } } + +static inline void +process_ldpc_enc_cb(struct turbo_sw_queue *q, struct rte_bbdev_enc_op *op, + uint32_t e, struct rte_mbuf *m_in, struct rte_mbuf *m_out_head, + struct rte_mbuf *m_out, uint16_t in_offset, uint16_t out_offset, + uint16_t seg_total_left, struct rte_bbdev_stats *q_stats) +{ +#ifdef RTE_LIBRTE_PMD_BBDEV_AVX512 + RTE_SET_USED(seg_total_left); + uint8_t *in, *rm_out; + struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; + struct bblib_ldpc_encoder_5gnr_request ldpc_req; + struct bblib_ldpc_encoder_5gnr_response ldpc_resp; + struct bblib_LDPC_ratematch_5gnr_request rm_req; + struct bblib_LDPC_ratematch_5gnr_response rm_resp; + struct bblib_crc_request crc_req; + struct bblib_crc_response crc_resp; + uint16_t msgLen, puntBits, parity_offset, out_len; + uint16_t K = (enc->basegraph == 1 ? 22 : 10) * enc->z_c; + uint16_t in_length_in_bits = K - enc->n_filler; + uint16_t in_length_in_bytes = (in_length_in_bits + 7) >> 3; + +#ifdef RTE_BBDEV_OFFLOAD_COST + uint64_t start_time = rte_rdtsc_precise(); +#else + RTE_SET_USED(q_stats); +#endif + + in = rte_pktmbuf_mtod_offset(m_in, uint8_t *, in_offset); + + /* Masking the Filler bits explicitly */ + memset(q->enc_in + (in_length_in_bytes - 3), 0, + ((K + 7) >> 3) - (in_length_in_bytes - 3)); + /* CRC Generation */ + if (enc->op_flags & RTE_BBDEV_LDPC_CRC_24A_ATTACH) { + rte_memcpy(q->enc_in, in, in_length_in_bytes - 3); + crc_req.data = in; + crc_req.len = in_length_in_bits - 24; + crc_resp.data = q->enc_in; + bblib_lte_crc24a_gen(&crc_req, &crc_resp); + } else if (enc->op_flags & RTE_BBDEV_LDPC_CRC_24B_ATTACH) { + rte_memcpy(q->enc_in, in, in_length_in_bytes - 3); + crc_req.data = in; + crc_req.len = in_length_in_bits - 24; + crc_resp.data = q->enc_in; + bblib_lte_crc24b_gen(&crc_req, &crc_resp); + } else + rte_memcpy(q->enc_in, in, in_length_in_bytes); + + /* LDPC Encoding */ + ldpc_req.Zc = enc->z_c; + ldpc_req.baseGraph = enc->basegraph; + /* Number of rows set to maximum */ + ldpc_req.nRows = ldpc_req.baseGraph == 1 ? 46 : 42; + ldpc_req.numberCodeblocks = 1; + ldpc_req.input[0] = (int8_t *) q->enc_in; + ldpc_resp.output[0] = (int8_t *) q->enc_out; + + bblib_bit_reverse(ldpc_req.input[0], in_length_in_bytes << 3); + + if (bblib_ldpc_encoder_5gnr(&ldpc_req, &ldpc_resp) != 0) { + op->status |= 1 << RTE_BBDEV_DRV_ERROR; + rte_bbdev_log(ERR, "LDPC Encoder failed"); + return; + } + + /* + * Systematic + Parity : Recreating stream with filler bits, ideally + * the bit select could handle this in the RM SDK + */ + msgLen = (ldpc_req.baseGraph == 1 ? 22 : 10) * ldpc_req.Zc; + puntBits = 2 * ldpc_req.Zc; + parity_offset = msgLen - puntBits; + ippsCopyBE_1u(((uint8_t *) ldpc_req.input[0]) + (puntBits / 8), + puntBits%8, q->adapter_output, 0, parity_offset); + ippsCopyBE_1u(q->enc_out, 0, q->adapter_output + (parity_offset / 8), + parity_offset % 8, ldpc_req.nRows * ldpc_req.Zc); + + out_len = (e + 7) >> 3; + /* get output data starting address */ + rm_out = (uint8_t *)mbuf_append(m_out_head, m_out, out_len); + if (rm_out == NULL) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, + "Too little space in output mbuf"); + return; + } + /* + * rte_bbdev_op_data.offset can be different than the offset + * of the appended bytes + */ + rm_out = rte_pktmbuf_mtod_offset(m_out, uint8_t *, out_offset); + + /* Rate-Matching */ + rm_req.E = e; + rm_req.Ncb = enc->n_cb; + rm_req.Qm = enc->q_m; + rm_req.Zc = enc->z_c; + rm_req.baseGraph = enc->basegraph; + rm_req.input = q->adapter_output; + rm_req.nLen = enc->n_filler; + rm_req.nullIndex = parity_offset - enc->n_filler; + rm_req.rvidx = enc->rv_index; + rm_resp.output = q->deint_output; + + if (bblib_LDPC_ratematch_5gnr(&rm_req, &rm_resp) != 0) { + op->status |= 1 << RTE_BBDEV_DRV_ERROR; + rte_bbdev_log(ERR, "Rate matching failed"); + return; + } + + /* RM SDK may provide non zero bits on last byte */ + if ((e % 8) != 0) + q->deint_output[out_len-1] &= (1 << (e % 8)) - 1; + + bblib_bit_reverse((int8_t *) q->deint_output, out_len << 3); + + rte_memcpy(rm_out, q->deint_output, out_len); + enc->output.length += out_len; + +#ifdef RTE_BBDEV_OFFLOAD_COST + q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; +#endif +#else + RTE_SET_USED(q); + RTE_SET_USED(op); + RTE_SET_USED(e); + RTE_SET_USED(m_in); + RTE_SET_USED(m_out_head); + RTE_SET_USED(m_out); + RTE_SET_USED(in_offset); + RTE_SET_USED(out_offset); + RTE_SET_USED(seg_total_left); + RTE_SET_USED(q_stats); +#endif +} + static inline void enqueue_enc_one_op(struct turbo_sw_queue *q, struct rte_bbdev_enc_op *op, struct rte_bbdev_stats *queue_stats) @@ -819,6 +1024,93 @@ struct turbo_sw_queue { } } + +static inline void +enqueue_ldpc_enc_one_op(struct turbo_sw_queue *q, struct rte_bbdev_enc_op *op, + struct rte_bbdev_stats *queue_stats) +{ + uint8_t c, r, crc24_bits = 0; + uint32_t e; + struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; + uint16_t in_offset = enc->input.offset; + uint16_t out_offset = enc->output.offset; + struct rte_mbuf *m_in = enc->input.data; + struct rte_mbuf *m_out = enc->output.data; + struct rte_mbuf *m_out_head = enc->output.data; + uint32_t in_length, mbuf_total_left = enc->input.length; + + uint16_t seg_total_left; + + /* Clear op status */ + op->status = 0; + + if (mbuf_total_left > RTE_BBDEV_TURBO_MAX_TB_SIZE >> 3) { + rte_bbdev_log(ERR, "TB size (%u) is too big, max: %d", + mbuf_total_left, RTE_BBDEV_TURBO_MAX_TB_SIZE); + op->status = 1 << RTE_BBDEV_DATA_ERROR; + return; + } + + if (m_in == NULL || m_out == NULL) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + op->status = 1 << RTE_BBDEV_DATA_ERROR; + return; + } + + if ((enc->op_flags & RTE_BBDEV_TURBO_CRC_24B_ATTACH) || + (enc->op_flags & RTE_BBDEV_TURBO_CRC_24A_ATTACH)) + crc24_bits = 24; + + if (enc->code_block_mode == 0) { /* For Transport Block mode */ + c = enc->tb_params.c; + r = enc->tb_params.r; + } else { /* For Code Block mode */ + c = 1; + r = 0; + } + + while (mbuf_total_left > 0 && r < c) { + + seg_total_left = rte_pktmbuf_data_len(m_in) - in_offset; + + if (enc->code_block_mode == 0) { + e = (r < enc->tb_params.cab) ? + enc->tb_params.ea : enc->tb_params.eb; + } else { + e = enc->cb_params.e; + } + + process_ldpc_enc_cb(q, op, e, m_in, m_out_head, + m_out, in_offset, out_offset, seg_total_left, + queue_stats); + /* Update total_left */ + in_length = (enc->basegraph == 1 ? 22 : 10) * enc->z_c; + in_length = ((in_length - crc24_bits - enc->n_filler) >> 3); + mbuf_total_left -= in_length; + /* Update offsets for next CBs (if exist) */ + in_offset += in_length; + out_offset += (e + 7) >> 3; + + /* Update offsets */ + if (seg_total_left == in_length) { + /* Go to the next mbuf */ + m_in = m_in->next; + m_out = m_out->next; + in_offset = 0; + out_offset = 0; + } + r++; + } + + /* check if all input data was processed */ + if (mbuf_total_left != 0) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CBs sizes %d", + mbuf_total_left); + } +} + static inline uint16_t enqueue_enc_all_ops(struct turbo_sw_queue *q, struct rte_bbdev_enc_op **ops, uint16_t nb_ops, struct rte_bbdev_stats *queue_stats) @@ -835,6 +1127,23 @@ struct turbo_sw_queue { NULL); } +static inline uint16_t +enqueue_ldpc_enc_all_ops(struct turbo_sw_queue *q, + struct rte_bbdev_enc_op **ops, + uint16_t nb_ops, struct rte_bbdev_stats *queue_stats) +{ + uint16_t i; +#ifdef RTE_BBDEV_OFFLOAD_COST + queue_stats->acc_offload_cycles = 0; +#endif + + for (i = 0; i < nb_ops; ++i) + enqueue_ldpc_enc_one_op(q, ops[i], queue_stats); + + return rte_ring_enqueue_burst(q->processed_pkts, (void **)ops, nb_ops, + NULL); +} + static inline void move_padding_bytes(const uint8_t *in, uint8_t *out, uint16_t k, uint16_t ncb) @@ -856,7 +1165,11 @@ struct turbo_sw_queue { uint16_t crc24_overlap, uint16_t in_length, struct rte_bbdev_stats *q_stats) { +#ifdef RTE_LIBRTE_BBDEV_DEBUG int ret; +#else + RTE_SET_USED(in_length); +#endif int32_t k_idx; int32_t iter_cnt; uint8_t *in, *out, *adapter_input; @@ -874,11 +1187,13 @@ struct turbo_sw_queue { k_idx = compute_idx(k); +#ifdef RTE_LIBRTE_BBDEV_DEBUG ret = is_dec_input_valid(k_idx, kw, in_length); if (ret != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; return; } +#endif in = rte_pktmbuf_mtod_offset(m_in, uint8_t *, in_offset); ncb = kw; @@ -894,11 +1209,12 @@ struct turbo_sw_queue { deint_resp.pinteleavebuffer = q->deint_output; #ifdef RTE_BBDEV_OFFLOAD_COST - start_time = rte_rdtsc_precise(); + start_time = rte_rdtsc_precise(); #endif + /* Sub-block De-Interleaving */ bblib_deinterleave_ul(&deint_req, &deint_resp); #ifdef RTE_BBDEV_OFFLOAD_COST - q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; + q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; #endif } else move_padding_bytes(in, q->deint_output, k, ncb); @@ -975,6 +1291,196 @@ struct turbo_sw_queue { } static inline void +process_ldpc_dec_cb(struct turbo_sw_queue *q, struct rte_bbdev_dec_op *op, + uint8_t c, uint16_t out_length, uint16_t e, + struct rte_mbuf *m_in, + struct rte_mbuf *m_out_head, struct rte_mbuf *m_out, + struct rte_mbuf *m_harq_in, + struct rte_mbuf *m_harq_out_head, struct rte_mbuf *m_harq_out, + uint16_t in_offset, uint16_t out_offset, + uint16_t harq_in_offset, uint16_t harq_out_offset, + bool check_crc_24b, + uint16_t crc24_overlap, uint16_t in_length, + struct rte_bbdev_stats *q_stats) +{ +#ifdef RTE_LIBRTE_PMD_BBDEV_AVX512 + RTE_SET_USED(in_length); + RTE_SET_USED(c); + uint8_t *in, *out, *harq_in, *harq_out, *adapter_input; + struct bblib_rate_dematching_5gnr_request derm_req; + struct bblib_rate_dematching_5gnr_response derm_resp; + struct bblib_ldpc_decoder_5gnr_request dec_req; + struct bblib_ldpc_decoder_5gnr_response dec_resp; + struct bblib_crc_request crc_req; + struct bblib_crc_response crc_resp; + struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec; + uint16_t K, parity_offset, sys_cols, outLenWithCrc; + int16_t deRmOutSize, numRows; + + /* Compute some LDPC BG lengths */ + outLenWithCrc = out_length + (crc24_overlap >> 3); + sys_cols = (dec->basegraph == 1) ? 22 : 10; + K = sys_cols * dec->z_c; + parity_offset = K - 2 * dec->z_c; + +#ifdef RTE_BBDEV_OFFLOAD_COST + uint64_t start_time = rte_rdtsc_precise(); +#else + RTE_SET_USED(q_stats); +#endif + + in = rte_pktmbuf_mtod_offset(m_in, uint8_t *, in_offset); + + if (check_bit(dec->op_flags, RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) { + /** + * Single contiguous block from the first LLR of the + * circular buffer. + */ + harq_in = NULL; + if (m_harq_in != NULL) + harq_in = rte_pktmbuf_mtod_offset(m_harq_in, + uint8_t *, harq_in_offset); + if (harq_in == NULL) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, "No space in harq input mbuf"); + return; + } + uint16_t harq_in_length = RTE_MIN( + dec->harq_combined_input.length, + (uint32_t) dec->n_cb); + memset(q->ag + harq_in_length, 0, + dec->n_cb - harq_in_length); + rte_memcpy(q->ag, harq_in, harq_in_length); + } + + derm_req.p_in = (int8_t *) in; + derm_req.p_harq = q->ag; /* This doesn't include the filler bits */ + derm_req.base_graph = dec->basegraph; + derm_req.zc = dec->z_c; + derm_req.ncb = dec->n_cb; + derm_req.e = e; + derm_req.k0 = 0; /* Actual output from SDK */ + derm_req.isretx = check_bit(dec->op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE); + derm_req.rvid = dec->rv_index; + derm_req.modulation_order = dec->q_m; + derm_req.start_null_index = parity_offset - dec->n_filler; + derm_req.num_of_null = dec->n_filler; + + bblib_rate_dematching_5gnr(&derm_req, &derm_resp); + + /* Compute RM out size and number of rows */ + deRmOutSize = RTE_MIN( + derm_req.k0 + derm_req.e - + ((derm_req.k0 < derm_req.start_null_index) ? + 0 : dec->n_filler), + dec->n_cb - dec->n_filler); + if (m_harq_in != NULL) + deRmOutSize = RTE_MAX(deRmOutSize, + RTE_MIN(dec->n_cb - dec->n_filler, + m_harq_in->data_len)); + numRows = ((deRmOutSize + dec->n_filler + dec->z_c - 1) / dec->z_c) + - sys_cols + 2; + numRows = RTE_MAX(4, numRows); + + /* get output data starting address */ + out = (uint8_t *)mbuf_append(m_out_head, m_out, out_length); + if (out == NULL) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, + "Too little space in output mbuf"); + return; + } + + /* rte_bbdev_op_data.offset can be different than the offset + * of the appended bytes + */ + out = rte_pktmbuf_mtod_offset(m_out, uint8_t *, out_offset); + adapter_input = q->enc_out; + + dec_req.Zc = dec->z_c; + dec_req.baseGraph = dec->basegraph; + dec_req.nRows = numRows; + dec_req.numChannelLlrs = deRmOutSize; + dec_req.varNodes = derm_req.p_harq; + dec_req.numFillerBits = dec->n_filler; + dec_req.maxIterations = dec->iter_max; + dec_req.enableEarlyTermination = check_bit(dec->op_flags, + RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE); + dec_resp.varNodes = (int16_t *) q->adapter_output; + dec_resp.compactedMessageBytes = q->enc_out; + + bblib_ldpc_decoder_5gnr(&dec_req, &dec_resp); + + dec->iter_count = RTE_MAX(dec_resp.iterationAtTermination, + dec->iter_count); + if (!dec_resp.parityPassedAtTermination) + op->status |= 1 << RTE_BBDEV_SYNDROME_ERROR; + + bblib_bit_reverse((int8_t *) q->enc_out, outLenWithCrc << 3); + + if (check_bit(dec->op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK) || + check_bit(dec->op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK)) { + crc_req.data = adapter_input; + crc_req.len = K - dec->n_filler - 24; + crc_resp.check_passed = false; + crc_resp.data = adapter_input; + if (check_crc_24b) + bblib_lte_crc24b_check(&crc_req, &crc_resp); + else + bblib_lte_crc24a_check(&crc_req, &crc_resp); + if (!crc_resp.check_passed) + op->status |= 1 << RTE_BBDEV_CRC_ERROR; + } + +#ifdef RTE_BBDEV_OFFLOAD_COST + q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; +#endif + if (check_bit(dec->op_flags, RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) { + /* get output data starting address */ + harq_out = NULL; + if (m_harq_out != NULL) + harq_out = (uint8_t *)mbuf_append(m_harq_out_head, + m_harq_out, deRmOutSize); + if (harq_out == NULL) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, "No space in harq output mbuf"); + return; + } + harq_out = rte_pktmbuf_mtod_offset(m_harq_out, uint8_t *, + harq_out_offset); + rte_memcpy(harq_out, derm_req.p_harq, deRmOutSize); + dec->harq_combined_output.length += deRmOutSize; + } + + rte_memcpy(out, adapter_input, out_length); + dec->hard_output.length += out_length; +#else + RTE_SET_USED(q); + RTE_SET_USED(op); + RTE_SET_USED(c); + RTE_SET_USED(out_length); + RTE_SET_USED(e); + RTE_SET_USED(m_in); + RTE_SET_USED(m_out_head); + RTE_SET_USED(m_out); + RTE_SET_USED(m_harq_in); + RTE_SET_USED(m_harq_out_head); + RTE_SET_USED(m_harq_out); + RTE_SET_USED(harq_in_offset); + RTE_SET_USED(harq_out_offset); + RTE_SET_USED(in_offset); + RTE_SET_USED(out_offset); + RTE_SET_USED(check_crc_24b); + RTE_SET_USED(crc24_overlap); + RTE_SET_USED(in_length); + RTE_SET_USED(q_stats); +#endif +} + + +static inline void enqueue_dec_one_op(struct turbo_sw_queue *q, struct rte_bbdev_dec_op *op, struct rte_bbdev_stats *queue_stats) { @@ -1033,6 +1539,7 @@ struct turbo_sw_queue { in_offset, out_offset, check_bit(dec->op_flags, RTE_BBDEV_TURBO_CRC_TYPE_24B), crc24_overlap, seg_total_left, queue_stats); + /* To keep CRC24 attached to end of Code block, use * RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP flag as it * removed by default once verified. @@ -1054,6 +1561,103 @@ struct turbo_sw_queue { } r++; } + + if (mbuf_total_left != 0) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included Circular buffer sizes"); + } +} + +static inline void +enqueue_ldpc_dec_one_op(struct turbo_sw_queue *q, struct rte_bbdev_dec_op *op, + struct rte_bbdev_stats *queue_stats) +{ + uint8_t c, r = 0; + uint16_t e, out_length; + uint16_t crc24_overlap = 0; + struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec; + struct rte_mbuf *m_in = dec->input.data; + struct rte_mbuf *m_harq_in = dec->harq_combined_input.data; + struct rte_mbuf *m_harq_out = dec->harq_combined_output.data; + struct rte_mbuf *m_harq_out_head = dec->harq_combined_output.data; + struct rte_mbuf *m_out = dec->hard_output.data; + struct rte_mbuf *m_out_head = dec->hard_output.data; + uint16_t in_offset = dec->input.offset; + uint16_t harq_in_offset = dec->harq_combined_input.offset; + uint16_t harq_out_offset = dec->harq_combined_output.offset; + uint16_t out_offset = dec->hard_output.offset; + uint32_t mbuf_total_left = dec->input.length; + uint16_t seg_total_left; + + /* Clear op status */ + op->status = 0; + + if (m_in == NULL || m_out == NULL) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + op->status = 1 << RTE_BBDEV_DATA_ERROR; + return; + } + + if (dec->code_block_mode == 0) { /* For Transport Block mode */ + c = dec->tb_params.c; + e = dec->tb_params.ea; + } else { /* For Code Block mode */ + c = 1; + e = dec->cb_params.e; + } + + if (check_bit(dec->op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP)) + crc24_overlap = 24; + + out_length = (dec->basegraph == 1 ? 22 : 10) * dec->z_c; /* K */ + out_length = ((out_length - crc24_overlap - dec->n_filler) >> 3); + + while (mbuf_total_left > 0) { + if (dec->code_block_mode == 0) + e = (r < dec->tb_params.cab) ? + dec->tb_params.ea : dec->tb_params.eb; + + seg_total_left = rte_pktmbuf_data_len(m_in) - in_offset; + + process_ldpc_dec_cb(q, op, c, out_length, e, + m_in, m_out_head, m_out, + m_harq_in, m_harq_out_head, m_harq_out, + in_offset, out_offset, harq_in_offset, + harq_out_offset, + check_bit(dec->op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK), + crc24_overlap, + seg_total_left, queue_stats); + + /* To keep CRC24 attached to end of Code block, use + * RTE_BBDEV_LDPC_DEC_TB_CRC_24B_KEEP flag as it + * removed by default once verified. + */ + + mbuf_total_left -= e; + + /* Update offsets */ + if (seg_total_left == e) { + /* Go to the next mbuf */ + m_in = m_in->next; + m_out = m_out->next; + if (m_harq_in != NULL) + m_harq_in = m_harq_in->next; + if (m_harq_out != NULL) + m_harq_out = m_harq_out->next; + in_offset = 0; + out_offset = 0; + harq_in_offset = 0; + harq_out_offset = 0; + } else { + /* Update offsets for next CBs (if exist) */ + in_offset += e; + out_offset += out_length; + } + r++; + } + if (mbuf_total_left != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; rte_bbdev_log(ERR, @@ -1077,6 +1681,23 @@ struct turbo_sw_queue { NULL); } +static inline uint16_t +enqueue_ldpc_dec_all_ops(struct turbo_sw_queue *q, + struct rte_bbdev_dec_op **ops, + uint16_t nb_ops, struct rte_bbdev_stats *queue_stats) +{ + uint16_t i; +#ifdef RTE_BBDEV_OFFLOAD_COST + queue_stats->acc_offload_cycles = 0; +#endif + + for (i = 0; i < nb_ops; ++i) + enqueue_ldpc_dec_one_op(q, ops[i], queue_stats); + + return rte_ring_enqueue_burst(q->processed_pkts, (void **)ops, nb_ops, + NULL); +} + /* Enqueue burst */ static uint16_t enqueue_enc_ops(struct rte_bbdev_queue_data *q_data, @@ -1096,6 +1717,24 @@ struct turbo_sw_queue { /* Enqueue burst */ static uint16_t +enqueue_ldpc_enc_ops(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t nb_ops) +{ + void *queue = q_data->queue_private; + struct turbo_sw_queue *q = queue; + uint16_t nb_enqueued = 0; + + nb_enqueued = enqueue_ldpc_enc_all_ops( + q, ops, nb_ops, &q_data->queue_stats); + + q_data->queue_stats.enqueue_err_count += nb_ops - nb_enqueued; + q_data->queue_stats.enqueued_count += nb_enqueued; + + return nb_enqueued; +} + +/* Enqueue burst */ +static uint16_t enqueue_dec_ops(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t nb_ops) { @@ -1111,6 +1750,24 @@ struct turbo_sw_queue { return nb_enqueued; } +/* Enqueue burst */ +static uint16_t +enqueue_ldpc_dec_ops(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t nb_ops) +{ + void *queue = q_data->queue_private; + struct turbo_sw_queue *q = queue; + uint16_t nb_enqueued = 0; + + nb_enqueued = enqueue_ldpc_dec_all_ops(q, ops, nb_ops, + &q_data->queue_stats); + + q_data->queue_stats.enqueue_err_count += nb_ops - nb_enqueued; + q_data->queue_stats.enqueued_count += nb_enqueued; + + return nb_enqueued; +} + /* Dequeue decode burst */ static uint16_t dequeue_dec_ops(struct rte_bbdev_queue_data *q_data, @@ -1223,6 +1880,10 @@ struct turbo_sw_queue { bbdev->dequeue_dec_ops = dequeue_dec_ops; bbdev->enqueue_enc_ops = enqueue_enc_ops; bbdev->enqueue_dec_ops = enqueue_dec_ops; + bbdev->dequeue_ldpc_enc_ops = dequeue_enc_ops; + bbdev->dequeue_ldpc_dec_ops = dequeue_dec_ops; + bbdev->enqueue_ldpc_enc_ops = enqueue_ldpc_enc_ops; + bbdev->enqueue_ldpc_dec_ops = enqueue_ldpc_dec_ops; ((struct bbdev_private *) bbdev->data->dev_private)->max_nb_queues = init_params->queues_num; diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 21d32a2..060e0ec 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -225,8 +225,14 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -lrte_pmd_bbdev_turbo_sw _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_crc -lcrc _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_turbo -lturbo _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_rate_matching -lrate_matching +ifeq ($(CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512),y) +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_LDPC_ratematch_5gnr -lLDPC_ratematch_5gnr +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_ldpc_encoder_5gnr -lldpc_encoder_5gnr +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_ldpc_decoder_5gnr -lldpc_decoder_5gnr +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_rate_dematching_5gnr -lrate_dematching_5gnr +endif _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_common -lcommon -_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -lirc -limf -lstdc++ -lipps +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -lirc -limf -lstdc++ -lipps -lsvml endif # CONFIG_RTE_LIBRTE_BBDEV ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y) -- 1.8.3.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <dev-bounces@dpdk.org> Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id E6E8DA00E6 for <public@inbox.dpdk.org>; Tue, 14 May 2019 21:47:01 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A24605F0F; Tue, 14 May 2019 21:46:25 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id CB417548B for <dev@dpdk.org>; Tue, 14 May 2019 21:46:20 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 May 2019 12:46:19 -0700 X-ExtLoop1: 1 Received: from skx-5gnr-sd5.sc.intel.com ([172.25.69.194]) by orsmga003.jf.intel.com with ESMTP; 14 May 2019 12:46:19 -0700 From: Nicolas Chautru <nicolas.chautru@intel.com> To: Cc: dev@dpdk.org, Nicolas Chautru <nicolas.chautru@intel.com> Date: Tue, 14 May 2019 12:45:41 -0700 Message-Id: <1557863143-174842-4-git-send-email-nicolas.chautru@intel.com> X-Mailer: git-send-email 1.8.3.1 In-Reply-To: <1557863143-174842-1-git-send-email-nicolas.chautru@intel.com> References: <1557533163-172544-1-git-send-email-nicolas.chautru@intel.com> <1557863143-174842-1-git-send-email-nicolas.chautru@intel.com> Subject: [dpdk-dev] [PATCH v2 3/5] baseband/turbo_sw: extension of turbosw for 5G FEC X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> Content-Type: text/plain; charset="UTF-8" Message-ID: <20190514194541.xbyOq-4VDw3Y7m6wMULf548uc9w-LSOTSAx6FD2xcA8@z> Implementation still based on Intel SDK libraries Optimized for AVX512 instructions set Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com> --- app/test-bbdev/test_bbdev_vector.c | 4 +- app/test-bbdev/test_bbdev_vector.h | 2 +- config/common_base | 3 +- doc/guides/bbdevs/turbo_sw.rst | 57 +- drivers/baseband/turbo_sw/Makefile | 15 +- drivers/baseband/turbo_sw/bbdev_turbo_software.c | 679 ++++++++++++++++++++++- mk/rte.app.mk | 8 +- 7 files changed, 731 insertions(+), 37 deletions(-) diff --git a/app/test-bbdev/test_bbdev_vector.c b/app/test-bbdev/test_bbdev_vector.c index e4f68e2..e149ced 100644 --- a/app/test-bbdev/test_bbdev_vector.c +++ b/app/test-bbdev/test_bbdev_vector.c @@ -298,9 +298,9 @@ op_data = vector->entries[type].segments; nb_ops = &vector->entries[type].nb_segments; - if (*nb_ops >= RTE_BBDEV_MAX_CODE_BLOCKS) { + if (*nb_ops >= RTE_BBDEV_TURBO_MAX_CODE_BLOCKS) { printf("Too many segments (code blocks defined): %u, max %d!\n", - *nb_ops, RTE_BBDEV_MAX_CODE_BLOCKS); + *nb_ops, RTE_BBDEV_TURBO_MAX_CODE_BLOCKS); return -1; } diff --git a/app/test-bbdev/test_bbdev_vector.h b/app/test-bbdev/test_bbdev_vector.h index 476aae1..c85e94d 100644 --- a/app/test-bbdev/test_bbdev_vector.h +++ b/app/test-bbdev/test_bbdev_vector.h @@ -46,7 +46,7 @@ struct op_data_buf { }; struct op_data_entries { - struct op_data_buf segments[RTE_BBDEV_MAX_CODE_BLOCKS]; + struct op_data_buf segments[RTE_BBDEV_TURBO_MAX_CODE_BLOCKS]; unsigned int nb_segments; }; diff --git a/config/common_base b/config/common_base index ae52028..78e41f2 100644 --- a/config/common_base +++ b/config/common_base @@ -523,7 +523,8 @@ CONFIG_RTE_PMD_PACKET_PREFETCH=y CONFIG_RTE_LIBRTE_BBDEV=y CONFIG_RTE_LIBRTE_BBDEV_DEBUG=n CONFIG_RTE_BBDEV_MAX_DEVS=128 -CONFIG_RTE_BBDEV_OFFLOAD_COST=y +CONFIG_RTE_BBDEV_OFFLOAD_COST=n +CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512=n # # Compile PMD for NULL bbdev device diff --git a/doc/guides/bbdevs/turbo_sw.rst b/doc/guides/bbdevs/turbo_sw.rst index 29f7ec9..e67ecc4 100644 --- a/doc/guides/bbdevs/turbo_sw.rst +++ b/doc/guides/bbdevs/turbo_sw.rst @@ -1,26 +1,26 @@ .. SPDX-License-Identifier: BSD-3-Clause Copyright(c) 2017 Intel Corporation -SW Turbo Poll Mode Driver +SW FEC Poll Mode Driver ========================= -The SW Turbo PMD (**baseband_turbo_sw**) provides a poll mode bbdev driver that utilizes -Intel optimized libraries for LTE Layer 1 workloads acceleration. This PMD -supports the functions: Turbo FEC, Rate Matching and CRC functions. +The SW FEC PMD (**baseband_turbo_sw**) provides a poll mode bbdev driver that utilizes +Intel optimized libraries for LTE and 5GNR Layer 1 workloads acceleration. This PMD +supports the functions: FEC, Rate Matching and CRC functions. Features -------- -SW Turbo PMD has support for the following capabilities: +SW FEC PMD has support for the following capabilities: -For the encode operation: +For the LTE encode operation: * ``RTE_BBDEV_TURBO_CRC_24A_ATTACH`` * ``RTE_BBDEV_TURBO_CRC_24B_ATTACH`` * ``RTE_BBDEV_TURBO_RATE_MATCH`` * ``RTE_BBDEV_TURBO_RV_INDEX_BYPASS`` -For the decode operation: +For the LTE decode operation: * ``RTE_BBDEV_TURBO_SUBBLOCK_DEINTERLEAVE`` * ``RTE_BBDEV_TURBO_CRC_TYPE_24B`` @@ -29,11 +29,25 @@ For the decode operation: * ``RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP`` * ``RTE_BBDEV_TURBO_EARLY_TERMINATION`` +For the LDPC encode operation: + +* ``RTE_BBDEV_LDPC_RATE_MATCH`` +* ``RTE_BBDEV_LDPC_CRC_24A_ATTACH`` +* ``RTE_BBDEV_LDPC_CRC_24B_ATTACH`` + +For the LDPC decode operation: + +* ``RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK`` +* ``RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK`` +* ``RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP`` +* ``RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE`` +* ``RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE`` +* ``RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE`` Limitations ----------- -* In-place operations for Turbo encode and decode are not supported +* In-place operations for encode and decode are not supported Installation ------------ @@ -65,14 +79,14 @@ The following table maps DPDK versions with past FlexRAN SDK releases: 18.02 1.3.0 18.05 1.4.0 18.08 1.6.0 - 19.02 18.09 + 18.08-Patch 19.04 ===================== ============================ FlexRAN SDK Installation ~~~~~~~~~~~~~~~~~~~~~~~~ The following are pre-requisites for building FlexRAN SDK Libraries: - (a) An AVX2 supporting machine + (a) An AVX512 supporting machine (b) CentOS Linux release 7.2.1511 (Core) operating system (c) Intel ICC 18.0.1 20171018 compiler installed @@ -84,25 +98,25 @@ The following instructions should be followed in this exact order: source <path-to-icc-compiler-install-folder>/linux/bin/compilervars.sh intel64 -platform linux -#. Extract the ``605167-flexran-18-09-tar.gz`` package: +#. Extract the ``...-flexran-19-04-tar.gz`` package: .. code-block:: console - mkdir FlexRAN-18.09 - tar xvzf 605167-flexran-18-09-tar.gz -C FlexRAN-18.09/ + mkdir FlexRAN-19.04 + tar xvzf ...-flexran-19-04-tar.gz -C FlexRAN-19.04/ #. Run the SDK extractor script and accept the license: .. code-block:: console - cd <path-to-workspace>/FlexRAN-18.09/ - ./SDK-18.09.sh + cd <path-to-workspace>/FlexRAN-19.04/ + ./SDK-19.04.sh #. Generate makefiles based on system configuration: .. code-block:: console - cd <path-to-workspace>/FlexRAN-18.09/SDK-18.09/sdk/ + cd <path-to-workspace>/FlexRAN-19.04/SDK-19.04/sdk/ ./create-makefiles-linux.sh #. A build folder is generated in this form ``build-<ISA>-<CC>``, enter that @@ -110,7 +124,7 @@ The following instructions should be followed in this exact order: .. code-block:: console - cd build-avx2-icc/ + cd build-avx512-icc/ make && make install @@ -129,12 +143,11 @@ Example: .. code-block:: console - export FLEXRAN_SDK=<path-to-workspace>/FlexRAN-18.09/SDK-18.09/sdk/build-avx2-icc/install - export DIR_WIRELESS_SDK=<path-to-workspace>/FlexRAN-18.09/SDK-18.09/sdk/ - + export FLEXRAN_SDK=<path-to-workspace>/FlexRAN-19.04/SDK-19.04/sdk/build-avx2-icc/install + export DIR_WIRELESS_SDK=<path-to-workspace>/FlexRAN-19.04/SDK-19.04/sdk/ -* Set ``CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW=y`` in DPDK common configuration - file ``config/common_base``. +* Set ``CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW=y`` and ``CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512=y`` + in DPDK common configuration file ``config/common_base``. To use the PMD in an application, user must: diff --git a/drivers/baseband/turbo_sw/Makefile b/drivers/baseband/turbo_sw/Makefile index d364677..1a47dfb 100644 --- a/drivers/baseband/turbo_sw/Makefile +++ b/drivers/baseband/turbo_sw/Makefile @@ -26,12 +26,25 @@ CFLAGS += -I$(FLEXRAN_SDK)/lib_common CFLAGS += -I$(FLEXRAN_SDK)/lib_turbo CFLAGS += -I$(FLEXRAN_SDK)/lib_crc CFLAGS += -I$(FLEXRAN_SDK)/lib_rate_matching +ifeq ($(CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512),y) +CFLAGS += -I$(FLEXRAN_SDK)/lib_ldpc_encoder_5gnr +CFLAGS += -I$(FLEXRAN_SDK)/lib_ldpc_decoder_5gnr +CFLAGS += -I$(FLEXRAN_SDK)/lib_LDPC_ratematch_5gnr +CFLAGS += -I$(FLEXRAN_SDK)/lib_rate_dematching_5gnr +endif LDLIBS += -L$(FLEXRAN_SDK)/lib_turbo -lturbo LDLIBS += -L$(FLEXRAN_SDK)/lib_crc -lcrc LDLIBS += -L$(FLEXRAN_SDK)/lib_rate_matching -lrate_matching LDLIBS += -L$(FLEXRAN_SDK)/lib_common -lcommon -LDLIBS += -lstdc++ -lirc -limf -lipps +ifeq ($(CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512),y) +LDLIBS += -L$(FLEXRAN_SDK)/lib_ldpc_encoder_5gnr -lldpc_encoder_5gnr +LDLIBS += -L$(FLEXRAN_SDK)/lib_ldpc_decoder_5gnr -lldpc_decoder_5gnr +LDLIBS += -L$(FLEXRAN_SDK)/lib_LDPC_ratematch_5gnr -lLDPC_ratematch_5gnr +LDLIBS += -L$(FLEXRAN_SDK)/lib_rate_dematching_5gnr -lrate_dematching_5gnr +endif + +LDLIBS += -lstdc++ -lirc -limf -lipps -lsvml # library version LIBABIVER := 1 diff --git a/drivers/baseband/turbo_sw/bbdev_turbo_software.c b/drivers/baseband/turbo_sw/bbdev_turbo_software.c index c714d03..af5e731 100644 --- a/drivers/baseband/turbo_sw/bbdev_turbo_software.c +++ b/drivers/baseband/turbo_sw/bbdev_turbo_software.c @@ -14,9 +14,23 @@ #include <rte_bbdev.h> #include <rte_bbdev_pmd.h> +#include <rte_common.h> +#include <rte_hexdump.h> +#include <rte_log.h> + +#include <ipp.h> +#include <ipps.h> #include <phy_turbo.h> #include <phy_crc.h> #include <phy_rate_match.h> +#ifdef RTE_LIBRTE_PMD_BBDEV_AVX512 +#include <bit_reverse.h> +#include <phy_ldpc_encoder_5gnr.h> +#include <phy_ldpc_decoder_5gnr.h> +#include <phy_LDPC_ratematch_5gnr.h> +#include <phy_rate_dematching_5gnr.h> +#endif + #include <divide.h> #define DRIVER_NAME baseband_turbo_sw @@ -154,7 +168,8 @@ struct turbo_sw_queue { RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP | RTE_BBDEV_TURBO_EARLY_TERMINATION, .max_llr_modulus = 16, - .num_buffers_src = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, .num_buffers_hard_out = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, .num_buffers_soft_out = 0, @@ -168,10 +183,46 @@ struct turbo_sw_queue { RTE_BBDEV_TURBO_CRC_24A_ATTACH | RTE_BBDEV_TURBO_RATE_MATCH | RTE_BBDEV_TURBO_RV_INDEX_BYPASS, - .num_buffers_src = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, - .num_buffers_dst = RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + } + }, +#ifdef RTE_LIBRTE_PMD_BBDEV_AVX512 + { + .type = RTE_BBDEV_OP_LDPC_ENC, + .cap.ldpc_enc = { + .capability_flags = + RTE_BBDEV_LDPC_RATE_MATCH | + RTE_BBDEV_LDPC_CRC_24A_ATTACH | + RTE_BBDEV_LDPC_CRC_24B_ATTACH, + .num_buffers_src = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, + .num_buffers_dst = + RTE_BBDEV_LDPC_MAX_CODE_BLOCKS, } }, + { + .type = RTE_BBDEV_OP_LDPC_DEC, + .cap.ldpc_dec = { + .capability_flags = + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK | + RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK | + RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP | + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE | + RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE | + RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE, + .llr_size = 8, + .llr_decimals = 2, + .num_buffers_src = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_hard_out = + RTE_BBDEV_TURBO_MAX_CODE_BLOCKS, + .num_buffers_soft_out = 0, + } + }, +#endif RTE_BBDEV_END_OF_CAPABILITIES_LIST() }; @@ -268,7 +319,7 @@ struct turbo_sw_queue { return -ENAMETOOLONG; } q->enc_in = rte_zmalloc_socket(name, - (RTE_BBDEV_TURBO_MAX_CB_SIZE >> 3) * sizeof(*q->enc_in), + (RTE_BBDEV_LDPC_MAX_CB_SIZE >> 3) * sizeof(*q->enc_in), RTE_CACHE_LINE_SIZE, queue_conf->socket); if (q->enc_in == NULL) { rte_bbdev_log(ERR, @@ -276,7 +327,7 @@ struct turbo_sw_queue { goto free_q; } - /* Allocate memory for Aplha Gamma temp buffer. */ + /* Allocate memory for Alpha Gamma temp buffer. */ ret = snprintf(name, RTE_RING_NAMESIZE, RTE_STR(DRIVER_NAME)"_ag%u:%u", dev->data->dev_id, q_id); if ((ret < 0) || (ret >= (int)RTE_RING_NAMESIZE)) { @@ -410,6 +461,7 @@ struct turbo_sw_queue { .queue_release = q_release }; +#ifdef RTE_LIBRTE_BBDEV_DEBUG /* Checks if the encoder input buffer is correct. * Returns 0 if it's valid, -1 otherwise. */ @@ -437,7 +489,9 @@ struct turbo_sw_queue { return 0; } +#endif +#ifdef RTE_LIBRTE_BBDEV_DEBUG /* Checks if the decoder input buffer is correct. * Returns 0 if it's valid, -1 otherwise. */ @@ -464,15 +518,20 @@ struct turbo_sw_queue { return 0; } +#endif static inline void process_enc_cb(struct turbo_sw_queue *q, struct rte_bbdev_enc_op *op, uint8_t r, uint8_t c, uint16_t k, uint16_t ncb, uint32_t e, struct rte_mbuf *m_in, struct rte_mbuf *m_out_head, - struct rte_mbuf *m_out, uint16_t in_offset, uint16_t out_offset, + struct rte_mbuf *m_out, uint16_t in_offset, uint16_t out_offset, uint16_t in_length, struct rte_bbdev_stats *q_stats) { +#ifdef RTE_LIBRTE_BBDEV_DEBUG int ret; +#else + RTE_SET_USED(in_length); +#endif int16_t k_idx; uint16_t m; uint8_t *in, *out0, *out1, *out2, *tmp_out, *rm_out; @@ -496,11 +555,14 @@ struct turbo_sw_queue { /* CRC24A (for TB) */ if ((enc->op_flags & RTE_BBDEV_TURBO_CRC_24A_ATTACH) && (enc->code_block_mode == 1)) { +#ifdef RTE_LIBRTE_BBDEV_DEBUG ret = is_enc_input_valid(k - 24, k_idx, in_length); if (ret != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; return; } +#endif + crc_req.data = in; crc_req.len = k - 24; /* Check if there is a room for CRC bits if not use @@ -529,11 +591,14 @@ struct turbo_sw_queue { #endif } else if (enc->op_flags & RTE_BBDEV_TURBO_CRC_24B_ATTACH) { /* CRC24B */ +#ifdef RTE_LIBRTE_BBDEV_DEBUG ret = is_enc_input_valid(k - 24, k_idx, in_length); if (ret != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; return; } +#endif + crc_req.data = in; crc_req.len = k - 24; /* Check if there is a room for CRC bits if this is the last @@ -560,13 +625,16 @@ struct turbo_sw_queue { #ifdef RTE_BBDEV_OFFLOAD_COST q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; #endif - } else { + } +#ifdef RTE_LIBRTE_BBDEV_DEBUG + else { ret = is_enc_input_valid(k, k_idx, in_length); if (ret != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; return; } } +#endif /* Turbo encoder */ @@ -726,6 +794,143 @@ struct turbo_sw_queue { } } + +static inline void +process_ldpc_enc_cb(struct turbo_sw_queue *q, struct rte_bbdev_enc_op *op, + uint32_t e, struct rte_mbuf *m_in, struct rte_mbuf *m_out_head, + struct rte_mbuf *m_out, uint16_t in_offset, uint16_t out_offset, + uint16_t seg_total_left, struct rte_bbdev_stats *q_stats) +{ +#ifdef RTE_LIBRTE_PMD_BBDEV_AVX512 + RTE_SET_USED(seg_total_left); + uint8_t *in, *rm_out; + struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; + struct bblib_ldpc_encoder_5gnr_request ldpc_req; + struct bblib_ldpc_encoder_5gnr_response ldpc_resp; + struct bblib_LDPC_ratematch_5gnr_request rm_req; + struct bblib_LDPC_ratematch_5gnr_response rm_resp; + struct bblib_crc_request crc_req; + struct bblib_crc_response crc_resp; + uint16_t msgLen, puntBits, parity_offset, out_len; + uint16_t K = (enc->basegraph == 1 ? 22 : 10) * enc->z_c; + uint16_t in_length_in_bits = K - enc->n_filler; + uint16_t in_length_in_bytes = (in_length_in_bits + 7) >> 3; + +#ifdef RTE_BBDEV_OFFLOAD_COST + uint64_t start_time = rte_rdtsc_precise(); +#else + RTE_SET_USED(q_stats); +#endif + + in = rte_pktmbuf_mtod_offset(m_in, uint8_t *, in_offset); + + /* Masking the Filler bits explicitly */ + memset(q->enc_in + (in_length_in_bytes - 3), 0, + ((K + 7) >> 3) - (in_length_in_bytes - 3)); + /* CRC Generation */ + if (enc->op_flags & RTE_BBDEV_LDPC_CRC_24A_ATTACH) { + rte_memcpy(q->enc_in, in, in_length_in_bytes - 3); + crc_req.data = in; + crc_req.len = in_length_in_bits - 24; + crc_resp.data = q->enc_in; + bblib_lte_crc24a_gen(&crc_req, &crc_resp); + } else if (enc->op_flags & RTE_BBDEV_LDPC_CRC_24B_ATTACH) { + rte_memcpy(q->enc_in, in, in_length_in_bytes - 3); + crc_req.data = in; + crc_req.len = in_length_in_bits - 24; + crc_resp.data = q->enc_in; + bblib_lte_crc24b_gen(&crc_req, &crc_resp); + } else + rte_memcpy(q->enc_in, in, in_length_in_bytes); + + /* LDPC Encoding */ + ldpc_req.Zc = enc->z_c; + ldpc_req.baseGraph = enc->basegraph; + /* Number of rows set to maximum */ + ldpc_req.nRows = ldpc_req.baseGraph == 1 ? 46 : 42; + ldpc_req.numberCodeblocks = 1; + ldpc_req.input[0] = (int8_t *) q->enc_in; + ldpc_resp.output[0] = (int8_t *) q->enc_out; + + bblib_bit_reverse(ldpc_req.input[0], in_length_in_bytes << 3); + + if (bblib_ldpc_encoder_5gnr(&ldpc_req, &ldpc_resp) != 0) { + op->status |= 1 << RTE_BBDEV_DRV_ERROR; + rte_bbdev_log(ERR, "LDPC Encoder failed"); + return; + } + + /* + * Systematic + Parity : Recreating stream with filler bits, ideally + * the bit select could handle this in the RM SDK + */ + msgLen = (ldpc_req.baseGraph == 1 ? 22 : 10) * ldpc_req.Zc; + puntBits = 2 * ldpc_req.Zc; + parity_offset = msgLen - puntBits; + ippsCopyBE_1u(((uint8_t *) ldpc_req.input[0]) + (puntBits / 8), + puntBits%8, q->adapter_output, 0, parity_offset); + ippsCopyBE_1u(q->enc_out, 0, q->adapter_output + (parity_offset / 8), + parity_offset % 8, ldpc_req.nRows * ldpc_req.Zc); + + out_len = (e + 7) >> 3; + /* get output data starting address */ + rm_out = (uint8_t *)mbuf_append(m_out_head, m_out, out_len); + if (rm_out == NULL) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, + "Too little space in output mbuf"); + return; + } + /* + * rte_bbdev_op_data.offset can be different than the offset + * of the appended bytes + */ + rm_out = rte_pktmbuf_mtod_offset(m_out, uint8_t *, out_offset); + + /* Rate-Matching */ + rm_req.E = e; + rm_req.Ncb = enc->n_cb; + rm_req.Qm = enc->q_m; + rm_req.Zc = enc->z_c; + rm_req.baseGraph = enc->basegraph; + rm_req.input = q->adapter_output; + rm_req.nLen = enc->n_filler; + rm_req.nullIndex = parity_offset - enc->n_filler; + rm_req.rvidx = enc->rv_index; + rm_resp.output = q->deint_output; + + if (bblib_LDPC_ratematch_5gnr(&rm_req, &rm_resp) != 0) { + op->status |= 1 << RTE_BBDEV_DRV_ERROR; + rte_bbdev_log(ERR, "Rate matching failed"); + return; + } + + /* RM SDK may provide non zero bits on last byte */ + if ((e % 8) != 0) + q->deint_output[out_len-1] &= (1 << (e % 8)) - 1; + + bblib_bit_reverse((int8_t *) q->deint_output, out_len << 3); + + rte_memcpy(rm_out, q->deint_output, out_len); + enc->output.length += out_len; + +#ifdef RTE_BBDEV_OFFLOAD_COST + q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; +#endif +#else + RTE_SET_USED(q); + RTE_SET_USED(op); + RTE_SET_USED(e); + RTE_SET_USED(m_in); + RTE_SET_USED(m_out_head); + RTE_SET_USED(m_out); + RTE_SET_USED(in_offset); + RTE_SET_USED(out_offset); + RTE_SET_USED(seg_total_left); + RTE_SET_USED(q_stats); +#endif +} + static inline void enqueue_enc_one_op(struct turbo_sw_queue *q, struct rte_bbdev_enc_op *op, struct rte_bbdev_stats *queue_stats) @@ -819,6 +1024,93 @@ struct turbo_sw_queue { } } + +static inline void +enqueue_ldpc_enc_one_op(struct turbo_sw_queue *q, struct rte_bbdev_enc_op *op, + struct rte_bbdev_stats *queue_stats) +{ + uint8_t c, r, crc24_bits = 0; + uint32_t e; + struct rte_bbdev_op_ldpc_enc *enc = &op->ldpc_enc; + uint16_t in_offset = enc->input.offset; + uint16_t out_offset = enc->output.offset; + struct rte_mbuf *m_in = enc->input.data; + struct rte_mbuf *m_out = enc->output.data; + struct rte_mbuf *m_out_head = enc->output.data; + uint32_t in_length, mbuf_total_left = enc->input.length; + + uint16_t seg_total_left; + + /* Clear op status */ + op->status = 0; + + if (mbuf_total_left > RTE_BBDEV_TURBO_MAX_TB_SIZE >> 3) { + rte_bbdev_log(ERR, "TB size (%u) is too big, max: %d", + mbuf_total_left, RTE_BBDEV_TURBO_MAX_TB_SIZE); + op->status = 1 << RTE_BBDEV_DATA_ERROR; + return; + } + + if (m_in == NULL || m_out == NULL) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + op->status = 1 << RTE_BBDEV_DATA_ERROR; + return; + } + + if ((enc->op_flags & RTE_BBDEV_TURBO_CRC_24B_ATTACH) || + (enc->op_flags & RTE_BBDEV_TURBO_CRC_24A_ATTACH)) + crc24_bits = 24; + + if (enc->code_block_mode == 0) { /* For Transport Block mode */ + c = enc->tb_params.c; + r = enc->tb_params.r; + } else { /* For Code Block mode */ + c = 1; + r = 0; + } + + while (mbuf_total_left > 0 && r < c) { + + seg_total_left = rte_pktmbuf_data_len(m_in) - in_offset; + + if (enc->code_block_mode == 0) { + e = (r < enc->tb_params.cab) ? + enc->tb_params.ea : enc->tb_params.eb; + } else { + e = enc->cb_params.e; + } + + process_ldpc_enc_cb(q, op, e, m_in, m_out_head, + m_out, in_offset, out_offset, seg_total_left, + queue_stats); + /* Update total_left */ + in_length = (enc->basegraph == 1 ? 22 : 10) * enc->z_c; + in_length = ((in_length - crc24_bits - enc->n_filler) >> 3); + mbuf_total_left -= in_length; + /* Update offsets for next CBs (if exist) */ + in_offset += in_length; + out_offset += (e + 7) >> 3; + + /* Update offsets */ + if (seg_total_left == in_length) { + /* Go to the next mbuf */ + m_in = m_in->next; + m_out = m_out->next; + in_offset = 0; + out_offset = 0; + } + r++; + } + + /* check if all input data was processed */ + if (mbuf_total_left != 0) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included CBs sizes %d", + mbuf_total_left); + } +} + static inline uint16_t enqueue_enc_all_ops(struct turbo_sw_queue *q, struct rte_bbdev_enc_op **ops, uint16_t nb_ops, struct rte_bbdev_stats *queue_stats) @@ -835,6 +1127,23 @@ struct turbo_sw_queue { NULL); } +static inline uint16_t +enqueue_ldpc_enc_all_ops(struct turbo_sw_queue *q, + struct rte_bbdev_enc_op **ops, + uint16_t nb_ops, struct rte_bbdev_stats *queue_stats) +{ + uint16_t i; +#ifdef RTE_BBDEV_OFFLOAD_COST + queue_stats->acc_offload_cycles = 0; +#endif + + for (i = 0; i < nb_ops; ++i) + enqueue_ldpc_enc_one_op(q, ops[i], queue_stats); + + return rte_ring_enqueue_burst(q->processed_pkts, (void **)ops, nb_ops, + NULL); +} + static inline void move_padding_bytes(const uint8_t *in, uint8_t *out, uint16_t k, uint16_t ncb) @@ -856,7 +1165,11 @@ struct turbo_sw_queue { uint16_t crc24_overlap, uint16_t in_length, struct rte_bbdev_stats *q_stats) { +#ifdef RTE_LIBRTE_BBDEV_DEBUG int ret; +#else + RTE_SET_USED(in_length); +#endif int32_t k_idx; int32_t iter_cnt; uint8_t *in, *out, *adapter_input; @@ -874,11 +1187,13 @@ struct turbo_sw_queue { k_idx = compute_idx(k); +#ifdef RTE_LIBRTE_BBDEV_DEBUG ret = is_dec_input_valid(k_idx, kw, in_length); if (ret != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; return; } +#endif in = rte_pktmbuf_mtod_offset(m_in, uint8_t *, in_offset); ncb = kw; @@ -894,11 +1209,12 @@ struct turbo_sw_queue { deint_resp.pinteleavebuffer = q->deint_output; #ifdef RTE_BBDEV_OFFLOAD_COST - start_time = rte_rdtsc_precise(); + start_time = rte_rdtsc_precise(); #endif + /* Sub-block De-Interleaving */ bblib_deinterleave_ul(&deint_req, &deint_resp); #ifdef RTE_BBDEV_OFFLOAD_COST - q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; + q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; #endif } else move_padding_bytes(in, q->deint_output, k, ncb); @@ -975,6 +1291,196 @@ struct turbo_sw_queue { } static inline void +process_ldpc_dec_cb(struct turbo_sw_queue *q, struct rte_bbdev_dec_op *op, + uint8_t c, uint16_t out_length, uint16_t e, + struct rte_mbuf *m_in, + struct rte_mbuf *m_out_head, struct rte_mbuf *m_out, + struct rte_mbuf *m_harq_in, + struct rte_mbuf *m_harq_out_head, struct rte_mbuf *m_harq_out, + uint16_t in_offset, uint16_t out_offset, + uint16_t harq_in_offset, uint16_t harq_out_offset, + bool check_crc_24b, + uint16_t crc24_overlap, uint16_t in_length, + struct rte_bbdev_stats *q_stats) +{ +#ifdef RTE_LIBRTE_PMD_BBDEV_AVX512 + RTE_SET_USED(in_length); + RTE_SET_USED(c); + uint8_t *in, *out, *harq_in, *harq_out, *adapter_input; + struct bblib_rate_dematching_5gnr_request derm_req; + struct bblib_rate_dematching_5gnr_response derm_resp; + struct bblib_ldpc_decoder_5gnr_request dec_req; + struct bblib_ldpc_decoder_5gnr_response dec_resp; + struct bblib_crc_request crc_req; + struct bblib_crc_response crc_resp; + struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec; + uint16_t K, parity_offset, sys_cols, outLenWithCrc; + int16_t deRmOutSize, numRows; + + /* Compute some LDPC BG lengths */ + outLenWithCrc = out_length + (crc24_overlap >> 3); + sys_cols = (dec->basegraph == 1) ? 22 : 10; + K = sys_cols * dec->z_c; + parity_offset = K - 2 * dec->z_c; + +#ifdef RTE_BBDEV_OFFLOAD_COST + uint64_t start_time = rte_rdtsc_precise(); +#else + RTE_SET_USED(q_stats); +#endif + + in = rte_pktmbuf_mtod_offset(m_in, uint8_t *, in_offset); + + if (check_bit(dec->op_flags, RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE)) { + /** + * Single contiguous block from the first LLR of the + * circular buffer. + */ + harq_in = NULL; + if (m_harq_in != NULL) + harq_in = rte_pktmbuf_mtod_offset(m_harq_in, + uint8_t *, harq_in_offset); + if (harq_in == NULL) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, "No space in harq input mbuf"); + return; + } + uint16_t harq_in_length = RTE_MIN( + dec->harq_combined_input.length, + (uint32_t) dec->n_cb); + memset(q->ag + harq_in_length, 0, + dec->n_cb - harq_in_length); + rte_memcpy(q->ag, harq_in, harq_in_length); + } + + derm_req.p_in = (int8_t *) in; + derm_req.p_harq = q->ag; /* This doesn't include the filler bits */ + derm_req.base_graph = dec->basegraph; + derm_req.zc = dec->z_c; + derm_req.ncb = dec->n_cb; + derm_req.e = e; + derm_req.k0 = 0; /* Actual output from SDK */ + derm_req.isretx = check_bit(dec->op_flags, + RTE_BBDEV_LDPC_HQ_COMBINE_IN_ENABLE); + derm_req.rvid = dec->rv_index; + derm_req.modulation_order = dec->q_m; + derm_req.start_null_index = parity_offset - dec->n_filler; + derm_req.num_of_null = dec->n_filler; + + bblib_rate_dematching_5gnr(&derm_req, &derm_resp); + + /* Compute RM out size and number of rows */ + deRmOutSize = RTE_MIN( + derm_req.k0 + derm_req.e - + ((derm_req.k0 < derm_req.start_null_index) ? + 0 : dec->n_filler), + dec->n_cb - dec->n_filler); + if (m_harq_in != NULL) + deRmOutSize = RTE_MAX(deRmOutSize, + RTE_MIN(dec->n_cb - dec->n_filler, + m_harq_in->data_len)); + numRows = ((deRmOutSize + dec->n_filler + dec->z_c - 1) / dec->z_c) + - sys_cols + 2; + numRows = RTE_MAX(4, numRows); + + /* get output data starting address */ + out = (uint8_t *)mbuf_append(m_out_head, m_out, out_length); + if (out == NULL) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, + "Too little space in output mbuf"); + return; + } + + /* rte_bbdev_op_data.offset can be different than the offset + * of the appended bytes + */ + out = rte_pktmbuf_mtod_offset(m_out, uint8_t *, out_offset); + adapter_input = q->enc_out; + + dec_req.Zc = dec->z_c; + dec_req.baseGraph = dec->basegraph; + dec_req.nRows = numRows; + dec_req.numChannelLlrs = deRmOutSize; + dec_req.varNodes = derm_req.p_harq; + dec_req.numFillerBits = dec->n_filler; + dec_req.maxIterations = dec->iter_max; + dec_req.enableEarlyTermination = check_bit(dec->op_flags, + RTE_BBDEV_LDPC_ITERATION_STOP_ENABLE); + dec_resp.varNodes = (int16_t *) q->adapter_output; + dec_resp.compactedMessageBytes = q->enc_out; + + bblib_ldpc_decoder_5gnr(&dec_req, &dec_resp); + + dec->iter_count = RTE_MAX(dec_resp.iterationAtTermination, + dec->iter_count); + if (!dec_resp.parityPassedAtTermination) + op->status |= 1 << RTE_BBDEV_SYNDROME_ERROR; + + bblib_bit_reverse((int8_t *) q->enc_out, outLenWithCrc << 3); + + if (check_bit(dec->op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24A_CHECK) || + check_bit(dec->op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK)) { + crc_req.data = adapter_input; + crc_req.len = K - dec->n_filler - 24; + crc_resp.check_passed = false; + crc_resp.data = adapter_input; + if (check_crc_24b) + bblib_lte_crc24b_check(&crc_req, &crc_resp); + else + bblib_lte_crc24a_check(&crc_req, &crc_resp); + if (!crc_resp.check_passed) + op->status |= 1 << RTE_BBDEV_CRC_ERROR; + } + +#ifdef RTE_BBDEV_OFFLOAD_COST + q_stats->acc_offload_cycles += rte_rdtsc_precise() - start_time; +#endif + if (check_bit(dec->op_flags, RTE_BBDEV_LDPC_HQ_COMBINE_OUT_ENABLE)) { + /* get output data starting address */ + harq_out = NULL; + if (m_harq_out != NULL) + harq_out = (uint8_t *)mbuf_append(m_harq_out_head, + m_harq_out, deRmOutSize); + if (harq_out == NULL) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, "No space in harq output mbuf"); + return; + } + harq_out = rte_pktmbuf_mtod_offset(m_harq_out, uint8_t *, + harq_out_offset); + rte_memcpy(harq_out, derm_req.p_harq, deRmOutSize); + dec->harq_combined_output.length += deRmOutSize; + } + + rte_memcpy(out, adapter_input, out_length); + dec->hard_output.length += out_length; +#else + RTE_SET_USED(q); + RTE_SET_USED(op); + RTE_SET_USED(c); + RTE_SET_USED(out_length); + RTE_SET_USED(e); + RTE_SET_USED(m_in); + RTE_SET_USED(m_out_head); + RTE_SET_USED(m_out); + RTE_SET_USED(m_harq_in); + RTE_SET_USED(m_harq_out_head); + RTE_SET_USED(m_harq_out); + RTE_SET_USED(harq_in_offset); + RTE_SET_USED(harq_out_offset); + RTE_SET_USED(in_offset); + RTE_SET_USED(out_offset); + RTE_SET_USED(check_crc_24b); + RTE_SET_USED(crc24_overlap); + RTE_SET_USED(in_length); + RTE_SET_USED(q_stats); +#endif +} + + +static inline void enqueue_dec_one_op(struct turbo_sw_queue *q, struct rte_bbdev_dec_op *op, struct rte_bbdev_stats *queue_stats) { @@ -1033,6 +1539,7 @@ struct turbo_sw_queue { in_offset, out_offset, check_bit(dec->op_flags, RTE_BBDEV_TURBO_CRC_TYPE_24B), crc24_overlap, seg_total_left, queue_stats); + /* To keep CRC24 attached to end of Code block, use * RTE_BBDEV_TURBO_DEC_TB_CRC_24B_KEEP flag as it * removed by default once verified. @@ -1054,6 +1561,103 @@ struct turbo_sw_queue { } r++; } + + if (mbuf_total_left != 0) { + op->status |= 1 << RTE_BBDEV_DATA_ERROR; + rte_bbdev_log(ERR, + "Mismatch between mbuf length and included Circular buffer sizes"); + } +} + +static inline void +enqueue_ldpc_dec_one_op(struct turbo_sw_queue *q, struct rte_bbdev_dec_op *op, + struct rte_bbdev_stats *queue_stats) +{ + uint8_t c, r = 0; + uint16_t e, out_length; + uint16_t crc24_overlap = 0; + struct rte_bbdev_op_ldpc_dec *dec = &op->ldpc_dec; + struct rte_mbuf *m_in = dec->input.data; + struct rte_mbuf *m_harq_in = dec->harq_combined_input.data; + struct rte_mbuf *m_harq_out = dec->harq_combined_output.data; + struct rte_mbuf *m_harq_out_head = dec->harq_combined_output.data; + struct rte_mbuf *m_out = dec->hard_output.data; + struct rte_mbuf *m_out_head = dec->hard_output.data; + uint16_t in_offset = dec->input.offset; + uint16_t harq_in_offset = dec->harq_combined_input.offset; + uint16_t harq_out_offset = dec->harq_combined_output.offset; + uint16_t out_offset = dec->hard_output.offset; + uint32_t mbuf_total_left = dec->input.length; + uint16_t seg_total_left; + + /* Clear op status */ + op->status = 0; + + if (m_in == NULL || m_out == NULL) { + rte_bbdev_log(ERR, "Invalid mbuf pointer"); + op->status = 1 << RTE_BBDEV_DATA_ERROR; + return; + } + + if (dec->code_block_mode == 0) { /* For Transport Block mode */ + c = dec->tb_params.c; + e = dec->tb_params.ea; + } else { /* For Code Block mode */ + c = 1; + e = dec->cb_params.e; + } + + if (check_bit(dec->op_flags, RTE_BBDEV_LDPC_CRC_TYPE_24B_DROP)) + crc24_overlap = 24; + + out_length = (dec->basegraph == 1 ? 22 : 10) * dec->z_c; /* K */ + out_length = ((out_length - crc24_overlap - dec->n_filler) >> 3); + + while (mbuf_total_left > 0) { + if (dec->code_block_mode == 0) + e = (r < dec->tb_params.cab) ? + dec->tb_params.ea : dec->tb_params.eb; + + seg_total_left = rte_pktmbuf_data_len(m_in) - in_offset; + + process_ldpc_dec_cb(q, op, c, out_length, e, + m_in, m_out_head, m_out, + m_harq_in, m_harq_out_head, m_harq_out, + in_offset, out_offset, harq_in_offset, + harq_out_offset, + check_bit(dec->op_flags, + RTE_BBDEV_LDPC_CRC_TYPE_24B_CHECK), + crc24_overlap, + seg_total_left, queue_stats); + + /* To keep CRC24 attached to end of Code block, use + * RTE_BBDEV_LDPC_DEC_TB_CRC_24B_KEEP flag as it + * removed by default once verified. + */ + + mbuf_total_left -= e; + + /* Update offsets */ + if (seg_total_left == e) { + /* Go to the next mbuf */ + m_in = m_in->next; + m_out = m_out->next; + if (m_harq_in != NULL) + m_harq_in = m_harq_in->next; + if (m_harq_out != NULL) + m_harq_out = m_harq_out->next; + in_offset = 0; + out_offset = 0; + harq_in_offset = 0; + harq_out_offset = 0; + } else { + /* Update offsets for next CBs (if exist) */ + in_offset += e; + out_offset += out_length; + } + r++; + } + if (mbuf_total_left != 0) { op->status |= 1 << RTE_BBDEV_DATA_ERROR; rte_bbdev_log(ERR, @@ -1077,6 +1681,23 @@ struct turbo_sw_queue { NULL); } +static inline uint16_t +enqueue_ldpc_dec_all_ops(struct turbo_sw_queue *q, + struct rte_bbdev_dec_op **ops, + uint16_t nb_ops, struct rte_bbdev_stats *queue_stats) +{ + uint16_t i; +#ifdef RTE_BBDEV_OFFLOAD_COST + queue_stats->acc_offload_cycles = 0; +#endif + + for (i = 0; i < nb_ops; ++i) + enqueue_ldpc_dec_one_op(q, ops[i], queue_stats); + + return rte_ring_enqueue_burst(q->processed_pkts, (void **)ops, nb_ops, + NULL); +} + /* Enqueue burst */ static uint16_t enqueue_enc_ops(struct rte_bbdev_queue_data *q_data, @@ -1096,6 +1717,24 @@ struct turbo_sw_queue { /* Enqueue burst */ static uint16_t +enqueue_ldpc_enc_ops(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_enc_op **ops, uint16_t nb_ops) +{ + void *queue = q_data->queue_private; + struct turbo_sw_queue *q = queue; + uint16_t nb_enqueued = 0; + + nb_enqueued = enqueue_ldpc_enc_all_ops( + q, ops, nb_ops, &q_data->queue_stats); + + q_data->queue_stats.enqueue_err_count += nb_ops - nb_enqueued; + q_data->queue_stats.enqueued_count += nb_enqueued; + + return nb_enqueued; +} + +/* Enqueue burst */ +static uint16_t enqueue_dec_ops(struct rte_bbdev_queue_data *q_data, struct rte_bbdev_dec_op **ops, uint16_t nb_ops) { @@ -1111,6 +1750,24 @@ struct turbo_sw_queue { return nb_enqueued; } +/* Enqueue burst */ +static uint16_t +enqueue_ldpc_dec_ops(struct rte_bbdev_queue_data *q_data, + struct rte_bbdev_dec_op **ops, uint16_t nb_ops) +{ + void *queue = q_data->queue_private; + struct turbo_sw_queue *q = queue; + uint16_t nb_enqueued = 0; + + nb_enqueued = enqueue_ldpc_dec_all_ops(q, ops, nb_ops, + &q_data->queue_stats); + + q_data->queue_stats.enqueue_err_count += nb_ops - nb_enqueued; + q_data->queue_stats.enqueued_count += nb_enqueued; + + return nb_enqueued; +} + /* Dequeue decode burst */ static uint16_t dequeue_dec_ops(struct rte_bbdev_queue_data *q_data, @@ -1223,6 +1880,10 @@ struct turbo_sw_queue { bbdev->dequeue_dec_ops = dequeue_dec_ops; bbdev->enqueue_enc_ops = enqueue_enc_ops; bbdev->enqueue_dec_ops = enqueue_dec_ops; + bbdev->dequeue_ldpc_enc_ops = dequeue_enc_ops; + bbdev->dequeue_ldpc_dec_ops = dequeue_dec_ops; + bbdev->enqueue_ldpc_enc_ops = enqueue_ldpc_enc_ops; + bbdev->enqueue_ldpc_dec_ops = enqueue_ldpc_dec_ops; ((struct bbdev_private *) bbdev->data->dev_private)->max_nb_queues = init_params->queues_num; diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 21d32a2..060e0ec 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -225,8 +225,14 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -lrte_pmd_bbdev_turbo_sw _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_crc -lcrc _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_turbo -lturbo _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_rate_matching -lrate_matching +ifeq ($(CONFIG_RTE_LIBRTE_PMD_BBDEV_AVX512),y) +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_LDPC_ratematch_5gnr -lLDPC_ratematch_5gnr +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_ldpc_encoder_5gnr -lldpc_encoder_5gnr +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_ldpc_decoder_5gnr -lldpc_decoder_5gnr +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_rate_dematching_5gnr -lrate_dematching_5gnr +endif _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -L$(FLEXRAN_SDK)/lib_common -lcommon -_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -lirc -limf -lstdc++ -lipps +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BBDEV_TURBO_SW) += -lirc -limf -lstdc++ -lipps -lsvml endif # CONFIG_RTE_LIBRTE_BBDEV ifeq ($(CONFIG_RTE_LIBRTE_CRYPTODEV),y) -- 1.8.3.1